Avoidability of Palindrome Patterns

We characterize the formulas that are avoided by every α-free word for some α > 1. We show that the avoidable formulas whose fragments are of the form XY or XYX are 4-avoidable. The largest avoidability index of an avoidable palindrome pattern is known to be at least 4 and at most 16. We make progress toward the conjecture that every avoidable palindrome pattern is 4-avoidable.

word avoiding f also avoids p, so λ(p) ≤ λ(f). Recall that an infinite word is recurrent if every finite factor appears infinitely many times and that any infinite factorial language contains a recurrent word [8, Proposition 5.1.13]. If there exists an infinite word over Σ avoiding p, then there exists an infinite recurrent word over Σ avoiding p. This recurrent word also avoids f, so that λ(p) = λ(f). Without loss of generality, a formula is such that no variable is isolated and no fragment is a factor of another fragment.
Let us define the types of formulas we consider in this paper. A pattern is doubled if it contains every variable at least twice. Thus it is a formula with only one pattern. A formula f is nice if for every variable X of f, there exists a fragment of f that contains X at least twice. Notice that a doubled pattern is a nice pattern. A formula is an xyx-formula if every fragment is of the form XYX, i.e., the fragment has length 3 and the first and third variable are the same. A formula is hybrid if every fragment has length 2 or is of the form XYX. Thus, an xyx-formula is a hybrid formula.
In Section 3, we consider the avoidance of nice formulas. In Section 4, we find some formulas f such that every recurrent word avoiding f over Σ_{λ(f)} is equivalent to a well-known morphic word. In Section 5, we consider the avoidance of xyx-formulas and hybrid formulas. In Section 6, we consider the avoidance of patterns that are palindromes.
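The four classes defined above are easy to test mechanically. The following is a minimal sketch, assuming a formula is represented as a list of fragments and each variable is a single character; the function names are illustrative and do not come from the paper.

```python
def variables(fragments):
    """Set of variables used by a formula given as a list of fragments."""
    return set("".join(fragments))


def is_doubled(pattern):
    """A pattern is doubled if every variable occurs at least twice in it."""
    return all(pattern.count(v) >= 2 for v in set(pattern))


def is_nice(fragments):
    """Every variable occurs at least twice inside some single fragment."""
    return all(any(frag.count(v) >= 2 for frag in fragments)
               for v in variables(fragments))


def is_xyx(fragments):
    """Every fragment has length 3 with equal first and third variables."""
    return all(len(frag) == 3 and frag[0] == frag[2] for frag in fragments)


def is_hybrid(fragments):
    """Every fragment has length 2 or is of the form XYX."""
    return all(len(frag) == 2 or (len(frag) == 3 and frag[0] == frag[2])
               for frag in fragments)
```

For instance, ABA.BAB is nice and is an xyx-formula, whereas AB.BA is hybrid but not nice.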

Preliminaries
Given a pattern p, the Zimin operator constructs the pattern Z(p) = pXp where X is a variable that is not contained in p. For every fixed t, Z^t(p) denotes the pattern obtained by applying t times the Zimin operator to p. Notice that a recurrent word avoids Z^t(p) if and only if it avoids p.
We say that a formula f divides a formula f′ if every recurrent word avoiding f also avoids f′. We denote by f ⪯ f′ the fact that f divides f′. By the previous discussion, p ⪯ Z^t(p) and Z^t(p) ⪯ p for every pattern p. The basic case of divisibility is that f ⪯ f′ if f′ contains an occurrence of f, that is, if there exists a non-erasing morphism h such that the h-image of every fragment of f is a factor of a fragment of f′. Another case of divisibility is obtained by transitivity: in order to obtain f ⪯ p, it is sufficient to prove f ⪯ Z^t(p), since Z^t(p) ⪯ p. We use this trick in the proof of Lemma 6 and Theorem 17. Of course, divisibility is related to avoidability: if f ⪯ f′, then λ(f′) ≤ λ(f).
Let Σ_k = {0, 1, ..., k − 1} denote the k-letter alphabet. We denote by Σ_k^n the k^n words of length n over Σ_k.
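As an illustration, the Zimin operator can be sketched in a few lines. This assumes single-letter variables taken from the uppercase Latin alphabet; the helper names are hypothetical.

```python
from string import ascii_uppercase


def zimin(pattern):
    """Return Z(p) = pXp, where X is the first uppercase letter not used in p."""
    fresh = next(c for c in ascii_uppercase if c not in pattern)
    return pattern + fresh + pattern


def zimin_power(pattern, t):
    """Apply the Zimin operator t times."""
    for _ in range(t):
        pattern = zimin(pattern)
    return pattern
```

Starting from the single variable A, two applications give ABACABA, and one application to AA gives AABAA.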
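The basic case of divisibility is a finite search for a non-erasing morphism. The sketch below is a brute-force backtracking search, not an algorithm taken from the paper; it assumes single-character variables, and the name `contains_occurrence` is illustrative. The targets can be the fragments of another formula or a single finite word.

```python
def contains_occurrence(fragments, targets):
    """True if some non-erasing morphism h sends every fragment to a factor
    of one of the target words."""

    def match(frag, i, text, pos, h):
        # Try to read h(frag[i:]) in text from position pos, extending h.
        if i == len(frag):
            yield h
            return
        var = frag[i]
        if var in h:
            if text.startswith(h[var], pos):
                yield from match(frag, i + 1, text, pos + len(h[var]), h)
        else:
            for end in range(pos + 1, len(text) + 1):  # non-empty images only
                yield from match(frag, i + 1, text, end,
                                 {**h, var: text[pos:end]})

    def solve(k, h):
        if k == len(fragments):
            return True
        for text in targets:
            for start in range(len(text)):
                for extended in match(fragments[k], 0, text, start, h):
                    if solve(k + 1, extended):
                        return True
        return False

    return solve(0, {})
```

For example, the word 01010 contains an occurrence of ABA.BAB (with A → 0 and B → 1), while 0102 does not.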
The operation of splitting a formula f on a fragment φ consists in replacing φ by two fragments, namely the prefix and the suffix of length |φ|−1 of φ. A formula f is minimally avoidable if splitting any fragment of f gives an unavoidable formula. The set of every minimally avoidable formula with at most n variables is called the n-avoidance basis.
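The splitting operation can be sketched as follows, assuming fragments are strings of single-character variables. The normalization step only removes duplicate fragments and fragments that are factors of another fragment; both function names are illustrative.

```python
def normalize(fragments):
    """Drop duplicates and fragments that are factors of another fragment."""
    unique = list(dict.fromkeys(fragments))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def split(fragments, index):
    """Replace fragment number `index` by its prefix and its suffix of
    length |fragment| - 1, then normalize."""
    phi = fragments[index]
    pieces = fragments[:index] + [phi[:-1], phi[1:]] + fragments[index + 1:]
    return normalize(pieces)
```

Splitting ABA.BAB on its first fragment gives AB.BA.BAB, which normalizes to the single fragment BAB.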
The adjacency graph AG(f) of the formula f is the bipartite graph such that
• for every variable X of f, AG(f) contains the two vertices X_L and X_R,
• for every (possibly equal) variables X and Y, there is an edge between X_L and Y_R if and only if XY is a factor of f.
We say that a set S of variables of f is free if for all X, Y ∈ S, X_L and Y_R are in distinct connected components of AG(f). A formula f is said to reduce to f′ if f′ is obtained by deleting all the variables of a free set from f, discarding any empty word fragment. A formula is reducible if there is a sequence of reductions to the empty formula. Finally, a locked formula is a formula having no free set.

Theorem 1 ([3]). A formula is unavoidable if and only if it is reducible.
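By Theorem 1, avoidability of a formula can be decided by searching for a sequence of reductions. The following is a minimal exhaustive sketch, exponential in the number of variables and intended only for small formulas; it assumes single-character variables, and all names are illustrative.

```python
from itertools import combinations


def components(fragments):
    """Map every vertex (X, 'L') or (X, 'R') of the adjacency graph to a
    representative of its connected component (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for frag in fragments:
        for v in frag:
            find((v, "L"))
            find((v, "R"))
        for x, y in zip(frag, frag[1:]):  # XY is a factor: edge X_L - Y_R
            parent[find((x, "L"))] = find((y, "R"))
    return {vertex: find(vertex) for vertex in list(parent)}


def is_free(subset, comp):
    """X_L and Y_R lie in distinct components for all X, Y in the subset."""
    return all(comp[(x, "L")] != comp[(y, "R")]
               for x in subset for y in subset)


def delete_variables(fragments, subset):
    """Delete the given variables and discard empty fragments."""
    reduced = ("".join(c for c in frag if c not in subset)
               for frag in fragments)
    return tuple(frag for frag in reduced if frag)


def is_reducible(fragments):
    """True if some sequence of reductions leads to the empty formula."""
    fragments = tuple(frag for frag in fragments if frag)
    if not fragments:
        return True
    comp = components(fragments)
    names = sorted(set("".join(fragments)))
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if is_free(subset, comp) and \
                    is_reducible(delete_variables(fragments, subset)):
                return True
    return False
```

On the Zimin pattern ABACABA the search succeeds, while AA and ABA.BAB admit no reduction to the empty formula.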
Let us define here the following well-known pure morphic words. To specify a morphism m : Σ_s^* → Σ_e^*, we use the notation m = m(0)/m(1)/···/m(s − 1). Assuming a morphism m : Σ_s^* → Σ_s^* is such that m(0) starts with 0, the fixed point of m is the right infinite word m^ω(0).
• b_2 is the fixed point of 01/10.
We also consider the morphic words v_3 = M_1(b_5) and w_3 = M_2(b_5), where M_1 = 012/1/02/12/ε and M_2 = 02/1/0/12/ε. The languages of each of these words have been studied in the literature. Let us first recall the following characterization of b_3, v_3, and w_3. We say that two infinite words are equivalent if they have the same set of factors.

Theorem 2.
• Every ternary square-free recurrent word avoiding 010 and 212 is equivalent to b_3.
• Every ternary square-free recurrent word avoiding 010 and 020 is equivalent to v_3.
• Every ternary square-free recurrent word avoiding 121 and 212 is equivalent to w_3.
Interestingly, these three words can be characterized in terms of a forbidden distance between consecutive occurrences of one letter.

Theorem 3.
• Every ternary square-free recurrent word such that the distance between consecutive occurrences of 1 is not 3 is equivalent to b_3.
• Every ternary square-free recurrent word such that the distance between consecutive occurrences of 0 is not 2 is equivalent to v_3.
• Every ternary square-free recurrent word such that the distance between consecutive occurrences of 0 is not 4 is equivalent to w_3.

the electronic journal of combinatorics 28(1) (2021), #P1.4

Proof.
• Another characterization for b_3 is that every ternary square-free recurrent word avoiding 1021 and 1201 is equivalent to b_3 [1]. This rules out the possibility that the distance between two consecutive occurrences of 1 is 3.
• Since v_3 avoids 010 and 020, the distance between two occurrences of 0 is at least 3.
• Since w_3 avoids 121 and 212, the distance between consecutive occurrences of 0 is at most 3.
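The link between forbidden distances and forbidden factors in the proof above can be checked by enumeration. The sketch below lists, for a given letter and distance, the square-free ternary factors that start and end with that letter and do not contain it in between. The function names are illustrative.

```python
from itertools import product


def has_square(word):
    """True if the word contains a non-empty factor of the form uu."""
    n = len(word)
    return any(word[i:i + p] == word[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))


def forced_factors(letter, distance, alphabet="012"):
    """Square-free factors letter + filler + letter, where the filler has
    length distance - 1 and does not contain the letter."""
    others = [a for a in alphabet if a != letter]
    result = []
    for filler in product(others, repeat=distance - 1):
        factor = letter + "".join(filler) + letter
        if not has_square(factor):
            result.append(factor)
    return result
```

Distance 3 between consecutive occurrences of 1 forces 1021 or 1201, distance 2 between occurrences of 0 forces 010 or 020, and distance 4 between occurrences of 0 forces a factor containing 121 or 212. A distance of 5 or more is impossible in a square-free ternary word.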
The word b_4 is also known to avoid large families of formulas. Theorem 5 will be extended to hybrid formulas, see Theorem 21 in Section 5. Let us give here a result that will be needed in various parts of the paper. Thus, if w is a recurrent word that avoids a formula dividing ABA.ACA.ABCA.ACBA.ABCBA, then w is square-free.
Recall that the repetition threshold RT(n) is the smallest real number α such that there exists an infinite α^+-free word over Σ_n. The proof of Dejean's conjecture established that RT(2) = 2, RT(3) = 7/5, RT(4) = 7/4, and RT(n) = n/(n−1) for every n ≥ 5. An infinite RT(n)^+-free word over Σ_n is called a Dejean word.
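For experiments, the values of RT(n) recalled above and the largest exponent of a repetition in a finite word can be computed exactly with rational arithmetic. This is a small sketch with illustrative function names; the exponent of a factor of length L with period p is taken to be L/p.

```python
from fractions import Fraction


def repetition_threshold(n):
    """RT(n) as established by the proof of Dejean's conjecture (n >= 2)."""
    if n < 2:
        raise ValueError("the alphabet must have at least 2 letters")
    if n == 3:
        return Fraction(7, 5)
    if n == 4:
        return Fraction(7, 4)
    return Fraction(n, n - 1)  # also gives RT(2) = 2


def max_exponent(word):
    """Largest exponent length/period over all factors of the word."""
    best = Fraction(1)
    n = len(word)
    for p in range(1, n):
        run = 0  # number of consecutive positions j with word[j] == word[j+p]
        for j in range(n - p):
            if word[j] == word[j + p]:
                run += 1
                best = max(best, Fraction(run + p, p))
            else:
                run = 0
    return best


def is_plus_free(word, alpha):
    """True if the word contains no repetition of exponent greater than alpha."""
    return max_exponent(word) <= alpha
```

With these definitions, a finite word w is RT(n)^+-free exactly when `is_plus_free(w, repetition_threshold(n))` holds.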

Nice formulas
All the nice formulas considered so far in the literature are also 3-avoidable. This includes doubled patterns [12], circular formulas [9], the nice formulas in the 3-avoidance basis [9], and the minimally nice ternary formulas in Table 1 [15].

Theorem 7 ([9,15]). Every nice formula with at most 3 variables is 3-avoidable.
We have a risky conjecture that would generalize both Theorem 7 and the 3-avoidability of doubled patterns.

Conjecture 8. Every nice formula is 3-avoidable.
Theorem 19 in Section 5 shows that there exist infinitely many nice formulas with index 3. It means that Conjecture 8 would be best possible and it contrasts with the case of doubled patterns, since we expect that there exist only finitely many doubled patterns with index 3 [12,13]. In this section, we make progress toward Conjecture 8 by proving that every nice formula is avoidable and we explain how to get an upper bound on the index of a given nice formula.

The avoidability exponent
Let us consider a useful tool in pattern avoidance that has been defined in [12] and already used implicitly in [11]. The avoidability exponent AE(p) of a pattern p is the largest real α such that every α-free word avoids p. We extend this definition to formulas. The corresponding notion for the avoidance of patterns in the abelian setting has also been considered [7].
We look for the smallest β such that this system has no solution. Notice that a and d play symmetric roles. Thus, we can set a = d and simplify the system.
Then β is the largest eigenvalue of the matrix
[1 2 1]
[1 0 1]
[2 1 0]
that corresponds to the latter system. So β = 3.060647027... is the largest root of the characteristic polynomial x^3 − x^2 − 5x − 4. Then α = 1 + 1/(β+1) = 1.246266172...
This matrix approach is a convenient trick to use when possible. It was used in particular for some doubled patterns such that every variable occurs exactly twice [12]. It may fail if the number of inequalities is strictly greater than the number of variables or if the formula contains a repetition uvu such that |u| ≥ 2. In any case, we can fix a rational value to β and ask a computer algebra system whether the system of inequalities is solvable. Then we can get arbitrarily good approximations of β (and thus α) by a dichotomy method.
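The numerical values above can be reproduced without any external library. The sketch below computes the characteristic polynomial of the 3×3 matrix from its trace, principal minors and determinant, and then locates the largest real root by bisection. The bracket [3, 4] is an assumption of this sketch: the polynomial is negative at 3, positive at 4 and increasing beyond 5/3.

```python
M = [[1, 2, 1],
     [1, 0, 1],
     [2, 1, 0]]


def char_poly_3x3(m):
    """Coefficients (c2, c1, c0) of det(xI - m) = x^3 + c2 x^2 + c1 x + c0."""
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = (m[0][0] * m[1][1] - m[0][1] * m[1][0]
              + m[0][0] * m[2][2] - m[0][2] * m[2][0]
              + m[1][1] * m[2][2] - m[1][2] * m[2][1])
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return (-trace, minors, -det)


def root_by_bisection(coeffs, lo, hi, steps=100):
    """Root of the monic cubic in [lo, hi], assuming p(lo) < 0 < p(hi)."""
    c2, c1, c0 = coeffs

    def p(x):
        return ((x + c2) * x + c1) * x + c0

    for _ in range(steps):
        mid = (lo + hi) / 2
        if p(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


coefficients = char_poly_3x3(M)          # x^3 - x^2 - 5x - 4
beta = root_by_bisection(coefficients, 3.0, 4.0)
alpha = 1 + 1 / (beta + 1)
```

The same bisection idea applies to the dichotomy on β mentioned above, with the polynomial sign test replaced by a solvability test for the system of inequalities.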
Of course, the avoidability exponent is related to divisibility.
The avoidability exponent depends on the repetitions induced by f. We have AE(f) = 1 for formulas such as f = AB.BA.AC.CA.BC or f = AB.BA.AC.BC.CDA.DCD that do not have enough repetitions. That is, for every ε > 0, there exists a (1 + ε)-free word that contains an occurrence of f.
Let us investigate formulas with non-trivial avoidability exponent, that is, AE(f) > 1. To show that a nice formula has a non-trivial avoidability exponent (see Lemma 10), we first introduce a notion of minimality for nice formulas similar to the notion of minimally avoidable for general formulas. A nice formula f is minimally nice if there exists no nice formula g such that v(g) ≤ v(f) and g ≺ f. Alternatively, splitting a minimally nice formula on any of its fragments leads to a non-nice formula. The following property of every minimally nice formula is easy to derive. If a variable V appears as a prefix of a fragment φ, then
• V is also a suffix of φ (since otherwise we can split on φ and obtain a nice formula),
• φ contains exactly two occurrences of V (since otherwise we can remove the prefix letter V from φ and obtain a nice formula),
• V is neither a prefix nor a suffix of any fragment other than φ (since otherwise we can remove this prefix/suffix letter V from the other fragment and obtain a nice formula),
• every fragment other than φ contains at most one occurrence of V (since otherwise we can remove the prefix letter V from φ and obtain a nice formula).
Proof. First remark that if a word uvu is 1 Suppose that f contradicts the lemma. Then there exists a (1 + 1/(2v(f)−3))-free word w containing an occurrence h of f. Let X be a variable of f such that |h(X)| ≥ |h(Y)| for every variable Y. Since f is nice, f contains a factor of the form XPX where P is a sequence of variables that does not contain X.
If |P|_Y ≥ 2, then the number of letters of h(P) that do not belong to an occurrence of h(Y) is at most Thus there exist two occurrences of h(Y) in h(P) that are separated by at most This can be simplified to which is a contradiction.
The circular formulas studied in [9] show that AE(f) can be as low as 1 + (v(f))^{−1}. Moreover, our example AE(ABCDBACBD) = 1.246266172... shows that lower avoidability exponents exist among nice formulas with at least 4 variables.
We will describe below a method to construct infinite words avoiding a formula. This method can be applied if and only if the formula f satisfies AE(f) > 1. So we are interested in characterizing the formulas f such that AE(f) > 1. By Theorems 9 and 10, if f is a formula such that there exists a nice formula g satisfying g ⪯ f, then AE(f) > 1. Now we prove that the converse also holds, which gives the following characterization.
Theorem 11. A formula f satisfies AE(f) > 1 if and only if there exists a nice formula g such that g ⪯ f.
Proof. What remains to prove is that for every formula f that is not divisible by a nice formula and for every ε > 0, there exists an infinite (1 + ε)-free word w containing an occurrence of f, such that the size of the alphabet of w only depends on f and ε.
First, we consider the equivalent pattern p obtained from f by replacing every dot by a distinct variable that does not appear in f. We will actually construct an occurrence of p. Then we construct a family f_i of pseudo-formulas as follows. We start with f_0 = p. To obtain f_{i+1} from f_i, we choose a variable that appears at most once in every fragment of f_i. This variable is given the alias name V_i and every occurrence of V_i is replaced by a dot. We say that f_i is a pseudo-formula since we do not try to normalize f_i, that is, f_i can contain consecutive dots and f_i can contain fragments that are factors of other fragments. However, we still have a notion of fragment for a pseudo-formula. Since f is not divisible by a nice formula, this process ends with the pseudo-formula f_{v(p)} with no variable and |p| consecutive dots. The goal of this process is to obtain the ordering V_0, V_1, ..., V_{v(p)−1} on the variables of p.
The image of every V_i is a finite factor w_i of a Dejean word over an alphabet of ε^{−1} + 2 letters, so that w_i is (1 + ε)-free. The alphabets are disjoint: if i ≠ j, then w_i and w_j have no common letter. Finally, we define the length of w_i as follows: |w_{v(p)−1}| = 1 and |w_i| = ε^{−1} × |p| × |w_{i+1}| for every i such that 0 ≤ i ≤ v(p) − 2. Let us show by contradiction that the constructed occurrence h of p is (1 + ε)-free. Consider a repetition xyx of exponent at least 1 + ε that is maximal, that is, which cannot be extended to a repetition with the same period and larger exponent. Since every w_i is (1 + ε)-free and since two matching letters must come from distinct occurrences of the same variable, we have x = h(x′) and y = h(y′) where x′ and y′ are factors of p. Our ordering of the variables of p implies that y′ contains a variable V_i such that i < j for every variable V_j in x′. Thus, |y| ≥ |w_i| = ε^{−1} × |p| × |w_{i+1}| ≥ ε^{−1} × |x|, which contradicts the fact that the exponent of xyx is at least 1 + ε.
To obtain the infinite word w, we can insert our occurrence of p into a bi-infinite (1 + ε)-free word over an alphabet of ε^{−1} + 2 new letters. So w is an infinite (1 + ε)-free word over an alphabet of v(p)(ε^{−1} + 2) + 1 letters which contains an occurrence of f.
By Lemma 10, every nice formula is avoidable since it is avoided by a Dejean word over a sufficiently large alphabet. Thus, if a formula is nice and minimally avoidable, then it is minimally nice. This is the case for every formula in the 3-avoidance basis, except AB.AC.BA.CA.CB. However, a minimally nice formula is not necessarily minimally avoidable. Indeed, we have shown [15] that the set of minimally nice ternary formulas consists of the nice formulas in the 3-avoidance basis, together with the minimally nice formulas in Table 1.
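The ordering of the variables used in the proof of Theorem 11 can be computed by a simple peeling loop. The sketch below is only an illustration under stated assumptions: fragments are strings of single-character variables, the fresh variables replacing the dots are named "#1", "#2", and so on, and the first eligible variable is always chosen (the proof allows any choice). It returns None when the process gets stuck, which happens when what remains contains a nice formula.

```python
def peeling_order(fragments):
    """Return the ordering V_0, V_1, ... obtained by repeatedly replacing by
    a dot a variable that occurs at most once in every current fragment, or
    None if no such variable exists at some step."""
    # Pattern p: the fragments joined by fresh separator variables.
    pattern = []
    for index, fragment in enumerate(fragments):
        if index > 0:
            pattern.append(f"#{index}")
        pattern.extend(fragment)

    dot = "."
    current = list(pattern)
    order = []
    while any(symbol != dot for symbol in current):
        # Fragments of the pseudo-formula are the maximal dot-free runs.
        runs, run = [], []
        for symbol in current + [dot]:
            if symbol == dot:
                if run:
                    runs.append(run)
                run = []
            else:
                run.append(symbol)
        remaining = dict.fromkeys(s for s in current if s != dot)
        candidates = [v for v in remaining
                      if all(r.count(v) <= 1 for r in runs)]
        if not candidates:
            return None
        chosen = candidates[0]
        order.append(chosen)
        current = [dot if s == chosen else s for s in current]
    return order
```

For AB.BA the loop first removes the separator variable and then A and B, whereas for AA or ABA.BAB it stops immediately or after one step.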

Avoiding a nice formula
Recall that a nice formula f is such that AE(f) > 1. We consider the smallest integer s such that RT(s) < AE(f). Thus, every Dejean word over Σ_s avoids f, which already gives λ(f) ≤ s. Recall that a morphism is q-uniform if the image of every letter has length q. Also, a uniform morphism h : Σ_s^* → Σ_e^* is synchronizing if for any a, b, c ∈ Σ_s and v, w ∈ Σ_e^*, if h(ab) = vh(c)w, then either v = ε and a = c or w = ε and b = c. For increasing values of q, we look for a q-uniform morphism h : Σ_s^* → Σ_e^* such that h(w) avoids f for every RT(s)^+-free word w ∈ Σ_s^ℓ, where ℓ is given by Lemma 12 below. Recall that a word is (β^+, n)-free if it contains no repetition with exponent strictly greater than β and period at least n.

Lemma 12 ([11]). Let α, β ∈ Q, 1 < α < β < 2 and n ∈ N^*. Let h : Σ_s^* → Σ_e^* be a synchronizing q-uniform morphism (with q ≥ 1). If h(w) is (β^+, n)-free for every α^+-free word w such that |w| < max(2β/(β−α), 2(q−1)(2β−1)/(q(β−1))), then h(w) is (β^+, n)-free for every (finite or infinite) α^+-free word w.

Given such a candidate morphism h, we use Lemma 12 to show that for every RT(s)^+-free word w ∈ Σ_s^*, the image h(w) is (β^+, n)-free. The pair (β, n) is chosen such that RT(s) < β < AE(f) and n is the smallest possible for the corresponding β. If β < AE(f), then every occurrence h of f in a (β^+, n)-free word is such that the length of the h-image of every variable of f is upper bounded by a function of n and f only. Thus, the h-image of every fragment of f has bounded length and we can check that f is avoided by inspecting a finite set of factors of words of the form h(w).
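Two ingredients of this method are easy to sketch: the choice of the smallest s with RT(s) < AE(f), and the test that a uniform morphism is synchronizing. In the sketch below, a morphism is represented as a dictionary from letters to images; this representation and the function names are assumptions of the example.

```python
from fractions import Fraction


def repetition_threshold(n):
    """RT(n) for n >= 2."""
    if n == 3:
        return Fraction(7, 5)
    if n == 4:
        return Fraction(7, 4)
    return Fraction(n, n - 1)


def smallest_dejean_alphabet(ae):
    """Smallest s >= 2 such that RT(s) < ae, for an exponent ae > 1."""
    if ae <= 1:
        raise ValueError("the avoidability exponent must be greater than 1")
    s = 2
    while repetition_threshold(s) >= ae:
        s += 1
    return s


def is_synchronizing(h):
    """Check that h(ab) = v h(c) w forces (v empty and a = c) or
    (w empty and b = c), for a uniform morphism h given as a dict."""
    q = len(next(iter(h.values())))
    if any(len(image) != q for image in h.values()):
        raise ValueError("the morphism is not uniform")
    for a in h:
        for b in h:
            block = h[a] + h[b]
            for c in h:
                for offset in range(q + 1):
                    if block[offset:offset + q] == h[c]:
                        aligned = ((offset == 0 and a == c)
                                   or (offset == q and b == c))
                        if not aligned:
                            return False
    return True
```

For example, an avoidability exponent of about 1.246266 gives s = 6, since RT(5) = 5/4 is still too large. The morphism 01/10 is not synchronizing because the image of 00 is 0101, which contains the image 10 of the letter 1 at offset 1.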

The number of fragments of a minimally avoidable formula
Interestingly, the notion of (minimally) nice formula is helpful in proving the following.
Theorem 13. The only minimally avoidable formula with exactly one fragment is AA.
Proof. A formula with one fragment is a doubled pattern. Since it is minimally avoidable, it is a minimally nice formula. By the properties of minimally nice formulas discussed above, the unique fragment of the formula is either AA or is of the form ApA such that p does not contain the variable A. Thus, p is a doubled pattern such that p ≺ ApA, which contradicts that ApA is minimally avoidable.
By contrast, the family of two-birds formulas, which consists of ABA.BAB, ABCBA.CBABC, ABCDCBA.DCBABCD, and so on, shows that there exist infinitely many minimally avoidable formulas with exactly two fragments. Every two-birds formula is nice. Let us check that every two-birds formula AB · · · X · · · BA.X · · · A · · · X is minimally avoidable. Since the two fragments play symmetric roles, it is sufficient to split on the first fragment. We obtain the formula AB · · · X · · · B.B · · · X · · · BA.X · · · A · · · X which divides the pattern B · · · X · · · BAB · · · X · · · B = Z(B · · · X · · · B). This pattern is equivalent to B · · · X · · · B, which is unavoidable. Thus, every two-birds formula is indeed minimally avoidable.
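The two-birds family is easy to generate and test. The following sketch assumes single-letter variables, so it covers at most 26 variables; the function names are illustrative. It checks niceness and shows that splitting on the first fragment destroys it, since the first variable then occurs only once in every fragment.

```python
from string import ascii_uppercase


def two_birds(k):
    """Two-birds formula on k >= 2 variables, as a list of two fragments."""
    names = ascii_uppercase[:k]
    first = names + names[-2::-1]    # A B ... X ... B A
    second = names[::-1] + names[1:]  # X ... B A B ... X
    return [first, second]


def is_nice(fragments):
    """Every variable occurs at least twice inside some single fragment."""
    return all(any(frag.count(v) >= 2 for frag in fragments)
               for v in set("".join(fragments)))


def split_first(fragments):
    """Split on the first fragment (without normalization)."""
    phi = fragments[0]
    return [phi[:-1], phi[1:]] + fragments[1:]
```

For k = 3 this produces ABCBA.CBABC, and splitting on the first fragment gives ABCB.BCBA.CBABC.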

Characterization of some famous morphic words
Our next result gives characterizations of w_3, up to renaming, that use just one formula. Then we give similar characterizations of b_3 and b_2. Let σ = 1/2/0 be the morphism that cyclically permutes Σ_3.
Proof. Using Cassaigne's algorithm [4], we have checked that w_3 avoids f_h. By divisibility, w_3 avoids f.
Let w be a ternary recurrent word avoiding f. By Lemma 6, w is square-free.
Let v = 210201202101201021. A computer check shows that no infinite ternary word avoids f_e, squares, v, σ(v), and σ^2(v). So, without loss of generality, w contains v. If w contains 121, then w contains the occurrence A → 1, B → 2, C → 0 of f_e. Similarly, if w contains 212, then w contains the occurrence A → 2, B → 1, C → 0 of f_e. Thus, w avoids squares, 121, and 212. By Theorem 2, w is equivalent to w_3.
By symmetry, every ternary recurrent word avoiding f is equivalent to b_3, σ(b_3), or σ^2(b_3).
Notice that Theorem 16 is a complement to [15, Theorem 2] in which we gave a disjoint set of formulas with the same property. The difference between Theorem 16 and [15, Theorem 2] is that a different occurrence of f shows that f divides Z^n(AA).
Theorem 17. Let f_h = AABCAA.BCB, f_e = AABCAAB.AABCAB.AABCB, and let f be such that f_h ⪯ f ⪯ f_e. Every binary recurrent word avoiding f is equivalent to b_2.
Proof. Using Cassaigne's algorithm [4], we have checked that Thus, every recurrent word avoiding f e also avoids AAA and ABABA, which means that it is overlap-free. Finally, it is well-known that every binary recurrent word that is overlap-free is equivalent to b 2 .
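The overlap-freeness of b_2 can be observed on a finite prefix. The sketch below parses the slash notation for morphisms used in this paper, iterates the morphism from the letter 0, and searches for an overlap, that is, a factor of length 2p + 1 with period p (an occurrence of AAA or ABABA). It assumes the image of 0 starts with 0 and has length at least 2; the function names are illustrative.

```python
def parse_morphism(spec):
    """Turn the notation m(0)/m(1)/... into a dict; 'ε' is the empty word."""
    return {str(letter): ("" if image == "ε" else image)
            for letter, image in enumerate(spec.split("/"))}


def fixed_point_prefix(spec, length):
    """Prefix of the fixed point m^ω(0) of the given morphism."""
    morphism = parse_morphism(spec)
    word = "0"
    while len(word) < length:
        word = "".join(morphism[c] for c in word)
    return word[:length]


def has_overlap(word):
    """True if some factor of length 2p + 1 has period p."""
    n = len(word)
    for p in range(1, n // 2 + 1):
        run = 0  # consecutive positions j with word[j] == word[j + p]
        for j in range(n - p):
            run = run + 1 if word[j] == word[j + p] else 0
            if run > p:
                return True
    return False


b2_prefix = fixed_point_prefix("01/10", 128)
```

A finite check of this kind is only a sanity test; the characterization itself relies on the well-known result recalled above.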

xyx-formulas
Recall that every fragment of an xyx-formula is of the form XYX. We associate to an xyx-formula F the directed graph \vec{G} such that every variable corresponds to a vertex and \vec{G} contains the arc \vec{XY} if and only if F contains the fragment XYX. We will also denote by G the underlying simple graph of \vec{G}.
Lemma 18. Let F_1 and F_2 be xyx-formulas associated to \vec{G}_1 and \vec{G}_2, respectively. If \vec{G}_1 admits a homomorphism to \vec{G}_2, then F_1 ⪯ F_2.
Proof. Since both digraph homomorphism and formula divisibility are transitive relations, we only need to consider the following two cases. If G_1 is a subgraph of G_2, then F_1 is obtained from F_2 by removing some fragments. So every occurrence of F_2 is also an occurrence of F_1 and thus F_1 ⪯ F_2. If G_2 is obtained from G_1 by identifying the vertices u and v, then F_2 is obtained from F_1 by identifying the variables U and V. So every occurrence of F_2 is also an occurrence of F_1 and thus F_1 ⪯ F_2.
For every i, let T_i be the xyx-formula corresponding to the directed circuit \vec{C}_i of length i, that is, T_1 = AAA, T_2 = ABA.BAB, T_3 = ABA.BCB.CAC, T_4 = ABA.BCB.CDC.DAD, and so on. More formally, T_i is the formula with i variables A_0, ..., A_{i−1} which contains the i fragments of length three of the form A_j A_{j+1} A_j such that the indices are taken modulo i. Notice that T_i is a nice formula.
Proof. We use Lemma 12 to show that the image of every (7/4^+)-free word over Σ_4 by the following 58-uniform morphism is (3/2, 3)-free.
In these words, the factor 010 is the only occurrence m of ABA such that |m(A)| ≥ |m(B)|. This implies that these ternary words avoid T_i for every i ≥ 1, so that λ(T_i) ≤ 3.
To show that λ(T_i) ≥ 3, we consider the xyx-formula H = ABA.BAB.ACA.CBC associated to the directed graph \vec{D}_3 on 3 vertices and 4 arcs that contains a circuit of length 2 and a circuit of length 3. Standard backtracking shows that λ(H) > 2, and even the stronger result that λ(ABAB.ACA.CAC.BCB.CBC) > 2.
For every i ≥ 2, the circuit \vec{C}_i admits a homomorphism to \vec{D}_3. By Lemma 18, this means that T_i ⪯ H, which implies that λ(T_i) ≥ λ(H) ≥ 3.
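The formulas T_i, the digraph of an xyx-formula, and the homomorphism argument used above can be sketched as follows. A homomorphism from the directed circuit of length i to a digraph exists exactly when the digraph has a closed walk of length i, which is what the last function tests. Single-letter variables are assumed, and the function names are illustrative.

```python
from string import ascii_uppercase


def circuit_formula(i):
    """T_i: the fragments A_j A_{j+1} A_j with indices taken modulo i."""
    names = ascii_uppercase[:i]
    return [names[j] + names[(j + 1) % i] + names[j] for j in range(i)]


def arcs(xyx_fragments):
    """Arc set of the digraph of an xyx-formula: fragment XYX gives X -> Y."""
    return {(frag[0], frag[1]) for frag in xyx_fragments}


def circuit_maps_to(i, arc_set):
    """True if the directed circuit of length i admits a homomorphism to the
    digraph, i.e. the digraph contains a closed walk of length i."""
    vertices = {v for arc in arc_set for v in arc}
    for start in vertices:
        reachable = {start}
        for _ in range(i):
            reachable = {y for (x, y) in arc_set if x in reachable}
        if start in reachable:
            return True
    return False


D3 = arcs(["ABA", "BAB", "ACA", "CBC"])  # digraph of the formula H
```

Since the digraph of H contains circuits of lengths 2 and 3, it has closed walks of every length i ≥ 2, but none of length 1 because it has no loop.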
Theorem 20. For every i ≥ 1, b_4 avoids T_i.
Proof. Suppose for contradiction that there exist i and n such that m^n(0) contains an occurrence h of T_i. Further assume that n is minimal. Notice that in b_4, every even (resp. odd) letter appears only at even (resp. odd) positions. Thus, for every fragment XYX of T_i, the period |h(XY)| of the repetition h(XYX) must be even. This implies that |h(X)| and |h(Y)| have the same parity. By contagion, the lengths of the images of all the variables of T_i have the same parity. Now we proceed to a case analysis.
• Every |h(X)| is even.
- Every h(X) starts with 0 or 2. By taking the pre-image by m of every h(X), we obtain an occurrence of T_i that is contained in m^{n−1}(0). This contradicts the minimality of n.
- Every h(X) starts with 1 or 3. Notice that in b_4, the letter 1 (resp. 3) is in position 1 (mod 4) (resp. 3 (mod 4)). Then m^n(0) contains the occurrence h′ of T_i such that h′(X) is obtained from h(X) by adding to the right the letter 1 or 3 depending on its position modulo 4 and by removing the first letter. Since h′ is also contained in m^n(0) and every h′(X) starts with 0 or 2, h′ satisfies the previous subcase.
• Every |h(X)| is odd. It is not hard to check that every factor uvu in b_4 with |v| = 1 satisfies v ∈ {1, 3} and u ∈ {0, 2}. So |h(X)| ≥ 3 for every variable X of T_i. Let X_1, ..., X_i be the variables of T_i. Up to a shift of indices, we can assume that j and the first and last letters of h(X_j) have the same parity. We construct the occurrence h′ of T_i as follows. If j is odd, then h′(X_j) is obtained by removing the first letter of h(X_j). If j is even, then h′(X_j) is obtained by adding to the right the letter 1 or 3 depending on its position modulo 4. Since h′ is also contained in m^n(0) and every |h′(X)| is even, h′ satisfies the previous case.
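The positional facts used in this proof can be observed on a prefix of b_4. The definition of b_4 is not part of this text, so the sketch below assumes that b_4 is the fixed point of the morphism 01/03/21/23, a choice that is consistent with the properties stated in the proof. All names are illustrative.

```python
# Assumption of this sketch: b_4 is the fixed point of 01/03/21/23.
B4_MORPHISM = {"0": "01", "1": "03", "2": "21", "3": "23"}


def prefix(morphism, length):
    """Prefix of the fixed point obtained by iterating the morphism on 0."""
    word = "0"
    while len(word) < length:
        word = "".join(morphism[c] for c in word)
    return word[:length]


def positions_are_consistent(word):
    """Even letters at even positions, 1 at positions 1 mod 4 and
    3 at positions 3 mod 4."""
    for position, letter in enumerate(word):
        if int(letter) % 2 != position % 2:
            return False
        if letter == "1" and position % 4 != 1:
            return False
        if letter == "3" and position % 4 != 3:
            return False
    return True


def short_repetitions(word):
    """All factors xyx where x and y are single letters."""
    return {word[i:i + 3] for i in range(len(word) - 2)
            if word[i] == word[i + 2]}


b4_prefix = prefix(B4_MORPHISM, 1024)
```

On this prefix, every factor xyx with single letters x and y has x in {0, 2} and y in {1, 3}.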
Our next result generalizes Theorems 5 and 20. Recall that every fragment of a hybrid formula has length 2 or is of the form XY X.
Theorem 21. Every avoidable hybrid formula is avoided by b 4 .
Proof. Let f be a hybrid formula. If f contains a locked formula or a formula T_i, then b_4 avoids f by Theorems 4 and 20. If f contains neither a locked formula nor a formula T_i, then we show that f is unavoidable. By induction and by Theorem 1, it is sufficient to show that f is reducible to a hybrid formula containing neither a locked formula nor a formula T_i. Since f is not locked, f contains a free set of variables and thus f has a free singleton {X}. If f contains a fragment YXY, then {Y} is also a free singleton of f. Using this argument iteratively, we end up with a free singleton {Z} such that f contains no fragment TZT, since f contains no formula T_i.
So we can assume that f contains a free singleton {Z} and no fragment TZT. Thus, deleting every occurrence of Z from f gives a hybrid sub-formula containing neither a locked formula nor a formula T_i. By induction, f is unavoidable.
So the index of an avoidable xyx-formula is at most 4 and we have seen examples of xyx-formulas with index 3 in Theorems 15 and 19. The next results give an xyx-formula with index 4 and an xyx-formula with index 2 that is not divisible by AAA.
Proof. We use again Cassaigne's algorithm.

Palindrome patterns
Mikhailova [10] has considered the index of an avoidable pattern that is a palindrome and proved that it is at most 16. She actually constructed a morphic word over Σ_16 that avoids every avoidable palindrome pattern.
We make a distinction between the largest index P_w of an avoidable palindrome pattern and the smallest alphabet size P_s allowing an infinite word avoiding every avoidable palindrome pattern. We obtained [15] the lower bound λ(ABCADACBA) = λ(ABCA.ACBA) = 4, so that 4 ≤ P_w ≤ P_s ≤ 16.
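The passage from the palindrome pattern ABCADACBA to the formula ABCA.ACBA amounts to replacing every isolated variable, that is, every variable that occurs exactly once, by a dot. A minimal sketch of this step, with illustrative function names and single-letter variables:

```python
def is_palindrome(pattern):
    """True if the pattern reads the same in both directions."""
    return pattern == pattern[::-1]


def pattern_to_formula(pattern):
    """Replace every variable occurring exactly once by a dot and return
    the resulting non-empty fragments."""
    dotted = "".join("." if pattern.count(v) == 1 else v for v in pattern)
    return [fragment for fragment in dotted.split(".") if fragment]
```

For a palindrome pattern of odd length, the middle variable is necessarily isolated, and the resulting fragments come in mirror pairs.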
The following result is a slight improvement to λ(ABCA.ACBA) = 4 that is not related to palindromes.