Avoidability index for binary patterns with reversal

For every pattern $p$ over the alphabet $\{x,y,x^R,y^R\}$, we specify the least $k$ such that $p$ is $k$-avoidable.


Introduction
The study of words avoiding patterns is a major theme in combinatorics on words, explored by Thue and others [14,2,17,13,5,10,11]. The reversal map is a basic notion in combinatorics on words, and it is therefore natural that work has recently been done on patterns with reversals by Shallit and others [7,8,15]. (More general ideas, such as patterns with involutions or other permutations, have also been studied very recently by the first author and others [12,4,6,3].) Shallit et al. [8] recently asked whether the number of binary words avoiding $xxx^R$ grows polynomially with length, or exponentially. The surprising answer by Currie and Rampersad [7] is 'Neither'. As B. Adamczewski [1] has observed, this implies that the language of binary words avoiding $xxx^R$ is not context-free, a result which has so far resisted proof by standard methods.
Basic questions about patterns with reversal have not yet been addressed. In this article, we completely characterize the k-avoidability of an arbitrary binary pattern with reversal. This is a direct (and natural) generalization of the work of Cassaigne [5] characterizing k-avoidability for binary patterns without reversal, and involves a blend of classical results and new constructions.

Preliminaries
For general concepts and notations involving combinatorics on words, we refer the reader to the work of Lothaire [10,11]. Let $\Sigma$ be the alphabet $\Sigma = \{x, x^R, y, y^R\}$. We call a word $p \in \Sigma^*$ a binary pattern with reversal. For a positive integer $k$, let $T_k$ be the alphabet $T_k = \{0, 1, \ldots, k-1\}$. For a word $w \in T_k^*$, we write $w^R$ for the reversal of $w$. (Note the two distinct usages of $R$: in $\Sigma$, the notation distinguishes pairs of alphabet letters; on $T_k^*$ it stands for reversal.) We say that a morphism $f : \Sigma^* \to T_k^*$ respects reversal if $f(x^R) = (f(x))^R$ and $f(y^R) = (f(y))^R$. Thus any morphism from $\{x, y\}^*$ to $T_k^*$ extends uniquely to a morphism on $\Sigma^*$ respecting reversal. Let $p$ be a binary pattern with reversal. An instance of $p$ is the image of $p$ under some non-erasing morphism which respects reversal. For example, an instance of $p = xyyx^R$ is a word $XYYX^R$, where $X$ and $Y$ are non-empty; this is the image of $p$ under the non-erasing morphism respecting reversal given by $f(x) = X$, $f(y) = Y$. If a pattern with reversal $p$ contains neither $x^R$ nor $y^R$, then an instance of $p$ is simply an instance of the pattern $p$ in the usual sense.

Date: August 24, 2015. 2000 Mathematics Subject Classification. 68R15. The first author was supported by an NSERC Discovery Grant, and also by Deutsche Forschungsgemeinschaft, which supported him through its Mercator program. The second author was supported through the NSERC USRA program.
Let $k$ be a positive integer. Let $p$ be a binary pattern with reversal. A word $w$ avoids $p$ if no factor of $w$ is an instance of $p$. Pattern $p$ is $k$-avoidable if there are arbitrarily long words of $T_k^*$ which avoid $p$; equivalently, there is an $\omega$-word $w$ over $T_k$ such that every finite prefix of $w$ avoids $p$. If $p$ is not $k$-avoidable, it is $k$-unavoidable; note that every factor of a $k$-unavoidable pattern is $k$-unavoidable. Pattern $p$ is avoidable if it is $k$-avoidable for some $k$; otherwise, $p$ is unavoidable. If $p$ is avoidable, then the avoidability index of $p$ is defined to be the least $k$ such that $p$ is $k$-avoidable. If $p$ is unavoidable, we define the avoidability index of $p$ to be $\infty$.
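The definitions of instance and avoidance can be made concrete with a brute-force check. The following is a minimal sketch, not part of the original paper: patterns are written as strings such as `xyyxR` (an `R` marks the preceding variable as reversed), and the helper names `parse`, `image`, `is_instance` and `avoids` are hypothetical.

```python
def parse(pattern):
    """Split e.g. 'xyyxR' into the tokens ['x', 'y', 'y', 'xR']."""
    tokens = []
    for ch in pattern:
        if ch == "R":
            tokens[-1] += "R"      # 'R' marks the preceding variable as reversed
        else:
            tokens.append(ch)
    return tokens

def image(tokens, X, Y):
    """Image of the pattern under the morphism respecting reversal with x -> X, y -> Y."""
    table = {"x": X, "xR": X[::-1], "y": Y, "yR": Y[::-1]}
    return "".join(table[t] for t in tokens)

def is_instance(word, pattern):
    """True if word is the image of pattern under some non-erasing morphism respecting reversal."""
    tokens = parse(pattern)
    nx = sum(t[0] == "x" for t in tokens)
    ny = len(tokens) - nx
    for lx in range(1, len(word) + 1):
        for ly in range(1, len(word) + 1):
            if nx * lx + ny * ly != len(word):
                continue
            # Read candidate X and Y off the first occurrence of each variable.
            pos, found = 0, {}
            for t in tokens:
                length = lx if t[0] == "x" else ly
                piece = word[pos:pos + length]
                found.setdefault(t[0], piece[::-1] if t.endswith("R") else piece)
                pos += length
            if image(tokens, found.get("x", ""), found.get("y", "")) == word:
                return True
    return False

def avoids(word, pattern):
    """True if no factor of word is an instance of pattern."""
    n = len(word)
    return not any(is_instance(word[i:j], pattern)
                   for i in range(n) for j in range(i + 1, n + 1))
```

For instance, `is_instance("0110", "xyyxR")` holds with $X = 0$, $Y = 1$. The search is exponential in nothing but is cubic-or-worse in the word length, so it is only meant for small experiments.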

Classification
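Consider the morphisms $\iota_1$, $\iota_2$ on $\Sigma^*$ given by
$$\iota_1(x) = x^R,\quad \iota_1(x^R) = x,\quad \iota_1(y) = y,\quad \iota_1(y^R) = y^R,$$
$$\iota_2(x) = y,\quad \iota_2(x^R) = y^R,\quad \iota_2(y) = x,\quad \iota_2(y^R) = x^R.$$
Thus $\iota_1$ switches $x$ and $x^R$, while $\iota_2$ switches $x$ and $y$, $x^R$ and $y^R$. Consequently $\iota_2 \circ \iota_1 \circ \iota_2$ switches $y$ and $y^R$. One checks the following:
Lemma 1. Let $p$ be a binary pattern with reversal. If $w$ is an instance of $p$, then $w$ is an instance of $\iota_1(p)$ and of $\iota_2(p)$.
Let $\iota_3$ denote the reversal antimorphism on $\Sigma^*$, so that $\iota_3(x) = x^R$, $\iota_3(y) = y^R$ and $\iota_3(uv) = \iota_3(v)\iota_3(u)$.
Lemma 2. Let $p$ be a binary pattern with reversal. If $w$ is an instance of $p$, then $w^R$ is an instance of $\iota_3(p)$.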
For $j = 1, 2, 3$, $\iota_j^2$ is the identity morphism on $\Sigma^*$. It follows that the relation on $\Sigma^*$ given by
$$p \sim q \iff q \text{ is obtained from } p \text{ by a sequence of applications of } \iota_1, \iota_2 \text{ and } \iota_3$$
is an equivalence relation. Combining the previous two lemmas gives the following:
Lemma 3. Let $k$ be a positive integer. Let $p$, $q$ be binary patterns with reversal. Suppose that $q \sim p$. Then $p$ is $k$-avoidable if and only if $q$ is $k$-avoidable.
Consider the lexicographic order on $\Sigma^*$ generated by $x < x^R < y < y^R$. If $p \in \Sigma^*$, define $\ell(p)$ to be the lexicographically least element of the equivalence class of $p$ under $\sim$. For example, $\ell(x^Ryy) = xxy$. Let $C_1(p) = \{p\}$ and $C_{n+1}(p) = C_n(p) \cup \bigcup_{j=1}^{3} \iota_j(C_n(p))$, $n \ge 1$.
Since the $\iota_j$ preserve length, for any pattern $p$ only the finitely many words of $\Sigma^{|p|}$ can be equivalent to $p$. Thus for some positive integer $m$ we will have $C_{m+1}(p) = C_m(p)$, and this $C_m(p)$ is the equivalence class of $p$ under $\sim$, which we will denote by $C(p)$.
Let $S_2 = \{xxx,\ xxyxy^R,\ xxyx^Ry,\ xxyxyy,\ xxyx^Ry^R,\ xxyyx,\ xxyyx^R,\ xx^R,\ xyxxy,\ xyx^Rx^Ry,\ xyxyx,\ xyxyx^R,\ xyx^Ryx,\ xyx^Ry^Rx,\ xyyx^R,\ xyxy^Rx,\ xyxy^Rx^R\}$ and $S_3 = \{xx,\ xyxy,\ xyxy^R,\ xyx^Ry^R\}$. One checks that $s = \ell(s)$ for all $s \in S_2 \cup S_3$. The following theorems are proved in Sections 5 and 4, respectively.
Theorem 4. The patterns in $S_2$ are 2-avoidable.
Theorem 5. The patterns in $S_3$ are 3-avoidable.
We will prove the following:
Theorem 6 (Main Theorem). Let $p$ be a binary pattern with reversal. The avoidability index of $p$ is 2, 3 or $\infty$.
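The equivalence class $C(p)$ and the canonical form $\ell(p)$ can be computed mechanically by closing $\{p\}$ under $\iota_1$, $\iota_2$, $\iota_3$. The sketch below is illustrative only: it encodes the letters $x, x^R, y, y^R$ as the integers 0, 1, 2, 3 (an encoding chosen here, not taken from the paper), so that tuple comparison agrees with the order $x < x^R < y < y^R$; all function names are hypothetical.

```python
LETTERS = ["x", "xR", "y", "yR"]          # listed in the order x < xR < y < yR

def parse(pattern):
    """Turn e.g. 'xRyy' into the tuple of letter indices (1, 2, 2)."""
    tokens = []
    for ch in pattern:
        if ch == "R":
            tokens[-1] += "R"
        else:
            tokens.append(ch)
    return tuple(LETTERS.index(t) for t in tokens)

def show(p):
    """Inverse of parse: (0, 0, 2) -> 'xxy'."""
    return "".join(LETTERS[a] for a in p)

def iota1(p):
    """Switch x and xR, leaving y and yR fixed."""
    return tuple(a ^ 1 if a < 2 else a for a in p)

def iota2(p):
    """Switch x with y and xR with yR."""
    return tuple((a + 2) % 4 for a in p)

def iota3(p):
    """Reversal antimorphism: reverse the word and flip every letter."""
    return tuple(a ^ 1 for a in reversed(p))

def equivalence_class(p):
    """Close {p} under iota1, iota2 and iota3; this terminates since length is preserved."""
    cls, frontier = {p}, [p]
    while frontier:
        q = frontier.pop()
        for f in (iota1, iota2, iota3):
            r = f(q)
            if r not in cls:
                cls.add(r)
                frontier.append(r)
    return cls

def ell(pattern):
    """Lexicographically least element of the class of the pattern."""
    return show(min(equivalence_class(parse(pattern))))

S2 = ["xxx", "xxyxyR", "xxyxRy", "xxyxyy", "xxyxRyR", "xxyyx", "xxyyxR", "xxR",
      "xyxxy", "xyxRxRy", "xyxyx", "xyxyxR", "xyxRyx", "xyxRyRx", "xyyxR",
      "xyxyRx", "xyxyRxR"]
S3 = ["xx", "xyxy", "xyxyR", "xyxRyR"]
```

With these definitions, `ell("xRyy")` returns `'xxy'`, matching the example in the text, and `all(ell(s) == s for s in S2 + S3)` reproduces the check that every element of $S_2 \cup S_3$ is its own canonical form.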
In fact, the next two theorems characterize exactly which binary patterns with reversal are 2-avoidable, 3-avoidable and unavoidable.
Theorem 7. Let $p$ be a binary pattern with reversal. If $\ell(p)$ is a prefix of one of $xyx$ and $xyx^R$, then $p$ is unavoidable; otherwise $p$ is 3-avoidable.
Proof. To begin with, we note that $xyx$ and $xyx^R$ are unavoidable: Fix a positive integer $k$, and consider any word $w$ over $T_k$ of length $2k + 1$. Some letter $a \in T_k$ appears in $w$ at least 3 times, so $w$ has a factor $aba$ where $|b|_a \ge 1$; in particular, $b$ is non-empty. Consider the morphism $f$ respecting reversal where $f(x) = a$, $f(y) = b$. Then $f(xyx) = f(xyx^R) = aba$, since $a = a^R$. Thus $w$ contains instances of $xyx$ and $xyx^R$; since $w$ was an arbitrary word over $T_k$ of length $2k+1$, patterns $xyx$ and $xyx^R$ are not $k$-avoidable. Since $k$ was arbitrary, they are unavoidable. A fortiori, their prefixes are unavoidable.
Now suppose that $p$ is 3-unavoidable. Without loss of generality, replace $p$ by $\ell(p)$. The first letter of $p$ is thus $x$. If $|p| = 1$ we are done. By Theorems 4 and 5, no factor of $p$ is equivalent to $xx$ or $xx^R$; the two-letter prefix of $p$ is thus $xy$ or $xy^R$. Since $p = \ell(p)$, it follows that $xy$ is a prefix of $p$. Therefore, if $|p| = 2$, we are done. Since $yy$ and $yy^R$ are equivalent to $xx$ and $xx^R$ respectively, the third letter of $p$ must be $x$ or $x^R$, and one of $xyx$ and $xyx^R$ is a prefix of $p$. If $|p| \le 3$, we are done. If $|p| \ge 4$, then the fourth letter of $p$ must be $y$ or $y^R$; otherwise the length 4 prefix of $p$ ends in a word equivalent to $xx$ or $xx^R$. Now, however, the length 4 prefix of $p$ is one of $xyxy$, $xyxy^R$, $xyx^Ry$ and $xyx^Ry^R$. However, $xyx^Ry$ cannot be a prefix of $p$, since $\ell(xyx^Ry) = xyxy^R$, which is 3-avoidable by Theorem 5. The other possibilities are also 3-avoidable by Theorem 5. We conclude that $|p| \le 3$, and our proof is complete.
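The pigeonhole step in the first half of the proof can be checked exhaustively for small $k$. This is only an illustrative sketch with hypothetical helper names: it verifies that every word of length $2k+1$ over $T_k = \{0, \ldots, k-1\}$ has a factor $aba$ with $a$ a letter and $b$ non-empty, which is all that is needed to produce an instance of both $xyx$ and $xyx^R$.

```python
from itertools import product

def has_aba(word):
    """True if word has a factor a b a, with a a single letter and b non-empty."""
    return any(word[i] == word[j]
               for i in range(len(word)) for j in range(i + 2, len(word)))

def all_words_contain_aba(k):
    """Check every word of length 2k + 1 over the alphabet {0, ..., k-1}."""
    return all(has_aba(w) for w in product(range(k), repeat=2 * k + 1))
```

The length $2k+1$ matters: over two letters the word `0011` of length $2k = 4$ has no such factor, so `has_aba((0, 0, 1, 1))` is false.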
Theorem 8. Let $p$ be a binary pattern with reversal. Then $p$ is 2-avoidable if and only if $\ell(u) \in S_2$ for some factor $u$ of $p$.
Proof. By Theorem 4, if $\ell(u) \in S_2$ for some factor $u$ of $p$, then $\ell(u)$, hence $u$, hence $p$ is 2-avoidable.
In the other direction, suppose that for all factors $u$ of $p$, $\ell(u) \notin S_2$. We show that $p$ is 2-unavoidable. For each non-negative integer $n$, let $A_n$ be defined by If $q$ is in $A_n$, $n > 0$, write $q'$ for the prefix of $q$ of length $n - 1$. Then $\ell(q') \in A_{n-1}$. Thus, $q = ra$, where $r \in C(\hat r)$ for some $\hat r \in A_{n-1}$, and $a \in \Sigma$. This allows us to compute the $A_n$: It follows that $A_n = \emptyset$, $n \ge 6$.

Binary patterns with reversal that are 3-avoidable
In this section we will prove Theorem 5. A square is an instance of $xx$. It was shown by Thue [14] that squares are 3-avoidable. Any instance of $xyxy$ is necessarily a square. Therefore, both $xx$ and $xyxy$ are 3-avoidable. To prove Theorem 5, it thus remains to show that $xyxy^R$ and $xyx^Ry^R$ are 3-avoidable.
Fraenkel and Simpson [9] constructed a binary sequence containing no squares other than 00, 11 and 0101. We will refer to this sequence as $f$.
Theorem 9. Patterns $xyxy^R$ and $xyx^Ry^R$ are 3-avoidable.
Proof. From $f$, create a word $g$ by replacing each factor 10 of $f$ by 12220. Word $g$ has the form $g = 0^{a_1} 1^{a_2} 2^3 0^{a_3} 1^{a_4} 2^3 \cdots$ where for each $i$, $1 \le a_i \le 3$, since neither of $0^4 = (00)^2$ and $1^4 = (11)^2$ can be a factor of $f$. In particular, $g$ has no length 2 factor $cd$ where $c \equiv d + 1 \pmod 3$. Note also that word $f$ never contains 1010, so that $g$ never contains 220122201 or 012220122.
Suppose that an instance of $xyxy^R$ (resp., $xyx^Ry^R$) is a factor of $g$; by abuse of notation, write this instance as $xyxy^R$ (resp., $xyx^Ry^R$), where $x$ and $y$ now stand for non-empty words. Then so is $xyxy$: Any factor $z$ of $g$ containing distinct letters has a length 2 factor $dc$ where $c \equiv d + 1 \pmod 3$; thus $z^R$ has a length 2 factor $cd$ where $c \equiv d + 1 \pmod 3$, so that $z^R$ cannot be a factor of $g$. Since both $y$ and $y^R$ (resp., $x$, $x^R$, $y$ and $y^R$) are factors of $g$, $y$ (resp., each of $x$, $y$) must be a power of a single letter, so that $y = y^R$ (resp., $x = x^R$, $y = y^R$).
Thus $g$ has a factor $xyxy$, which is equivalent to having a factor $xx$ with $|x| \ge 2$. We show that $g$ has no such factor: Suppose $g$ has a factor $xx$ with $|x| \ge 2$. Word $x$ must contain 2 distinct letters, otherwise $xx$ consists of a letter repeated four or more times, contradicting $a_i \le 3$. This implies that all three of 0, 1, 2 appear in $xx$. Deleting 2's from $xx$ leaves a square over $\{0, 1\}$ containing both 0 and 1. This must be 0101. Then, adding the 2's back in, $xx$ is a factor of $2^3 0 1 2^3 0 1 2^3$; since $g$ contains neither 220122201 nor 012220122, $xx$ is in fact a factor of 201222012. However, the only square factor of 201222012 is 22, and $|x| \ge 2$. This is a contradiction.
In conclusion, $xyxy^R$ and $xyx^Ry^R$ are avoided by $g$, and are thus 3-avoidable.
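The substitution that produces $g$ and the resulting restriction on length 2 factors can be illustrated on a short binary word. In this sketch the sample word is an arbitrary assumption chosen for illustration; it is not a prefix of the Fraenkel and Simpson sequence, which is not reproduced here. The helper names are hypothetical.

```python
def build_g(binary_word):
    """Replace each factor 10 of a binary word by 12220."""
    return binary_word.replace("10", "12220")

def forbidden_pairs(word):
    """Length 2 factors cd of a word over {0, 1, 2} with c = d + 1 (mod 3)."""
    return [word[i:i + 2] for i in range(len(word) - 1)
            if (int(word[i]) - int(word[i + 1])) % 3 == 1]

sample = "0011100110"            # hypothetical sample, NOT a prefix of f
g_sample = build_g(sample)       # has the shape 0^a 1^b 2^3 0^c 1^d 2^3 ...
```

Here `forbidden_pairs(g_sample)` is empty, whereas the reversal `g_sample[::-1]` does contain such pairs; this asymmetry is what prevents a factor with two distinct letters from occurring together with its reversal.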

Binary patterns with reversal that are 2-avoidable
In this section we will prove Theorem 4 using several new constructions as well as some known results. We partition $S_2$ into pieces according to the constructions used:
Theorem 10. The words of $S_{2,1}$ are 2-avoidable.
Theorem 11. The sequence $f$ of Fraenkel and Simpson avoids $xyxyx^R$.
It follows that $XYXY$ is a square of length at least 4; as there is only one such square in $f$, this forces $X = 0$, $Y = 1$. However, in this case $f$ contains the factor $YXYX^R = 1010$, which is impossible.
To prove Theorem 4, it remains to show that the patterns of $S_{2,3}$ and $S_{2,4}$ are 2-avoidable. We do this in Sections 5.1 and 5.2, respectively.
5.1. Patterns in $S_{2,3}$ are 2-avoidable.
We use here elementary notions of graph theory; in particular, a graph has a 2-colouring if and only if it has no odd cycles. A standard reference is by Wilson [16]. Let $p$ be a binary pattern with reversal. We use the notation $(x^R)^R = x$ and $(y^R)^R = y$. Define $G(p)$ to be the graph with vertex set $\Sigma$, and an edge between $a^R$ and $b$ whenever $ab$ is a length two factor of $p$.
Example: If $p = x^Rxyx^Rx^Ry$, then the length two factors are $x^Rx$, $xy$, $yx^R$, $x^Rx^R$ and $x^Ry$, giving rise to edges $xx$, $x^Ry$, $y^Rx^R$, $xx^R$ and $xy$. The graph $G(p)$ is shown in Figure 1. This graph contains odd cycles, for example, $x$-$x$, of length 1, and $x$-$x^R$-$y$-$x$, of length 3.
Proof. Let $u$ and $v$ be non-empty factors of $(01)^\omega$. Then $uv$ is a factor of $(01)^\omega$ exactly when $u^R$ and $v$ begin with different letters. Suppose $G(p)$ is bipartite, and let $c : G(p) \to \{0, 1\}$ be a legal colouring. Now let $X$ be the shortest string beginning with $c(x)$ and ending with $c(x^R)$; thus $X$ is a factor of $(01)^\omega$ with $1 \le |X| \le 2$. Similarly, let $Y$ be the shortest string beginning with $c(y)$ and ending with $c(y^R)$. Define the morphism $h : \{x, y\}^* \to \{0, 1\}^*$ by $h(x) = X$, $h(y) = Y$. If $a \in \{x, x^R, y, y^R\}$, then $h(a)$ begins with $c(a)$ and ends in $c(a^R)$. Suppose $ab$ is a length two factor of $p$. Then $a^Rb$ is an edge of $G(p)$, and $c(a^R) \ne c(b)$, so that $(h(a))^R$ and $h(b)$ begin with different letters. It follows that $h(ab)$ is a factor of $(01)^\omega$. By induction, we see that $h(p)$ is a factor of $(01)^\omega$.
In the other direction, suppose that $h : \{x, y\}^* \to \{0, 1\}^*$ is a morphism such that $h(p)$ is a factor of $(01)^\omega$. For $a \in \{x, x^R, y, y^R\}$, define the 2-colouring $c$ by choosing $c(a)$ to be the first letter of $h(a)$. If this is not a legal colouring, then for some letters $a, b \in \{x, x^R, y, y^R\}$, there is an edge $ab$ in $G(p)$ with $c(a) = c(b)$. This implies that $a^Rb$ is a factor of $p$, but $h(a)$ and $h(b)$ start with the same letter. Then $h(a^R)$ ends with the same letter that begins $h(b)$, forcing 00 or 11 to be a factor of $h(p)$, which is in turn a factor of $(01)^\omega$. This is impossible.
Proof. Graph $G(xx^R)$ contains the loop $x^R$-$x^R$, i.e., a 1-cycle. For each of the other patterns $p \in S_{2,2}$, $G(p)$ contains a triangle.
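The construction in the first half of this proof is effective, and can be sketched in code: build $G(p)$, attempt a 2-colouring, and read off the morphism $h$. This is a minimal illustration under the string encoding `x`, `xR`, `y`, `yR` for the letters of $\Sigma$; the function names are hypothetical and not taken from the paper.

```python
LETTERS = ["x", "xR", "y", "yR"]

def parse(pattern):
    """Split e.g. 'xRxy' into the tokens ['xR', 'x', 'y']."""
    tokens = []
    for ch in pattern:
        if ch == "R":
            tokens[-1] += "R"
        else:
            tokens.append(ch)
    return tokens

def rev(a):
    """Letter-level reversal: x <-> xR and y <-> yR."""
    return a[:-1] if a.endswith("R") else a + "R"

def graph(pattern):
    """Edges of G(p): one edge {a^R, b} for each length two factor ab of p."""
    t = parse(pattern)
    return {frozenset((rev(a), b)) for a, b in zip(t, t[1:])}

def two_colouring(edges):
    """Return a legal colouring {letter: 0 or 1}, or None if there is an odd cycle."""
    adj = {v: set() for v in LETTERS}
    for e in edges:
        if len(e) == 1:
            return None                      # a loop is a cycle of length 1
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    colour = {}
    for start in LETTERS:
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return None              # odd cycle found
    return colour

def morphism_image(pattern):
    """Build h from a colouring as in the proof and return h(p), or None."""
    c = two_colouring(graph(pattern))
    if c is None:
        return None
    def shortest(first, last):
        return str(first) if first == last else f"{first}{last}"
    X = shortest(c["x"], c["xR"])
    Y = shortest(c["y"], c["yR"])
    table = {"x": X, "xR": X[::-1], "y": Y, "yR": Y[::-1]}
    return "".join(table[a] for a in parse(pattern))

def is_alternating(word):
    """True if word is a factor of (01)^omega, i.e. has no factor 00 or 11."""
    return all(word[i] != word[i + 1] for i in range(len(word) - 1))
```

For the example pattern `xRxyxRxRy` the colouring fails because of the loop at $x$, while for a pattern such as `xyxy` the returned image is an alternating word.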
Suppose $f$ is any non-erasing binary morphism such that $f(0) = 0$. Let $w = f(t)$.

Lemma 15.
Let $v$ be a factor of $t$ of odd length. Then $v$ is a factor of $h(v')$ for some factor $v'$ of $t$ of length $(|v| + 1)/2$.

Proof. Omitted.
Corollary 16. Every factor of $t$ of length $2^n + 1$ is a factor of the prefix of $t$ of length $7(2^n)$.
Proof. All length two binary words are factors of 0110100, the length 7 prefix of $t$. The result follows by applying the previous lemma $n$ times.
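Corollary 16 lends itself to a quick numerical sanity check for small $n$. The sketch below assumes that $t$ denotes the Thue-Morse word (its length 7 prefix 0110100 agrees with the text), and it compares against a long finite prefix rather than the infinite word, so it is evidence rather than proof. The function names and the `horizon` parameter are hypothetical.

```python
def thue_morse(n):
    """First n letters of the Thue-Morse word 0110100110010110..."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

def factors(word, length):
    """Set of all factors of the given length."""
    return {word[i:i + length] for i in range(len(word) - length + 1)}

def corollary_holds(n, horizon=1 << 12):
    """Compare the length 2^n + 1 factors of a long prefix with those of the prefix of length 7 * 2^n."""
    length = 2 ** n + 1
    long_prefix = thue_morse(horizon)
    short_prefix = thue_morse(7 * 2 ** n)
    return factors(long_prefix, length) == factors(short_prefix, length)
```

The values $n = 3$ and $n = 4$ correspond to the prefixes of length 56 and 112 used in the finite searches later in this section.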
Theorem 17. The sequence $w_1$ avoids $xyxy^Rx^R$.
Proof. Let $z$ be a factor of $w_1$ such that $z^R$ is also a factor of $w_1$. We claim that $|z| \le 6$. Otherwise, replacing $z$ by its length 7 prefix, $w_1$ has a length 7 factor $z$ such that $z^R$ is also a factor of $w_1$. We note that $|f_1(1)| = 11$, so that by Lemma 14, $z$ is a factor of $f_1(v)$, for some factor $v$ of $t$ where
$$|v| \le \frac{2}{11 + 1}\bigl(7 + 3(11) - 3\bigr) < 7.$$
Certainly then an extension of $v$ is a factor of $t$ of length $9 = 2^3 + 1$, so that by Corollary 16, $v$ is a factor of the prefix of $t$ of length 56. This implies that $z$ and $z^R$ are factors of $f_1(\tau)$, where $\tau$ is the prefix of $t$ of length 56. A search shows that $f_1(\tau)$ has no length 7 factor $z$ such that $z^R$ is also a factor of $f_1(\tau)$.
Suppose that $XYXY^RX^R$ is a factor of $w_1$ with $X, Y \ne \epsilon$. Since both $X$ and $X^R$, and both $Y$ and $Y^R$ are factors of $w_1$, it follows that $|X|, |Y| \le 6$, and By Corollary 16, $v'$ is a factor of the prefix of $t$ of length 112, so that $XYXY^RX^R$ is a factor of $f_1(\tau')$, where $\tau'$ is the prefix of $t$ of length 112. However, a search shows that $f_1(\tau')$ has no factor $XYXY^RX^R$ with $|X|, |Y| \le 6$. We conclude that $w_1$ avoids $xyxy^Rx^R$.
Next, we give an infinite binary word that avoids the pattern $xyx^Ry^Rx$. Let $f_2$ be the binary morphism with $f_2(0) = 0$, and $f_2(1) = 00101111$. Let $w_2 = f_2(t)$.
Theorem 18. The sequence $w_2$ avoids $xyx^Ry^Rx$.
Proof. Suppose $XYX^RY^RX$ is a factor of $w_2$, $X, Y \ne \epsilon$. As in the proof of the previous theorem, we find that $|X|, |Y| \le 6$, so that $XYX^RY^RX$ is a factor of $f_2(\tau')$, where $\tau'$ is the prefix of $t$ of length 112. However, a search shows that $f_2(\tau')$ has no such factor.
Proof. This is established by finite search, using Lemma 14 and Corollary 16.
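The finite search used in the proof of Theorem 18 is straightforward to reproduce. The following sketch assumes that $t$ is the Thue-Morse word and uses the morphism $f_2$ exactly as stated in the text; the function names are hypothetical, and the search routine is a plain brute force rather than whatever program the authors used.

```python
def thue_morse(n):
    """First n letters of the Thue-Morse word (assumed to be the word t of the text)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

def apply_f2(word):
    """The morphism f_2 from the text: 0 -> 0, 1 -> 00101111."""
    return "".join("0" if a == "0" else "00101111" for a in word)

def find_instance(word, max_len=6):
    """Return (X, Y) such that X Y X^R Y^R X is a factor of word with
    1 <= |X|, |Y| <= max_len, or None if there is no such factor."""
    for i in range(len(word)):
        for lx in range(1, max_len + 1):
            for ly in range(1, max_len + 1):
                if i + lx + ly > len(word):
                    break                    # X and Y no longer fit in the word
                X = word[i:i + lx]
                Y = word[i + lx:i + lx + ly]
                candidate = X + Y + X[::-1] + Y[::-1] + X
                if word.startswith(candidate, i):
                    return X, Y
    return None

tau_prime = thue_morse(112)          # prefix of t of length 112
w2_prefix = apply_f2(tau_prime)      # the word f_2(tau') searched in the proof
```

Calling `find_instance(w2_prefix)` reruns the search described in the proof, which the text reports as finding no factor of the required form. On a word that does contain an instance, such as `01010`, the routine returns the pair `('0', '1')`.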
Lemma 20. Suppose that $X$ and $Y$ are words such that $XY$, $XY^R$, $YX$ and $YX^R$ are all factors of $w_3$, and $|X| \ge 3$. Then $Y \in \{0, 1, 00\}$.
Let $u$ be a factor of $w_3$. Define a left completion of $u$ to be a word $v = f_3(\hat t)$, such that $u$ is a suffix of $f_3(\hat t)$, but $u$ is not a suffix of any proper suffix of $v$ of the form $f_3(t')$. Thus, for example, 001011 is a left completion of 11 and of 011, but 01 has no left completion.
Lemma 21. Let $u$ be a factor of $w_3$ which ends in 11. Then $u$ has a unique left completion.

Proof. Induction.
We remark that if $p$ is a prefix of $w_3$ and 11 is a suffix of $p$, then $p = f_3(\hat t)$ for some prefix $\hat t$ of $t$.
Proof. A finite search shows that $w_3$ contains no factor $XYXY^RX$ with $1 \le |X| \le 8$ and $1 \le |Y| \le 2$. Suppose that, nevertheless, $w_3$ contains some factor $XYXY^RX$, $|X|, |Y| \ne 0$. By Lemma 20, $Y \in \{0, 00, 1\}$, so that $|X| \ge 9$. This means that $Y = Y^R$, so it will be notationally simpler to write $XYXYX$ for $XYXY^RX$. It is easy to show (or alternatively, to verify by a finite search, invoking Lemma 14 and Corollary 16) that any factor $\chi$ of $w_3$ with $|\chi| = 9$ contains the factor 11. Therefore, write $X = X'X''$, where 11 is a suffix of $X'$, and $|X''|_{11} = 0$. Let of $w_3$ implies the existence of the overlap $t_4t_5t_4t_5t_4$ in $t$, contradicting the overlap-freeness of $t$.
Corollary 24. Let $py$ be a prefix of $w_4$. Suppose that $y = f_4(\hat t)$ for some factor $\hat t$ of $t$ where $|\hat t|_1 > 0$. Then $p = f_4(\tau)$ for some prefix $\tau$ of $t$.
Corollary 25. Let $py$ be a prefix of $w_4$. Suppose that $y = f_4(\hat t)$ for some factor $\hat t$ of $t$, and $|y| \ge 6$. Then $p = f_4(\tau)$ for some prefix $\tau$ of $t$.
Recall that a factor $y$ of $w_4$ is bispecial if $0y$, $1y$, $y0$, $y1$ are all factors of $w_4$.
Lemma 26. Let $y$ be a bispecial factor of $w_4$ with $|y| \ge 6$. Then $y = f_4(\hat t)$ for some factor $\hat t$ of $t$.
Proof. Either $y$ is an internal factor of $f_4(1)$, or $y$ can be written as $y = s f_4(\hat t) p$ where $\hat t$ is a factor of $t$, $s$ is a suffix of $f_4(1)$, $p$ is a prefix of $f_4(1)$, and $|s|, |p| < |f_4(1)|$. Now the internal factors of $f_4(1)$ of length at least 6 are 000010, 000100, 001001, 0000100, 0001001 and 00001001. Each of these only occurs in $w_4$ inside a copy of $f_4(1)$; therefore, none of these are bispecial (or even right special or left special).
Suppose $s = 1$. Word $\hat t$ begins with 00, 01 or 1, or else $\hat t = \epsilon$. Thus one of 1|0|0|100, 1|0|1000 or 1|10000 is a prefix of $y$; here the vertical bars mark the divisions in $w_4$ between $f_4$-images of letters. In each case, we see that $y$ must be preceded by 1 in $w_4$, contradicting the assumption that $y$ is bispecial. If $s = 11$, then one of 11|0|0|10, 11|0|100 or 11|1000 is a prefix of $y$. In each case $y$ is always preceded by 0, again contradicting the assumption that $y$ is bispecial. If $|s| \ge 3$, then $s$ ends in 011; however, 011 only arises in $w_4$ as a suffix of $f_4(1)$, so that the letter preceding $s$ (and thus $y$) in $w_4$ must always be the letter preceding $s$ in $f_4(1)$. The cases where $|s| > 0$ therefore all lead to a contradiction. We conclude that $s = \epsilon$.
If $p = 1$, then one of 011|0|0|1, 0011|0|1 and 10011|1 is a suffix of $y$. This implies that $y$ is always followed in $w_4$ by 0, a contradiction, since $y$ is bispecial. If $p = 10$, then one of 11|0|0|10, 011|0|10 and 0011|10 is a suffix of $y$, and $y$ is always followed by 0 in $w_4$. If $p = 100$, then $y$ has a suffix 1|0|0|100, 11|0|100 or 011|100, and $y$ is always followed by a 0. If $|p| \ge 4$, then $p$ begins 1000, which only arises in $w_4$ as a prefix of $f_4(1)$, so that the letter following $p$ (and thus $y$) in $w_4$ must always be the letter following $p$ in $f_4(1)$. We conclude that $p = \epsilon$.
Since $p = s = \epsilon$, $y = f_4(\hat t)$ for some factor $\hat t$ of $t$, as claimed.
Corollary 27. Let $y$ be a bispecial factor of $w_4$ with $|y| \ge 6$. Let $py$ be a prefix of $w_4$. Then $p = f_4(\tau)$ for some prefix $\tau$ of $t$ and $y = f_4(\hat t)$ for some factor $\hat t$ of $t$.
Theorem 28. The word $w_4$ avoids $xyx^Ryx$.
Proof. Suppose not. Let $u$ be a factor of $w_4$ of the form $u = XYX^RYX$, with $X, Y \ne \epsilon$, and such that $u$ is as short as possible. Both $X$ and $X^R$ are factors of $w_4$. By Lemma 14, each length 21 factor of $w_4$ will be a factor of $f_4(v)$, for some appropriate length 8 factor $v$ of $t$. By Corollary 16, every length 8 factor of $t$ appears in the length 56 prefix of $t$. We can therefore effectively list all length 21 factors of $w_4$. One verifies that if $z$ is a length 21 factor of $w_4$, then $z^R$ is not a factor. Thus, since both $X$ and $X^R$ are factors of $w_4$, $|X| \le 20$.
Subcase 1: $|Y| \le 5$. In this case, $|XYX^RYX| \le 70$, and by Lemma 14, $XYX^RYX$ is a factor of $f_4(v)$, for some factor $v$ of $t$ of length 17. The length 17 factors of $t$ all lie in the length 112 prefix $h^4(0110100)$ of $t$, and a finite search shows that no factor $XYX^RYX$ occurs in $f_4(h^4(0110100))$. This case therefore cannot occur.
Subcase 2: $|Y| \ge 6$.
Subcase 2a: The first and last letters of $X$ are different. In this case, write $X = aX'b$ where $a, b \in \{0, 1\}$, $a \ne b$. Then $XYX^RYX = aX'bYb(X')^RaYaX'b$, and we see that $Y$ is bispecial. Let $\pi XYX^RYX$ be a prefix of $w_4$. Applying Corollary 27 several times, we see that $\pi X = f_4(t_0)$, $\pi XY = f_4(t_1)$, $\pi XYX^R = f_4(t_2)$ and $\pi XYX^RY = f_4(t_3)$ for some prefixes $t_0$, $t_1$, $t_2$ and $t_3$ of $t$. It follows that $X^R = f_4(\hat t)$, where $\hat t$ is the factor $t_1^{-1}t_2$ of $t$. If the last letter of $X$ is a 1, then $\pi X = f_4(t_0)$ must have suffix 11; this implies that 11 is a prefix of $X^R = f_4(\hat t)$, which is impossible. However, if the last letter of $X$ is a 0, then the first letter of $X$ is a 1. Thus the last letter of $X^R$ is a 1, and 11 is a suffix of $X^R = f_4(\hat t)$. Then 11 is a prefix of $X$, and hence of $(\pi XYX^RY)^{-1}w_4 = f_4(t_3^{-1}t)$, which is also impossible.
Subcase 2b: The first and last letters of $X$ are the same. Write $X = a\chi a$ where $a \in \{0, 1\}$. Then $u = a\chi aYa\chi^RaYa\chi a$ has the proper factor $\chi\Upsilon\chi^R\Upsilon\chi$, where

Conclusion/Further Discussion
We note that in 1992, Roth [13] proved that every length six binary pattern is 2-avoidable. Our Theorem 4 shows that this is also true for binary patterns with reversal.
It would be interesting to see whether our results can be generalized to ternary patterns or beyond. Another natural desideratum would be an effective characterization of which patterns with reversal are avoidable.