Extremal Square-free Words

A word is \emph{square-free} if it does not contain non-empty factors of the form $XX$. In 1906 Thue proved that there exist arbitrarily long square-free words over a $3$-letter alphabet. We consider a new type of square-free words. A square-free word is \emph{extremal} if it cannot be extended to a new square-free word by inserting a single letter at an arbitrary position. We prove that there exist infinitely many extremal words over a $3$-letter alphabet. Some parts of our construction rely on computer verifications. We also pose some related open problems.

In this paper we propose a new problem of extremal nature in this area. Let $A$ be a fixed alphabet and let $W$ be a finite word over $A$. An extension of $W$ is any word of the form $W'xW''$, where $x \in A$ and $W = W'W''$. A square-free word $W$ is called extremal over $A$ if there is no square-free extension of $W$. For instance, the word $H = abcabacbcabcbabcabacbcabc$ is the shortest extremal word over the alphabet $A = \{a, b, c\}$. Our main result asserts that there exist infinitely many such words.

Theorem 1. There exist infinitely many extremal square-free words over a 3-letter alphabet.
The proof is by recursive construction whose validity is partially based on computer verifications. We will give it in section 2. In the final section we state some open problems.
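The definitions of extension and extremality given above can be checked directly by brute force. The following Python sketch is only an illustration and is not the verification code used for the results of this paper; the function names `is_square_free`, `square_free_extensions` and `is_extremal` are hypothetical. It tries every insertion of a single letter at every position and confirms that the word $H$ admits no square-free extension.

```python
def is_square_free(word: str) -> bool:
    """Return True if no non-empty factor of `word` has the form XX."""
    n = len(word)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            if word[start:start + half] == word[start + half:start + 2 * half]:
                return False
    return True


def square_free_extensions(word: str, alphabet: str) -> list:
    """List all (position, letter) pairs whose insertion keeps the word square-free."""
    found = []
    for pos in range(len(word) + 1):
        for letter in alphabet:
            if is_square_free(word[:pos] + letter + word[pos:]):
                found.append((pos, letter))
    return found


def is_extremal(word: str, alphabet: str) -> bool:
    """A square-free word is extremal if it has no square-free extension."""
    return is_square_free(word) and not square_free_extensions(word, alphabet)


H = "abcabacbcabcbabcabacbcabc"
print(len(H), is_square_free(H), is_extremal(H, "abc"))  # prints: 25 True True
```

The quadratic square test is deliberately naive; for words of this length it is fast enough, and it keeps the correspondence with the definition easy to see.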

Proof of the main result
We start with a general result on which our construction is based. Consider a finite directed graph $D$ on the set of vertices $V = \{v_1, v_2, \dots, v_n\}$. Suppose that each vertex $v_i$ is labeled with some word $B_i = f(v_i)$ over a fixed alphabet $A$. We will refer to these words $B_i$ as blocks.
A walk in $D$ is any sequence $W = w_1 w_2 \cdots w_t$, with $w_i \in V$, such that $(w_i, w_{i+1})$ is a directed edge of $D$ for every $i = 1, 2, \dots, t-1$. Every walk $W = w_1 w_2 \cdots w_t$ generates in a natural way the word $f(W) = f(w_1) f(w_2) \cdots f(w_t)$ over the alphabet $A$ by concatenating the blocks corresponding to consecutive vertices $w_i$ in $W$. More formally, one may consider $f$ as a homomorphism from the monoid $V^*$ to the monoid $A^*$ defined by the substitution $v_i \mapsto B_i$ for $i = 1, 2, \dots, n$.

A walk is square-free if it is a square-free word over the alphabet $V$. We say that a digraph $D$ is a Thue digraph if for every square-free walk $W$, the word $f(W)$ is also square-free (as a word over $A$). Let $S(D)$ denote the set of all words over $A$ obtained as images of square-free walks in $D$. So, a digraph $D$ is a Thue digraph if $S(D)$ contains only square-free words. The result below gives sufficient conditions for this property.
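Before stating it, the notions of the image $f(W)$ and of a Thue digraph can be illustrated by a small brute-force search. The Python sketch below is an assumption-laden toy: the digraph `edges`, the two block labelings and all function names are hypothetical and have nothing to do with the digraph used later in the proof. It enumerates square-free walks up to a given number of vertices and reports one whose image contains a square, if any. Such a finite search can only refute the Thue property, never prove it.

```python
def is_square_free(seq) -> bool:
    """Works for strings (words over A) and tuples (walks over V)."""
    n = len(seq)
    return not any(
        seq[s:s + h] == seq[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def image(walk, blocks) -> str:
    """f(W): concatenate the blocks of the consecutive vertices of the walk."""
    return "".join(blocks[v] for v in walk)


def square_free_walks(edges, length):
    """Yield all square-free walks with exactly `length` vertices, as tuples."""
    def extend(walk):
        if len(walk) == length:
            yield walk
            return
        for nxt in edges[walk[-1]]:
            candidate = walk + (nxt,)
            # Prefixes of square-free words are square-free, so pruning is safe.
            if is_square_free(candidate):
                yield from extend(candidate)

    for v in sorted(edges):
        yield from extend((v,))


def counterexample(edges, blocks, max_len):
    """Return a square-free walk whose image contains a square, or None."""
    for length in range(1, max_len + 1):
        for walk in square_free_walks(edges, length):
            if not is_square_free(image(walk, blocks)):
                return walk
    return None


# Toy data (hypothetical): the complete digraph on three vertices.
edges = {"u": ["v", "w"], "v": ["u", "w"], "w": ["u", "v"]}
good_blocks = {"u": "ab", "v": "cd", "w": "ef"}   # disjoint alphabets
bad_blocks = {"u": "a", "v": "b", "w": "ab"}      # "a" is a factor of "ab"

print(counterexample(edges, good_blocks, 8))
print(counterexample(edges, bad_blocks, 8))
```

With `bad_blocks` a block is a factor of another block, and a short walk already produces a square, which illustrates why a condition of this kind appears in the theorem.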
Theorem 2. Let $D$ be a digraph on the set of vertices $V = \{v_1, v_2, \dots, v_n\}$ labeled with some blocks $B_i = f(v_i)$ over an alphabet $A$. Then $D$ is a Thue digraph if the following conditions are satisfied:
(1) For every square-free walk $W = w_1 w_2 w_3$, the word $f(W)$ is also square-free.
(2) No block $B_i$ is a factor of another block $B_j$ (unless $i = j$). In particular, the blocks $B_i$ are pairwise different.
(3) For every pair of distinct blocks $B_i$ and $B_j$, $i \neq j$, and any factorizations $B_i = X'X''$ and $B_j = Y'Y''$, neither of the words $X'Y''$ and $X''Y'$ can be equal to any block $B_k$, unless [...]

the electronic journal of combinatorics 27(1) (2020), #P1.48

Proof. Suppose on the contrary that a square $XX$ appears in some word $f(W)$, where $W = w_1 w_2 \cdots w_t$ is a square-free walk in $D$. Assume also that $W$ is a shortest such walk. So, we may write (see Figure 1):
$$f(W) = P'XXR'',$$
where $f(w_1) = P'P''$, $f(w_{j+1}) = Q'Q''$, $f(w_t) = R'R''$, and
$$X = P'' f(w_2) \cdots f(w_j) Q' = Q'' f(w_{j+2}) \cdots f(w_{t-1}) R'.$$
By condition (1), the walk $W$ has at least four vertices, hence at least one part of the square must contain a full occurrence of some block. With no loss of generality we may assume that this happens in the left part. Also we may assume that this part contains as many blocks as the other part.
Let $q \in \{1, 2, \dots, j\}$ be the smallest index such that $w_q \neq w_{j+q}$. There must be at least one such index since otherwise the walk $W$ would contain the square $w_1 w_2 \cdots w_j w_1 w_2 \cdots w_j$, contradicting our assumption. We distinguish two cases.
If $q > 1$, then either $f(w_q)$ is a prefix of $f(w_{j+q})$ or the other way around, which contradicts condition (2). If $q = 1$, then $f(w_1) \neq f(w_{j+1})$ and we consider two cases. First suppose that the words $P''$ and $Q''$ have different lengths, and assume that $P''$ is longer than $Q''$. Then we may write $P'' = Q''X''$, where $X''$ is a nonempty suffix of the block $f(w_1)$ (see Figure 2). Now, the block $f(w_{j+2})$ must end before $f(w_2)$ since otherwise $f(w_2)$ would be contained in $f(w_{j+2})$, contradicting condition (2). So, we may write $f(w_{j+2}) = X''Y'$, where $f(w_2) = Y'Y''$. This contradicts condition (3). If $Q''$ is longer than $P''$, then the reasoning is similar.
Suppose now that the words $P''$ and $Q''$ have equal length, which means that $P'' = Q''$. This implies that all pairs of corresponding inner blocks in the left and the right part of the square $XX$ must be equal (otherwise one of them would be included in the other, contradicting condition (2); see Figure 3). This implies that $t = 2j + 1$ and $w_i = w_{j+i}$ for all $i = 2, 3, \dots, j$, and the walk $W$ can be written as
$$W = w_1 w_2 \cdots w_j \, w_{j+1} \, w_2 \cdots w_j \, w_t.$$
In consequence, we get that also $Q' = R'$ (see Figure 3), which means that
[...]
In both cases we get that $w_{j+1} = w_1$ or $w_{j+1} = w_t$, which gives a square in the walk $W$. This completes the proof.

Using the above result we may now prove Theorem 1. First we will construct a Thue digraph on a set of 12 vertices together with a set of 12 blocks defined as follows. Consider the following square-free word:
$N = $ [...]
This word is nearly extremal in the following sense. A square-free word $W$ is called nearly extremal if it has at most two square-free extensions, and these extensions may only have the form $xW$ or $Wy$, where $x, y \in A$. The following lemma can be verified by a computer.

Lemma 3. The word $N$ is nearly extremal.

It is clear that each word obtained from $N$ by a permutation of the alphabet and by reversal is also nearly extremal. Let us denote the six words corresponding to the six permutations of the alphabet as
$$N, \ N_{ab}, \ N_{ac}, \ N_{bc}, \ N_{abc}, \ N_{acb},$$
where indices denote nontrivial cycles of these permutations. Let us also denote the reversals of the above six words by $\tilde{N}, \tilde{N}_{ab}, \tilde{N}_{ac}, \tilde{N}_{bc}, \tilde{N}_{abc}, \tilde{N}_{acb}$. Now we may define a digraph $D_N$ as depicted in Figure 4. Its vertices are labeled by the above 12 nearly extremal words. It can be checked that for each directed edge of $D_N$ the corresponding concatenation of blocks gives a square-free word. Moreover, the following lemma was verified by a computer.

Lemma 4. The digraph $D_N$ is a Thue digraph.

Corollary 5. Every word in $S(D_N)$ is nearly extremal.

Proof. By Lemma 4, the set $S(D_N)$ consists only of square-free words. Moreover, every word in $S(D_N)$ is a concatenation of blocks that are nearly extremal words. Thus it cannot be extended at any inner position of a block.
For the same reason it cannot be extended by inserting a letter between any two blocks. Indeed, for any two consecutive blocks $B_1 B_2$ occurring in some word from $S(D_N)$ there is only one letter $x$ such that $B_1 x$ is square-free, and this letter must be the first letter of the next block $B_2$. Hence, for any letter $y$, the word $B_1 y x$ must contain a square: if $y \neq x$, then $B_1 y$ already contains a square, and if $y = x$, then $yx = xx$ is a square.
We are going to prove that the set $S(D_N)$ is infinite. We will need the following general lemmas.

Lemma 6. Let $T = a_1 a_2 \cdots a_s$ be a square-free word over the alphabet $A = \{1, 2, 3\}$, and let $V_1, V_2, V_3$ be three pairwise disjoint alphabets. Then any word of the form $W = W_1 W_2 \cdots W_s$ is square-free, where $W_i$ is any nonempty word over the alphabet $V_{a_i}$ consisting of pairwise distinct letters.
Proof. Indeed, the word $W$ can be seen as an image of $T$ under a multi-substitution, where for each letter $i \in A$ we may put any word over $V_i$ with pairwise distinct letters. Such substitutions obviously preserve square-freeness since the alphabets $V_i$ are pairwise disjoint.
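The statement of Lemma 6 can be sanity-checked exhaustively on a small instance. The Python sketch below is illustrative only; the alphabets, the word `T` and the function names are assumptions chosen for the example. It replaces each letter of a square-free word $T$ by every possible nonempty word with pairwise distinct letters over the corresponding alphabet and checks that all resulting words are square-free.

```python
from itertools import permutations, product


def is_square_free(word: str) -> bool:
    n = len(word)
    return not any(
        word[s:s + h] == word[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def distinct_letter_words(alphabet: str) -> list:
    """All nonempty words over `alphabet` with pairwise distinct letters."""
    return [
        "".join(p)
        for r in range(1, len(alphabet) + 1)
        for p in permutations(alphabet, r)
    ]


def multi_substitutions(T: str, alphabets: dict):
    """Yield every word W_1 W_2 ... W_s that Lemma 6 allows for the word T."""
    choices = [distinct_letter_words(alphabets[a]) for a in T]
    for parts in product(*choices):
        yield "".join(parts)


# Hypothetical example data: three pairwise disjoint alphabets.
alphabets = {"1": "ab", "2": "cd", "3": "e"}
T = "12131"  # a square-free word over {1, 2, 3}

images = list(multi_substitutions(T, alphabets))
print(len(images), all(is_square_free(w) for w in images))
```

Here each of the letters 1 and 2 has four admissible substitutes and the letter 3 has one, so the check runs over 256 words.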
Lemma 7. Let $D$ be a digraph whose vertices can be partitioned into three sets $V_1$, $V_2$, and $V_3$ so that the following property holds:
($*$) For every pair $i, j \in \{1, 2, 3\}$ and any vertex $v \in V_i$, there is a directed path $P = u_1 u_2 \cdots u_t$ such that [...]
Then there exist arbitrarily long square-free walks in $D$.
Proof. Let us take any square-free word $T = a_1 a_2 \cdots a_s$ over the alphabet $A = \{1, 2, 3\}$. Let $v = v_1$ be any vertex in $V_{a_1}$. Let $P_1$ be a directed path satisfying condition ($*$), starting at $v_1$ and ending at some vertex $v_2 \in V_{a_2}$. Now take a similar path $P_2$ starting from $v_2$ and ending at some vertex $v_3 \in V_{a_3}$. And so on, until we arrive at some vertex $v_s \in V_{a_s}$. In this way, we obtain a walk $W = P'_1 P'_2 \cdots P'_s$, where $P'_i = P_i - \{v_{i+1}\}$ for $i < s$ and $P'_s = v_s$. By Lemma 6, the walk $W$ is square-free.
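The construction in this proof can be sketched in Python as follows. This is a minimal illustration under an explicit assumption: condition ($*$) is read as providing, for a vertex $v \in V_i$ and a target part $V_j$ with $j \neq i$, a path that starts at $v$, stays inside $V_i$, and ends with a single step into $V_j$ (this is how the proof uses it). The toy digraph, the partition and the function names are hypothetical.

```python
from collections import deque


def is_square_free(seq) -> bool:
    n = len(seq)
    return not any(
        seq[s:s + h] == seq[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def path_to_part(edges, part_of, start, target_part):
    """Shortest path from `start` whose last vertex lies in `target_part`
    and whose other vertices stay in the part of `start` (assumed form of (*))."""
    home = part_of[start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in edges[u]:
            if part_of[w] == target_part:
                path = [w]
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            if part_of[w] == home and w not in parent:
                parent[w] = u
                queue.append(w)
    return None


def walk_along(T, edges, part_of, start):
    """Build the walk P'_1 P'_2 ... P'_s following the square-free word T."""
    assert part_of[start] == T[0]
    walk, current = [], start
    for next_part in T[1:]:
        path = path_to_part(edges, part_of, current, next_part)
        if path is None:
            raise ValueError("condition (*) fails for this digraph")
        walk.extend(path[:-1])  # P'_i is P_i without its last vertex
        current = path[-1]
    walk.append(current)        # P'_s is the single vertex v_s
    return walk


# Hypothetical toy digraph on six vertices with parts 1 = {p, q}, 2 = {r, s}, 3 = {t, u}.
edges = {"p": ["q", "r"], "q": ["p", "t"], "r": ["s", "t"],
         "s": ["r", "p"], "t": ["u", "p"], "u": ["t", "r"]}
part_of = {"p": "1", "q": "1", "r": "2", "s": "2", "t": "3", "u": "3"}

T = "1213123132"  # a square-free word over {1, 2, 3}
W = walk_along(T, edges, part_of, "p")
print("".join(W), is_square_free(W))
```

Breadth-first search returns a shortest path, so the vertices of each piece $P'_i$ are pairwise distinct, as Lemma 6 requires.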
Lemma 8. There exist arbitrarily long square-free walks in the digraph $D_N$ starting and ending at the vertex labeled with the word $N$.
Proof. It is not hard to check that the following partition of $V(D_N)$ satisfies condition ($*$) of Lemma 7 (see Figure 5):
[...]
So, by Lemma 7, there exist arbitrarily long square-free walks in $D_N$. We need to show that they may start and end at the vertex $N$. To see this, take a sufficiently long square-free word $T$ over the alphabet $\{1, 2, 3\}$ of the form $T = 1U1$ such that the word $T' = T231$ is also square-free. Now, we may construct a square-free walk $W$ along $T$ as in the proof of Lemma 7, starting from the vertex $N$ and ending at some vertex $v$ in $V_1$. If $v = N$ we are done. If not, then we need to extend the walk $W$ slightly. If $v = N_{bc}$, then we make just one step to reach $N$ directly. If $v = \tilde{N}_{bc}$, then we have to go first to $N_{abc}$ in $V_2$, next to $\tilde{N}_{ac}$ in $V_3$, and then jump to $N$ from there (see Figure 5). This gives a square-free walk, since $T'$ is square-free. Finally, if $v = \tilde{N}$, then we move first to $\tilde{N}_{bc}$ and then repeat the previous three steps from there.
Corollary 9. There exist infinitely many nearly extremal square-free words over a 3-letter alphabet.
To prove the assertion of Theorem 1 we need to modify slightly the digraph $D_N$. The idea is to use two special words $Q$ and $R$:
[...]
The following lemma can be checked by a computer.
Lemma 10. The words $QN$ and $NR$ are square-free and each of them has only one square-free extension, namely $QNa$ and $cNR$, respectively.
Now we may add two new vertices to our digraph $D_N$ with labels $Q$ and $R$, and join $Q$ to $N$ and $N$ to $R$ by directed edges. Denote this modified digraph by $D^*_N$. The following lemma can also be verified by a computer.
Lemma 11. The digraph $D^*_N$ is a Thue digraph.

To construct extremal words of length exceeding any given constant it is enough to take a sufficiently long square-free walk $W$ in $D^*_N$ starting at $Q$ and ending at $R$. Such a walk exists by Lemma 8 and Lemma 11. The word $E = f(W)$ corresponding to this walk will have the form $E = QNYNR$, where $NYN$ is a nearly extremal word by Corollary 5. Hence, by Lemma 10, the word $E$ is extremal. This completes the proof of Theorem 1.

Discussion
A natural question is whether an analogue of Theorem 1 holds for larger alphabets. Notice that if we have four letters at our disposal, then there are two potential candidates for an extension of a square-free word at every inner position. In fact, our computer experiments found no extremal words over four letters of length up to 100. Perhaps there are no such words at all. On the other hand, it is known [2] that every square-free word (over any alphabet) is a prefix of a maximal square-free word, that is, a word that cannot be extended by attaching a single letter at the beginning or at the end.
Conjecture 12. There are no extremal square-free words over a 4-letter alphabet.
The conjecture states, in other words, that every square-free word over four letters can be extended to a new square-free word by inserting a single letter at some position.
The concept of extremal words can be considered for any fixed pattern (or even for any property of words which is monotone on factors). To state a general conjecture on extremal words we recall briefly basic notions of pattern avoidance (see [2,5,8]).
Let $V$ be an alphabet of variables. A pattern $P = p_1 p_2 \cdots p_r$, with $p_i \in V$, is any nonempty word over $V$. A word $W$ realizes a pattern $P$ if it can be split into nonempty factors $W = W_1 W_2 \cdots W_r$ so that $W_i = W_j$ if and only if $p_i = p_j$, for all $i, j = 1, 2, \dots, r$. A word $W$ avoids a pattern $P$ if no factor of $W$ realizes $P$. For instance, a square-free word avoids the pattern $P = xx$. A pattern $P$ is avoidable if there exist arbitrarily long words over some finite alphabet avoiding $P$. A complete characterization of avoidable patterns was provided independently by Zimin [10] and by Bean, Ehrenfeucht and McNulty [2]. Now, given a fixed pattern $P$, we may define extremal $P$-avoiding words analogously as in the case of squares. The following conjecture seems plausible.
Conjecture 13. For every avoidable pattern $P$ there exists a constant $k = k(P)$ such that the set of extremal $P$-avoiding words over a $k$-letter alphabet is finite.
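The notions of realizing and avoiding a pattern, as defined above, can be made concrete with a short backtracking search. The Python sketch below is only an illustration with hypothetical function names `realizes` and `avoids`; it follows the definition literally, including the requirement that distinct variables are assigned distinct factors, and it makes no attempt at efficiency.

```python
def realizes(word: str, pattern: str) -> bool:
    """True if word = W_1 ... W_r with nonempty W_i and W_i == W_j iff p_i == p_j."""
    def assign(pos: int, idx: int, binding: dict) -> bool:
        if idx == len(pattern):
            return pos == len(word)
        var = pattern[idx]
        if var in binding:
            factor = binding[var]
            return (word.startswith(factor, pos)
                    and assign(pos + len(factor), idx + 1, binding))
        for end in range(pos + 1, len(word) + 1):
            factor = word[pos:end]
            if factor in binding.values():
                continue  # distinct variables must get distinct factors
            binding[var] = factor
            if assign(end, idx + 1, binding):
                return True
            del binding[var]
        return False

    return assign(0, 0, {})


def avoids(word: str, pattern: str) -> bool:
    """True if no factor of `word` realizes `pattern`."""
    n = len(word)
    return not any(
        realizes(word[i:j], pattern)
        for i in range(n)
        for j in range(i + 1, n + 1)
    )


print(avoids("abcacb", "xx"), avoids("abcbcab", "xx"), avoids("abacaba", "xyx"))
```

For the pattern `xx` the function `avoids` coincides with square-freeness, so an extremality test for an arbitrary pattern could be obtained by replacing the square test with `avoids` in a brute-force search over single-letter insertions.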
We conclude the paper with another related question. Consider the following greedy way of generating square-free words. Given a fixed ordered alphabet $A$, we start with the first letter of $A$ and continue by inserting, at the rightmost possible position of the current word, the earliest possible letter so that the new word is square-free. For instance, for the alphabet $A = \{1, 2, 3\}$ this greedy procedure starts with the following sequence of square-free words: 1, 12, 121, 1213, 12131, 121312, 1213121, 12131231.
The last word was obtained by inserting 3 at the penultimate position of the previous word, because appending any letter to 1213121 creates a square (11, 1212, or 12131213).
We conjecture that the above procedure never stops. To state it formally, let us define recursively a sequence of nonchalant words $G_i$ over the alphabet $A_k = \{1, 2, \dots, k\}$ by putting $G_1 = 1$, and letting $G_{i+1} = G'_i x G''_i$ be a square-free extension of $G_i = G'_i G''_i$ such that $G''_i$ is the shortest possible suffix of $G_i$ and $x \in A_k$ is the earliest possible letter.
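The recursive definition translates directly into code. The following Python sketch is an illustration and not the program used for the experiments reported here; the function names are hypothetical, and letters are encoded as single digits, which assumes $k \leq 9$. At each step it tries suffixes in order of increasing length and, for each suffix, letters in increasing order.

```python
def is_square_free(word: str) -> bool:
    n = len(word)
    return not any(
        word[s:s + h] == word[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def next_nonchalant(word: str, alphabet: str):
    """Insert the earliest letter in front of the shortest suffix that works.

    Returns None if the word has no square-free extension at all."""
    for suffix_len in range(len(word) + 1):
        cut = len(word) - suffix_len
        for letter in alphabet:
            candidate = word[:cut] + letter + word[cut:]
            if is_square_free(candidate):
                return candidate
    return None


def nonchalant_words(k: int, count: int) -> list:
    """First `count` nonchalant words over {1, ..., k} (single digits, k <= 9)."""
    alphabet = "".join(str(d) for d in range(1, k + 1))
    words = ["1"]
    while len(words) < count:
        nxt = next_nonchalant(words[-1], alphabet)
        if nxt is None:
            break  # the greedy procedure would stop here
        words.append(nxt)
    return words


print(nonchalant_words(3, 8))
# prints: ['1', '12', '121', '1213', '12131', '121312', '1213121', '12131231']
```

The procedure stops exactly when the current word is extremal, which is why the sequence being infinite is a nontrivial claim.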
Conjecture 14. The sequence of nonchalant words over $A_k$ is infinite for every $k \geq 3$.
The results of a computer experiment for $k = 3$ support this conjecture; a nonchalant word of length 5000 was obtained by the above greedy procedure. Moreover, the algorithm never moved back by more than 15 positions. Therefore the following conjecture seems plausible.
Conjecture 15. The sequence of nonchalant words over $A_k$ converges to an infinite word for every $k \geq 3$.
Here are the first 70 terms of the presumably infinite limit word for k = 3: 1213123132123121312313231213123212312131231321231213123212312132123132 . . .