Minimal automaton for multiplying and translating the Thue-Morse set

The Thue-Morse set $\mathcal{T}$ is the set of those non-negative integers whose binary expansions have an even number of $1$s. The name of this set comes from the fact that its characteristic sequence is given by the famous Thue-Morse word ${\tt abbabaabbaababba\cdots}$, which is the fixed point starting with ${\tt a}$ of the word morphism ${\tt a\mapsto ab,b\mapsto ba}$. The numbers in $\mathcal{T}$ are commonly called the {\em evil numbers}. We obtain an exact formula for the state complexity of the set $m\mathcal{T}+r$ (i.e.\ the number of states of its minimal automaton) with respect to any base $b$ which is a power of $2$. Our proof is constructive and we are able to explicitly provide the minimal automaton of the language of all $2^p$-expansions of the set of integers $m\mathcal{T}+r$ for any positive integers $p$ and $m$ and any remainder $r\in\{0,\ldots,m-1\}$. The proposed method is general for any $b$-recognizable set of integers. As an application, we obtain a decision procedure running in quadratic time for the problem of deciding whether a given $2^p$-recognizable set is equal to a set of the form $m\mathcal{T}+r$.

In particular, as mentioned above, these sets have been characterized in terms of logic. More precisely, a subset of $\mathbb{N}$ (and more generally of $\mathbb{N}^d$) is $b$-recognizable if and only if it is definable by a first-order formula of the structure $\langle\mathbb{N},+,V_b\rangle$, where $V_b$ is the base-dependent functional predicate that associates with a natural number $n$ the highest power of $b$ dividing $n$. Since the finite unions of arithmetic progressions are precisely the subsets of $\mathbb{N}$ that are definable by first-order formulas in the Presburger arithmetic $\langle\mathbb{N},+\rangle$, this characterization provides us with a logical interpretation of Cobham's theorem. In addition, this result turned out to be a powerful tool for showing that many properties of $b$-automatic sequences are decidable and, further, that many enumeration problems of $b$-automatic sequences can be described by $b$-regular sequences in the sense of Allouche and Shallit [5,7,18].
In the context of Cobham's theorem, the following question is natural and has received constant attention during the last 30 years: given an automaton accepting the language of the base-b expansions of a set X ⊆ N, is it decidable whether X is a finite union of arithmetic progressions? Several authors gave decision procedures for this problem [4,11,25,26,28]. Moreover, a multidimensional version of this problem was shown to be decidable in a beautiful way based on logical methods [11,33].
With any set of integers $X$ is naturally associated an infinite word, namely its characteristic sequence $\chi_X\colon n\mapsto 1$ if $n\in X$, $n\mapsto 0$ otherwise. Thus, a finite union of arithmetic progressions corresponds to an ultimately periodic infinite word. Therefore, the HD0L ultimate periodicity problem, which consists in deciding whether a given morphic word (i.e. the image under a coding of the fixed point of a morphism) is ultimately periodic, is a generalization of the periodicity problem for $b$-recognizable sets mentioned in the previous paragraph. The HD0L ultimate periodicity problem was shown to be decidable in its full generality [21,31]. The proofs rely on return words, primitive substitutions or evolution of Rauzy graphs. However, these methods do not provide algorithms that could be easily implemented and the corresponding time complexity is very high. In addition, they do not allow us to obtain an algorithm for the multidimensional generalization of the periodicity problem, i.e. the problem of deciding whether a $b$-recognizable subset of $\mathbb{N}^d$ is definable within the Presburger arithmetic $\langle\mathbb{N},+\rangle$. Therefore, a better understanding of the inner structure of automata arising from number systems remains a powerful tool to obtain efficient decision procedures.
The general idea is as follows. Suppose that $\mathcal{L}=\{L_i : i\in\mathbb{N}\}$ is a collection of languages and that we want to decide whether some particular language $L$ belongs to $\mathcal{L}$. Now, suppose that we are able to explicitly give a lower bound on the state complexities of the languages in $\mathcal{L}$, i.e. for each given $N$, we can effectively produce a bound $B(N)$ such that for all $i>B(N)$, the state complexity of $L_i$ is greater than $N$. Then the announced problem is decidable: if $N$ is the state complexity of the given language $L$, then only the finitely many languages $L_0,\ldots,L_{B(N)}$ have to be compared with $L$.
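The procedure just described can be sketched in a few lines of Python. This is only an illustration: the helper names (`find_in_family`, `unary_family`, `unary_complexity`, `unary_bound`) and the toy family $L_i=\{a^i\}$ are assumptions made for the example and are not taken from the paper.

```python
from typing import Callable, Optional, TypeVar

Lang = TypeVar("Lang")


def find_in_family(
    target: Lang,
    complexity: Callable[[Lang], int],
    family: Callable[[int], Lang],
    bound: Callable[[int], int],
) -> Optional[int]:
    """Return an index i with family(i) == target, or None if there is none."""
    n = complexity(target)
    # Every L_i with i > bound(n) has state complexity > n,
    # so only L_0, ..., L_bound(n) can be equal to the target.
    for i in range(bound(n) + 1):
        if family(i) == target:
            return i
    return None


# Toy family (hypothetical, for illustration only): L_i = {a^i}.
def unary_family(i: int) -> frozenset:
    return frozenset({"a" * i})


def unary_complexity(lang: frozenset) -> int:
    # A finite unary language whose longest word is a^M needs the
    # chain of states 0..M plus a sink; the empty language needs 1 state.
    return max(len(w) for w in lang) + 2 if lang else 1


def unary_bound(n: int) -> int:
    # The complexity of L_i is i + 2, which exceeds n as soon as i > n - 2.
    return max(n - 2, 0)
```

For instance, `find_in_family(frozenset({"aaa"}), unary_complexity, unary_family, unary_bound)` only inspects $L_0,\ldots,L_3$ before answering.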
The state complexity of a $b$-recognizable set (i.e. the number of states of the minimal automaton accepting the $b$-expansions of its elements) is closely related to the length of the logical formula describing this set. Short formulas are crucial in order to produce efficient mechanical proofs by using for example the Walnut software [32,37]. There are several ways to improve the previous decision procedure. One of them is to use precise knowledge of the structure of the involved automata. This idea was successfully used in the papers [8,28]. In [17], the structure of automata accepting the greedy expansions of $m\mathbb{N}$ for a wide class of non-standard numeration systems is described and, in particular, estimates of the state complexity of $m\mathbb{N}$ are given. Another way of improving this procedure is to have at our disposal the exact state complexities of the languages in $\mathcal{L}$. Finding an exact formula is a much more difficult problem than finding good estimates. However, some results in this direction are known. For instance, it is proved in [17] that for the Zeckendorf numeration system (i.e. based on the Fibonacci numbers), the state complexity of $m\mathbb{N}$ is exactly $2m^2$. A complete description of the minimal automaton recognizing $m\mathbb{N}$ in any integer base $b$ was given in [1] and the state complexity of $m\mathbb{N}$ with respect to the base $b$ is shown to be exactly $$\frac{m}{\gcd(m,b^N)}+\sum_{n=0}^{N-1}\frac{b^n}{\gcd(m,b^n)}$$ where $N$ is the smallest integer $\alpha$ such that $\frac{m-b^\alpha}{\gcd(m,b^\alpha)}<\frac{m}{\gcd(m,b^{\alpha+1})}$. For all the above mentioned reasons, the study of the state complexity of $b$-recognizable sets deserves special interest.
In the present work, we initiate a study of the state complexity of sets of the form $mX+r$, for any recognizable subset $X$ of $\mathbb{N}$ (with respect to a given numeration system), any multiple $m$ and any remainder $r$. In doing so, we aim at generalizing the previous framework concerning the case $X=\mathbb{N}$ only. Our study starts with the Thue-Morse set $\mathcal{T}$ of the so-called evil numbers [2], i.e. the natural numbers whose base-2 expansions contain an even number of occurrences of the digit $1$. The characteristic sequence of this set corresponds to the ubiquitous Thue-Morse word $0110100110010110\cdots$, which is the fixed point starting with $0$ of the morphism $0\mapsto 01$, $1\mapsto 10$. This infinite word is one of the archetypical aperiodic automatic words, see the surveys [6,34]. Many number-theoretic works devoted to sets of integers defined thanks to the Thue-Morse word exist, such as the study of additive and multiplicative properties, or iterations and sums of such sets [3,12,30]. In this vein, the set $\mathcal{T}$ seems to be a natural candidate to start with. The goal of this work is to provide a complete characterization of the minimal automata recognizing the sets $m\mathcal{T}+r$ for any multiple $m$ and remainder $r$, and any base $b$ which is a power of $2$ (other bases are not relevant with the choice of the Thue-Morse set in view of Cobham's theorem). A previous work dealing with the case $r=0$ recently appeared as a conference paper [14]. Surprisingly, the techniques of the present work, in particular the description of the left quotients (i.e. the states of the minimal automaton), are quite different from those we developed for the case $r=0$.
This paper has the following organization. In Section 2, we recall the background that is necessary to tackle our problem. In Section 3, we state our main result and present the method that will be carried out for its proof. More precisely, we present the steps of our construction of the minimal automaton accepting the base-$2^p$ expansions of the elements of $m\mathcal{T}+r$ for any positive integers $p$ and $m$, and any remainder $r\in\{0,\ldots,m-1\}$. In Section 4, we give the details of the construction of the intermediate automata. In particular, we study the transitions of each automaton. Thus, at the end of Section 4, we are provided with an automaton recognizing the desired language.
Then in Section 5, we study the properties of the built automata that will be needed for proving the announced state complexity result. The minimization procedure of the last automaton is handled in Section 6. This part is the most technical one and it relies heavily on the properties of the intermediate automata proved in the previous sections. In Section 7, we explicitly give the correspondence between the description of the minimal automaton recognizing $m\mathcal{T}$ obtained in [14] and that given in the present work in the particular case where $r=0$. In Section 8, we show that the minimal automaton recognizing $m\overline{\mathcal{T}}+r$, where $\overline{\mathcal{T}}$ is the complement of the Thue-Morse set $\mathcal{T}$, is obtained directly from the one recognizing $m\mathcal{T}+r$ by moving the initial state. As a consequence, the state complexities of $m\mathcal{T}+r$ and $m\overline{\mathcal{T}}+r$ coincide. Finally, in Section 9, we discuss future work and give two related open problems.

Basics
In this text, we use the usual definitions and notation (alphabet, letter, word, language, free monoid, automaton, etc.) of formal language theory; for example, see [27,36].
Nevertheless, let us give a few definitions and properties that will be central in this work. The empty word is denoted by $\varepsilon$. For a finite word $w$, $|w|$ designates its length and $|w|_a$ the number of occurrences of the letter $a$ in $w$. A regular language is a language which is accepted by a finite automaton. For $L\subseteq A^*$ and $w\in A^*$, the (left) quotient of $L$ by $w$ is the language $w^{-1}L=\{u\in A^* : wu\in L\}$.
As is well known, a language $L$ over an alphabet $A$ is regular if and only if it has finitely many quotients, that is, the set of languages $\{w^{-1}L : w\in A^*\}$ is finite. The state complexity of a regular language is the number of its quotients: $\mathrm{Card}(\{w^{-1}L : w\in A^*\})$. It corresponds to the number of states of its minimal automaton. The following characterization of minimal automata will be used several times in this work: a deterministic finite automaton (or DFA for short) is minimal if and only if it is complete, reduced and accessible. A DFA is said to be complete if the transition function is total (i.e. transitions labeled with all possible letters start from every state), reduced if languages accepted from distinct states are distinct and accessible if every state can be reached from the initial state. The language accepted from a state $q$ is denoted by $L_q$. Thus, the language accepted by a DFA is the language accepted from its initial state.
In what follows we will need a notion that is somewhat stronger than that of reduced DFAs. We say that a DFA has disjoint states if the languages accepted from distinct states are disjoint: for distinct states $p$ and $q$, we have $L_p\cap L_q=\emptyset$. A state $q$ is said to be coaccessible if $L_q\neq\emptyset$ and, by extension, an automaton is coaccessible if all its states are coaccessible. Thus, any coaccessible DFA having disjoint states is reduced.
Now, let us give some background on numeration systems. Let $b\in\mathbb{N}_{\ge 2}$. We define $A_b$ to be the alphabet $\{0,\ldots,b-1\}$. Elements of $A_b$ are called digits. The number $b$ is called the base of the numeration. In what follows we will make no distinction between a digit $c$ in $A_b$ and its value $c$ in $[\![0,b-1]\!]$. Otherwise stated, we identify the alphabet $A_b$ and the interval of integers $[\![0,b-1]\!]$. Note that here and throughout the text, we use the notation $[\![m,n]\!]$ to designate the interval of integers $\{m,m+1,\ldots,n\}$. The $b$-expansion of a positive integer $n$, which is denoted by $\mathrm{rep}_b(n)$, is the finite word $c_\ell\cdots c_0$ over $A_b$ defined by $$n=\sum_{j=0}^{\ell}c_j b^j,\quad c_\ell\neq 0.$$ The $b$-expansion of $0$ is the empty word: $\mathrm{rep}_b(0)=\varepsilon$. Conversely, the value of a word $w=c_\ell\cdots c_0$ over $A_b$ is $\mathrm{val}_b(w)=\sum_{j=0}^{\ell}c_j b^j$. Clearly, the function $\mathrm{val}_b\circ\mathrm{rep}_b$ is the identity from $\mathbb{N}$ to $\mathbb{N}$. Moreover, for any $w\in A_b^*$, the words $\mathrm{rep}_b(\mathrm{val}_b(w))$ and $w$ only differ by the potential leading zeroes in $w$. Also note that for all subsets $X$ of $\mathbb{N}$, $\mathrm{val}_b^{-1}(X)=0^*\mathrm{rep}_b(X)$. In what follows, we will always consider automata accepting $\mathrm{val}_b^{-1}(X)$ instead of $\mathrm{rep}_b(X)$. The state complexity of a $b$-recognizable subset $X$ of $\mathbb{N}$ with respect to the base $b$ is the state complexity of the language $\mathrm{val}_b^{-1}(X)$. Note that the $b$-expansions are read from left to right, i.e. most significant digit first.
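We will need to represent not only natural numbers, but also pairs of natural numbers. If $u=u_1\cdots u_n\in A^*$ and $v=v_1\cdots v_n\in B^*$ are words of the same length $n$, then we use the notation $(u,v)$ to designate the word $(u_1,v_1)\cdots(u_n,v_n)$ of length $n$ over the alphabet $A\times B$. For $(m,n)\in\mathbb{N}^2$, we write $$\mathrm{rep}_b(m,n)=\big(0^{\ell-|\mathrm{rep}_b(m)|}\,\mathrm{rep}_b(m),\;0^{\ell-|\mathrm{rep}_b(n)|}\,\mathrm{rep}_b(n)\big)\quad\text{where }\ell=\max\{|\mathrm{rep}_b(m)|,|\mathrm{rep}_b(n)|\}.$$ Otherwise stated, we add leading zeroes to the shortest expansion (if any) in order to obtain two words of the same length. Finally, for a subset $X$ of $\mathbb{N}^2$, we write $\mathrm{val}_b^{-1}(X)=(0,0)^*\mathrm{rep}_b(X)$.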
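The conventions above (most significant digit first, empty expansion for $0$, padding of pairs with leading zeroes) can be made concrete with a small Python sketch. The function names `rep`, `val` and `rep_pair` simply mirror the notation of the text; words are represented as lists of digits, which is a choice made for this illustration.

```python
def rep(n, b):
    """Base-b expansion of n as a list of digits, most significant first."""
    digits = []
    while n > 0:
        n, c = divmod(n, b)
        digits.append(c)
    return digits[::-1]  # rep(0, b) is the empty word


def val(word, b):
    """Value of a word over {0, ..., b-1}; leading zeroes are allowed."""
    v = 0
    for c in word:
        v = v * b + c
    return v


def rep_pair(m, n, b):
    """Expansion of the pair (m, n): pad the shorter word with leading zeroes."""
    u, v = rep(m, b), rep(n, b)
    length = max(len(u), len(v))
    u = [0] * (length - len(u)) + u
    v = [0] * (length - len(v)) + v
    return list(zip(u, v))
```

For example, `rep_pair(1, 23, 4)` pads the expansion of $1$ to the length of $\mathrm{rep}_4(23)$.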

Method
The Thue-Morse set, which we denote by $\mathcal{T}$, is the set of all natural numbers whose base-2 expansions contain an even number of occurrences of the digit $1$: $$\mathcal{T}=\{n\in\mathbb{N} : |\mathrm{rep}_2(n)|_1\in 2\mathbb{N}\}.$$
the electronic journal of combinatorics 28(3) (2021), #P3.12
The Thue-Morse set $\mathcal{T}$ is $2$-recognizable since the language $\mathrm{val}_2^{-1}(\mathcal{T})$ is accepted by the automaton depicted in Figure 1. More precisely, the Thue-Morse set $\mathcal{T}$ is $2^p$-recognizable for all $p\in\mathbb{N}_{\ge 1}$ and is not $b$-recognizable for any other base $b$. This is a consequence of the famous theorem of Cobham. Two positive integers are said to be multiplicatively independent if their only common integer power is $1$ and are said to be multiplicatively dependent otherwise.
• Let $b,b'$ be two multiplicatively independent bases. Then a subset of $\mathbb{N}$ is both $b$-recognizable and $b'$-recognizable if and only if it is a finite union of arithmetic progressions.
• Let $b,b'$ be two multiplicatively dependent bases. Then a subset of $\mathbb{N}$ is $b$-recognizable if and only if it is $b'$-recognizable.
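In the case of the Thue-Morse set, it is easily seen that, for each $p\in\mathbb{N}_{\ge 1}$, the language $\mathrm{val}_{2^p}^{-1}(\mathcal{T})$ is accepted by the DFA $(\{T,B\},T,T,A_{2^p},\delta)$ where, for all $X\in\{T,B\}$ and all $a\in A_{2^p}$, $$\delta(X,a)=\begin{cases}X & \text{if } a\in\mathcal{T}\\ \overline{X} & \text{otherwise,}\end{cases}$$ where $\overline{T}=B$ and $\overline{B}=T$. For example, this automaton is depicted in Figure 2 for $p=2$.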
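The two-state automaton just described is easy to simulate. The following Python sketch is an illustration only (the function names are chosen for this example): it flips between the states `"T"` and `"B"` whenever the digit read is not in $\mathcal{T}$, and accepts in state `"T"`.

```python
def is_evil(n):
    """True if n belongs to the Thue-Morse set (even number of 1s in binary)."""
    return bin(n).count("1") % 2 == 0


def rep(n, b):
    """Base-b expansion of n, most significant digit first."""
    digits = []
    while n > 0:
        n, c = divmod(n, b)
        digits.append(c)
    return digits[::-1]


def accepts_thue_morse(word, p):
    """Run the two-state DFA over the alphabet {0, ..., 2^p - 1}."""
    flip = {"T": "B", "B": "T"}
    state = "T"  # initial state
    for a in word:
        if not 0 <= a < 2 ** p:
            raise ValueError("digit out of range for base 2^p")
        if not is_evil(a):
            # A digit with an odd number of 1s changes the parity.
            state = flip[state]
    return state == "T"  # T is the only final state
```

The simulation works because the binary expansion of a number is the concatenation of the $p$-bit blocks of its base-$2^p$ digits, so the parities of the digits add up.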
In order to avoid a systematic case separation, we introduce the following notation: for $X\in\{T,B\}$ and $n\in\mathbb{N}$, we define $$X_n=\begin{cases}X & \text{if } n\in\mathcal{T}\\ \overline{X} & \text{otherwise.}\end{cases}$$
With this notation, we can simply rewrite the definition of the transition function $\delta$ as $\delta(X,a)=X_a$. The following proposition is well known; for example see [11].
Proposition 2. Let $b\in\mathbb{N}_{\ge 2}$ and $m,t\in\mathbb{N}$. If $X$ is $b$-recognizable, then so is $mX+t$.
Otherwise stated, the dilation $n\mapsto mn$ and the translation $n\mapsto n+t$ preserve $b$-recognizability.
In particular, for any $m,t\in\mathbb{N}$ and $p\in\mathbb{N}_{\ge 1}$, the set $m\mathcal{T}+t$ is $2^p$-recognizable. The aim of this work is to show the following result.
Remark that the state complexity of mT + r (with respect to the base 2 p ) is independent of the value of the remainder r.
Our proof of Theorem 3 is constructive. In order to describe the minimal DFA of $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$, we will successively construct several automata. First, we build a DFA $\mathcal{A}_{\mathcal{T},2^p}$ accepting the language $\mathrm{val}_{2^p}^{-1}(\mathcal{T}\times\mathbb{N})$. Then we build a DFA $\mathcal{A}_{m,r,b}$ accepting the language $\mathrm{val}_b^{-1}(\{(n,mn+r) : n\in\mathbb{N}\})$. Note that we do the latter step for any integer base $b$ and not only for powers of $2$. Next, we consider the product automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. This DFA accepts the language $\mathrm{val}_{2^p}^{-1}(\{(t,mt+r) : t\in\mathcal{T}\})$. Finally, a finite automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ accepting $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$ is obtained by projecting the label of each transition in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$ onto its second component. At each step of our construction, we check that the automaton under consideration is minimal (and hence deterministic) and the ultimate step precisely consists in a minimization procedure.
From now on, we fix some positive integers $m$, $p$ and some remainder $r\in[\![0,m-1]\!]$. We also let $z$ and $k$ be the unique integers such that $m=k2^z$ with $k$ odd. Finally, we let $R=|\mathrm{rep}_{2^p}(r)|$.
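To fix ideas, the parameters $k$, $z$ and $R$ can be computed as follows. This is a minimal sketch; the function name `parameters` is introduced here for illustration only.

```python
def parameters(m, p, r):
    """Return (k, z, R) with m = k * 2^z, k odd, and R = |rep_{2^p}(r)|."""
    if m <= 0 or p <= 0 or not 0 <= r < m:
        raise ValueError("expected m, p >= 1 and 0 <= r < m")
    k, z = m, 0
    while k % 2 == 0:  # strip the factors 2 from m
        k //= 2
        z += 1
    R = 0
    while r > 0:  # count the base-2^p digits of r
        r >>= p
        R += 1
    return k, z, R
```

For $m=24$, $p=2$ and $r=23$, which is the running example of the figures, this gives $k=3$, $z=3$ and $R=3$ since $\mathrm{rep}_4(23)=113$.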

Construction of the intermediate automata
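First, we build a DFA $\mathcal{A}_{\mathcal{T},2^p}$ accepting the language $\mathrm{val}_{2^p}^{-1}(\mathcal{T}\times\mathbb{N})$. This DFA is a modified version of the automaton accepting $\mathrm{val}_{2^p}^{-1}(\mathcal{T})$ defined in the previous section. Namely, we replace each transition labeled by $a\in A_{2^p}$ by $2^p$ copies of itself labeled by $(a,b)$, for each $b\in A_{2^p}$. Formally, $\mathcal{A}_{\mathcal{T},2^p}=(\{T,B\},T,T,A_{2^p}\times A_{2^p},\delta_{\mathcal{T},2^p})$ where, for all $X\in\{T,B\}$ and all $a,b\in A_{2^p}$, we have $\delta_{\mathcal{T},2^p}(X,(a,b))=X_a$. For example, the automata $\mathcal{A}_{\mathcal{T},2}$ and $\mathcal{A}_{\mathcal{T},4}$ are depicted in Figure 3.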

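Proof. We do the proof by induction on $|(u,v)|$.
We suppose that the result is satisfied for $(u,v)$ and we show that it is also true for $(ua,vb)$. Let $Y=\delta_{\mathcal{T},2^p}(X,(u,v))$. By the induction hypothesis, we have $Y=X_{\mathrm{val}_{2^p}(u)}$. Thus we obtain $$\delta_{\mathcal{T},2^p}(X,(ua,vb))=\delta_{\mathcal{T},2^p}(Y,(a,b))=Y_a=(X_{\mathrm{val}_{2^p}(u)})_a=X_{\mathrm{val}_{2^p}(ua)}$$ where we have used Lemma 4 for the last step.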

The automaton A m,r,b
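In this section, we consider an arbitrary integer base $b$. Let $\mathcal{A}_{m,r,b}=([\![0,m-1]\!],0,r,A_b\times A_b,\delta_{m,r,b})$ where the (partial) transition function $\delta_{m,r,b}$ is defined as follows: for $i,j\in[\![0,m-1]\!]$ and $d,e\in A_b$, we have $$\delta_{m,r,b}(i,(d,e))=j\iff bi+e=md+j.$$ We refer the interested reader to [38]. For example, the automaton $\mathcal{A}_{6,2,4}$ is depicted in Figure 4. Note that the automaton $\mathcal{A}_{m,r,b}$ is not complete (see Remark 6). Also note that there is always a loop labeled by $(0,0)$ on the initial state $0$.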
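Remark 6. For each $i\in[\![0,m-1]\!]$ and each $e\in A_b$, there exist unique $d\in A_b$ and $j\in[\![0,m-1]\!]$ such that $\delta_{m,r,b}(i,(d,e))=j$. Indeed, $d$ and $j$ are unique since they are the quotient and remainder of the Euclidean division of $bi+e$ by $m$. We still have to check that $d<b$. We have $d=\frac{bi+e-j}{m}$. Since $i\le m-1$, $j\ge 0$ and $e<b$, we have $d<\frac{b(m-1)+b}{m}=b$.
Proof. We do the proof by induction on $n=|(u,v)|$. If $n$ is equal to $0$, the result is clear.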
We suppose that the result is satisfied for $(u,v)$ and we show that it is also true for $(du,ev)$. We use the notation $\mathrm{DIV}(x,y)$ and $\mathrm{MOD}(x,y)$ to designate the quotient and the remainder of the Euclidean division of $x$ by $y$ (thus, we have $\mathrm{DIV}(x,y)=\lfloor\frac{x}{y}\rfloor$). By definition of the transition function, we have By using the induction hypothesis, we have To be able to conclude the proof, we still have to show that implies $d=\mathrm{DIV}(bi+e,m)$. Thus, suppose that (1) is true. Then as desired.
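Remark 8. It is easily checked that Remark 6 extends from letters to words: for each $i\in[\![0,m-1]\!]$ and each $v\in A_b^*$, there exist a unique word $u\in A_b^*$ and a unique $j\in[\![0,m-1]\!]$ such that $\delta_{m,r,b}(i,(u,v))=j$. In particular, the word $u$ must have the same length as the word $v$, and hence $\mathrm{val}_b(u)<b^{|v|}$.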
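The behaviour of $\mathcal{A}_{m,r,b}$ can be checked experimentally. The sketch below assumes the transition rule read off from Remark 6, namely that $(d,e)$ labels a transition from $i$ to $j$ exactly when $d$ and $j$ are the quotient and remainder of $bi+e$ by $m$; the function names are chosen for this illustration. It then verifies on a range of values that the padded expansion of a pair $(n,mn+r)$ leads from the initial state $0$ to the final state $r$.

```python
def rep(n, b):
    digits = []
    while n > 0:
        n, c = divmod(n, b)
        digits.append(c)
    return digits[::-1]


def rep_pair(m, n, b):
    u, v = rep(m, b), rep(n, b)
    length = max(len(u), len(v))
    return list(zip([0] * (length - len(u)) + u, [0] * (length - len(v)) + v))


def step(i, d, e, m, b):
    """Partial transition: defined only if d is the quotient of b*i + e by m."""
    q, j = divmod(b * i + e, m)
    return j if q == d else None


def run(i, pairs, m, b):
    """Read a word of pairs (d, e) from state i; None if the path is undefined."""
    for d, e in pairs:
        i = step(i, d, e, m, b)
        if i is None:
            return None
    return i
```

With $m=6$, $r=2$ and $b=4$ (the automaton of Figure 4), `run(0, rep_pair(1, 8, 4), 6, 4)` follows the path $0\to 2\to 2$ and ends in the final state.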

The projected automaton Π(A m,r,b )
In this section again, $b$ is an arbitrary integer base. We consider the automaton obtained by projecting the label of each transition of $\mathcal{A}_{m,r,b}$ onto its second component. We denote by $\Pi(\mathcal{A}_{m,r,b})$ the automaton obtained thanks to this projection. Thanks to Remark 6, the automaton $\Pi(\mathcal{A}_{m,r,b})$ is deterministic and complete. We denote by $\delta^{\Pi}_{m,r,b}$ the corresponding transition function. For example, the automaton $\Pi(\mathcal{A}_{6,2,4})$ is depicted in Figure 5. Note that the automaton $\Pi(\mathcal{A}_{m,r,b})$ actually corresponds to the "classical" construction of an automaton recognizing $m\mathbb{N}+r$ in base $b$, see for example [1].
Hence $|\mathrm{rep}_b(\ell)|\le|v|$ and the word $u=0^{|v|-|\mathrm{rep}_b(\ell)|}\,\mathrm{rep}_b(\ell)$ has length $|v|$ and is such that $\mathrm{val}_b(u)=\ell$. The conclusion follows from Lemma 7.

The product automaton
In this section, we study the product automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. Since the states of $\mathcal{A}_{m,r,2^p}$ are numbered from $0$ to $m-1$ and those of $\mathcal{A}_{\mathcal{T},2^p}$ are $T$ and $B$, we denote the states of the product automaton by $(0,T),\ldots,(m-1,T)$ and $(0,B),\ldots,(m-1,B)$. We denote by $\delta_\times$ the (partial) transition function of this product automaton. The initial state is $(0,T)$ and the only final state is $(r,T)$.
Proof. It suffices to combine Lemmas 5 and 7.
In Figure 6, we have depicted the automaton A 6,2,4 × A T ,4 , as well as the automata A 6,2,4 and A T ,4 , which we have placed in such a way that the labels of the product automaton can be easily deduced. Here and in the next figures, states are named iX instead of (i, X) for clarity.

The projection
Now, we provide a DFA accepting the language $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$. This automaton is denoted by $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ and is defined from the automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$ by only keeping the second component of the label of each transition. Formally, the states of $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ are $(0,T),\ldots,(m-1,T)$ and $(0,B),\ldots,(m-1,B)$. We denote by $\delta^{\Pi}_{\times}$ the (partial) transition function of this automaton. In Figure 7, all edges labeled by $0$ ($1$, $2$ and $3$ respectively) are represented in black (blue, red and green respectively).
Proof. We have $\delta^{\Pi}_{\times}((i,X),v)=(j,Y)$ if and only if there exists some word $u$ over $A_{2^p}$ of the same length as $v$ such that $\delta_\times((i,X),(u,v))=(j,Y)$. Take $\ell=\mathrm{val}_{2^p}(u)$. The conclusion follows from Lemma 10 and a similar argument as in the proof of Lemma 9.
Remark 13. Note that in the statement of Lemma 12, the integer $\ell$ is necessarily less than $2^{p|v|}$. This is due to the fact that if $v$ is the label of some path in the projected automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$, there must be a word $u$ of the same length as $v$ such that the pair $(u,v)$ is the label of a path in the automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. This can be deduced directly from the computation:

Properties of the intermediate automata
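Now we prove some properties of the automata $\mathcal{A}_{\mathcal{T},2^p}$, $\mathcal{A}_{m,r,b}$, $\Pi(\mathcal{A}_{m,r,b})$, $\mathcal{A}_{\mathcal{T},2^p}\times\mathcal{A}_{m,r,b}$ and $\Pi(\mathcal{A}_{\mathcal{T},2^p}\times\mathcal{A}_{m,r,b})$ that will be useful in what follows.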

Properties of
Proof. This directly follows from Lemma 5.
In what follows, we let $K=|\mathrm{rep}_{2^p}((k-1)2^z)|$, provided that $k>1$. Then we define a permutation $\sigma$ of the integers in $[\![0,k-1]\!]$ by $\sigma(i)=-2^{pK-z}\,i \bmod k$. Note that $\sigma$ permutes the integers $0,1,\ldots,k-1$ because $k$ is odd. Further, we define $w_i$ to be the unique word of length $K$ representing $\sigma(i)2^z$ in base $2^p$: $$w_i=0^{K-|\mathrm{rep}_{2^p}(\sigma(i)2^z)|}\,\mathrm{rep}_{2^p}(\sigma(i)2^z).$$ Note that the words $w_i$ are well defined since, by the choice of $K$, $|\mathrm{rep}_{2^p}(\sigma(i)2^z)|\le K$.
Lemma 16. If $k>1$ then $pK\ge z$.
Proof. We have Thus pK p z p z. Proof. The word w i has length K and from Lemma 16, we know that pK z. By Lemma 9, we have The result follows from the definition of σ. Proof. Let ∈ N and, for each i ∈ [[0, k−1]], let y i = w i (rep 2 p (m)) rep 2 p (r). Recall that we have set R = |rep 2 p (r)|. Further, set M = |rep 2 p (m)|. Then |y i | = K + M + R and from Lemma 16, we know that pK z. Therefore, we have The conclusion follows from Lemma 9. , and hence, the concatenation w i rep 2 p (r) leads from i to r. This shows that Π(A m,r,2 p ) is also coaccessible. In order to see that it is complete, observe that for every state i ∈ [[0, m−1]] and every digit e ∈ A 2 p , there is a transition labeled by e from i to the state 2 p i + e mod m (see Remark 6).
In [14], we proved the coaccessibility for arbitrary integer bases b by using a different method. In the present case of a base which is a power of two, the argument is more straightforward since we are able to explicitly provide a word that is accepted from any state i.
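The explicit words mentioned above can be computed directly from the definitions of $K$, $\sigma$ and $w_i$. The following sketch is an illustration under the assumption $k>1$, with function names chosen for this example; it builds the words $w_i$ for $i\in[\![0,k-1]\!]$ and lets one check that reading $w_i$ from state $i$ in $\Pi(\mathcal{A}_{m,r,2^p})$, whose transitions are $i\mapsto 2^p i+e \bmod m$, leads to state $0$.

```python
def rep(n, b):
    digits = []
    while n > 0:
        n, c = divmod(n, b)
        digits.append(c)
    return digits[::-1]


def odd_part(m):
    """Return (k, z) with m = k * 2^z and k odd."""
    k, z = m, 0
    while k % 2 == 0:
        k //= 2
        z += 1
    return k, z


def words_to_zero(m, p):
    """Map each i in [0, k-1] to the word w_i of length K."""
    k, z = odd_part(m)
    if k == 1:
        raise ValueError("the construction assumes k > 1")
    b = 2 ** p
    K = len(rep((k - 1) * 2 ** z, b))
    words = {}
    for i in range(k):
        sigma_i = (-(2 ** (p * K - z)) * i) % k  # sigma(i) in [0, k-1]
        w = rep(sigma_i * 2 ** z, b)
        words[i] = [0] * (K - len(w)) + w  # pad to length K
    return words


def run_projected(i, word, m, b):
    """Read a word from state i in the projected automaton."""
    for e in word:
        i = (b * i + e) % m
    return i
```

For $m=24$ and $p=2$ we get $K=3$ and the words $000$, $020$ and $100$; appending $\mathrm{rep}_4(r)$ then leads from $i$ to $r$.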
The automaton Π(A m,r,b ) is not minimal in general: it is minimal if and only if m and b are coprime; see for example [1].
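The non-minimality of $\Pi(\mathcal{A}_{m,r,b})$ can be observed on small instances. The sketch below is an illustration, not the method of [1]: it applies a standard partition refinement (Moore's algorithm) to the automaton with states $0,\ldots,m-1$, transitions $i\mapsto bi+e \bmod m$ and final state $r$, and returns the number of classes of indistinguishable states. The function name `minimal_size` is an assumption of this example.

```python
def minimal_size(m, r, b):
    """Number of states of the minimal DFA equivalent to Pi(A_{m,r,b})."""
    # Start from the partition {final, non-final}; all states are accessible.
    cls = [1 if i == r else 0 for i in range(m)]
    while True:
        # Signature of a state: its class and the classes of its successors.
        sigs = [
            (cls[i],) + tuple(cls[(b * i + e) % m] for e in range(b))
            for i in range(m)
        ]
        ids = {}
        new_cls = [ids.setdefault(s, len(ids)) for s in sigs]
        if len(ids) == len(set(cls)):
            return len(ids)  # the partition is stable
        cls = new_cls
```

On the examples tried here, the automaton stays at $m$ states for the coprime pairs $(m,b)=(3,4)$, $(5,2)$, $(7,4)$ and shrinks for $(6,4)$ and $(24,4)$.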
Proposition 20. The automaton A m,r,2 p is accessible, coaccessible and has disjoint states.
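Proof. It directly follows from Proposition 19 that $\mathcal{A}_{m,r,2^p}$ is accessible and coaccessible. Now, let $i,j\in[\![0,m-1]\!]$ and let $(u,v)\in L_i\cap L_j$. By Lemma 7, we have $$2^{p|(u,v)|}\,i+\mathrm{val}_{2^p}(v)=m\,\mathrm{val}_{2^p}(u)+r=2^{p|(u,v)|}\,j+\mathrm{val}_{2^p}(v),$$ which implies that $i=j$. This proves that $\mathcal{A}_{m,r,2^p}$ has disjoint states.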
In a reduced DFA, there can be at most one non-coaccessible state. Thus, we deduce from Proposition 20 that $\mathcal{A}_{m,r,2^p}$ is indeed the trim minimal automaton of the language $\mathrm{val}_{2^p}^{-1}(\{(n,mn+r) : n\in\mathbb{N}\})$, that is, the automaton obtained by removing the only non-coaccessible state from its minimal automaton.

Properties of
Proof. This directly follows from Lemma 14.
Let $i\in[\![0,m-1]\!]$ and $X\in\{T,B\}$. By Lemma 17, we know that there exists a word $w_i$ that leads from the state $i$ to the state $0$ in the automaton $\Pi(\mathcal{A}_{m,r,2^p})$. Thus, there exists a word $u$ of the same length as $w_i$ such that the word $(u,w_i)$ leads from $i$ to $0$ in $\mathcal{A}_{m,r,2^p}$. Now, by reading $(u,w_i)$ from $(i,X)$ in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$, we reach either the state $(0,T)$ or the state $(0,B)$. If we reach $(0,T)$, then the concatenation $(u,w_i)\,\mathrm{rep}_{2^p}(0,r)$ leads from the state $(i,X)$ to $(r,T)$ in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. If we reach $(0,B)$ instead, then we may apply Lemma 22 in order to obtain that the concatenation $(u,w_i)\,\mathrm{rep}_{2^p}(1,m)\,\mathrm{rep}_{2^p}(0,r)$ leads from $(i,X)$ to $(r,T)$ in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. This proves that $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$ is coaccessible.
The fact that A m,r,2 p ×A T ,2 p has disjoint states follows from Propositions 15 and 20.

Properties of Π(A m,r,2 p × A T ,2 p )
Proof. This is a direct verification.
Proof. Proceed by contradiction and suppose that a word $v$ over $A_{2^p}$ is accepted from both $(i,T)$ and $(i,B)$ in $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ for some $i\in[\![0,m-1]\!]$. Then there exist words $u$ and $u'$ over $A_{2^p}$ of length $|v|$ such that, in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$, the words $(u,v)$ and $(u',v)$ are accepted from $(i,T)$ and $(i,B)$ respectively. But from Remark 8, we must have $u=u'$. Hence the word $(u,v)$ is accepted from both $(i,T)$ and $(i,B)$ in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$, contradicting that this automaton has disjoint states (see Proposition 23).
Proof. By construction, Π (A m,r,2 p × A T ,2 p ) accepts val −1 2 p (mT +r); see Section 3. The fact that this automaton is deterministic and complete follows from Remark 6. It is accessible and coaccessible because A m,r,2 p × A T ,2 p is.
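The projected product automaton can be simulated directly and compared with the set $m\mathcal{T}+r$ on an initial segment of the integers. In the sketch below, which is only an illustration with function names chosen for this example, a state is a pair `(i, x)` with `x` in `{"T", "B"}`; on reading a digit $e$, the quotient $d$ and remainder $j$ of $2^p i+e$ by $m$ give the next first component $j$, and the second component is flipped exactly when $d\notin\mathcal{T}$.

```python
def is_evil(n):
    return bin(n).count("1") % 2 == 0


def rep(n, b):
    digits = []
    while n > 0:
        n, c = divmod(n, b)
        digits.append(c)
    return digits[::-1]


def step(state, e, m, p):
    """One transition of the projected product automaton on the digit e."""
    i, x = state
    d, j = divmod((2 ** p) * i + e, m)  # d is the hidden first component
    if not is_evil(d):
        x = "B" if x == "T" else "T"
    return (j, x)


def accepts(word, m, r, p):
    state = (0, "T")  # initial state
    for e in word:
        state = step(state, e, m, p)
    return state == (r, "T")  # only final state


def accepted_below(limit, m, r, p):
    """Integers n < limit whose base-2^p expansion is accepted."""
    return {n for n in range(limit) if accepts(rep(n, 2 ** p), m, r, p)}
```

A usage example: `accepted_below(300, 6, 2, 2)` should coincide with the elements of $6\mathcal{T}+2$ below $300$.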

Minimization of Π (A m,r,2 p × A T ,2 p )
We start by defining some classes of states of $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$. Our aim is twofold. First, we will prove that those classes consist of indistinguishable states, i.e. states accepting the same language. Second, we will show that states belonging to different classes are distinguishable, i.e. accept different languages. Otherwise stated, these classes correspond to the left quotients $w^{-1}L$ where $w$ is any finite word over the alphabet $A_{2^p}$ and $L=\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$.
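Before going through the classes, the number of left quotients can be obtained numerically on small instances. The sketch below is an illustration that uses a generic partition refinement rather than the classes defined in this section; the function name `state_complexity` and the encoding of $T$ and $B$ as `0` and `1` are assumptions of this example.

```python
def state_complexity(m, r, p):
    """Number of left quotients of val^{-1}_{2^p}(mT + r), by partition refinement."""
    b = 2 ** p

    def step(state, e):
        i, x = state
        d, j = divmod(b * i + e, m)
        # Flip the second component when d has an odd number of 1s.
        return (j, x ^ (bin(d).count("1") % 2))

    # States reachable from the initial state (0, T); T is encoded as 0.
    reachable, stack = {(0, 0)}, [(0, 0)]
    while stack:
        s = stack.pop()
        for e in range(b):
            t = step(s, e)
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # Refine the partition {final, non-final} until it is stable.
    cls = {s: int(s == (r, 0)) for s in reachable}
    while True:
        ids = {}
        new_cls = {}
        for s in reachable:
            sig = (cls[s],) + tuple(cls[step(s, e)] for e in range(b))
            new_cls[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(cls.values())):
            return len(ids)
        cls = new_cls
```

For the examples $m=24$, $p=2$ with $r=23$ and $r=0$ discussed with Figures 8 and 9, the call returns $8$ in both cases, in agreement with the count of nonempty classes given there.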

Definition of the classes
Recall that $R=|\mathrm{rep}_{2^p}(r)|$. We define $N=\max\{\lceil\frac{z}{p}\rceil,R\}$. The classes we are going to define are closely related to the base-$2^p$ expansion of the remainder $r$ with some additional leading zeroes. More precisely, we have to consider the word $0^{N-R}\,\mathrm{rep}_{2^p}(r)$, which is the unique word over the alphabet $A_{2^p}$ with length $N$ and $2^p$-value $r$. This word is equal to the $2^p$-expansion $\mathrm{rep}_{2^p}(r)$ if and only if $N=R$, i.e. $\lceil\frac{z}{p}\rceil\le R$.
Note that in the case where $z$ is divisible by $p$, the two cases of the definition coincide for the value $\alpha=\frac{z}{p}$. Let us comment on the previous definition, which may seem quite technical at first. The first elements of the sets $C_\alpha$ are the integer parts of the remainder $r$ divided by increasing powers of the base $2^p$, i.e. $r$ divided by $2^{p\alpha}$ for the set indexed by $\alpha$. The further elements of a set $C_\alpha$ are obtained by adding to the first element $\lfloor\frac{r}{2^{p\alpha}}\rfloor$ integer multiples of $\frac{m}{2^{p\alpha}}$ so that the greatest element so obtained is still less than $m$, provided that $m$ is divisible by this power $2^{p\alpha}$. When $m$ is no longer divisible by $2^{p\alpha}$, i.e. when $\alpha>\frac{z}{p}$, then we add integer multiples of $k$, which is the odd part of $m$. In particular, if $m$ is odd, i.e. if $z=0$, then all the sets $C_\alpha$ are reduced to a single state: $C_\alpha=\{(\lfloor\frac{r}{2^{p\alpha}}\rfloor,T)\}$. Finally, note that since $R=|\mathrm{rep}_{2^p}(r)|$ and $N\ge R$, we have $\lfloor\frac{r}{2^{pN}}\rfloor=0$ and $C_N=\{(k\ell,T_\ell) : 0\le\ell\le 2^z-1\}$. We will see in Lemma 42 that the states in $C_\alpha$ are exactly those from which there is a path labeled by the suffix of length $\alpha$ of $0^{N-R}\,\mathrm{rep}_{2^p}(r)$ to the state $(r,T)$. The sets $C_\alpha$ are not necessarily disjoint as Example 28 shows. In order to obtain the desired classes of states of the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$, we consider the following definition.
Let us define a second type of classes. The idea behind this definition is that these classes are "too far" from the remainder r with respect to the division by consecutive powers of the base 2 p , in the sense that these states do not accept any suffix of 0 N −R rep 2 p (r).
As we already observed, all states of the form ( k, T ) appear in the set C N and thus, also in the union of the sets C α . This is the reason why the sets D (0,T ) and D (0,T ) are not defined, i.e. (j, X) = (0, T ) in the previous definition.
We will refer to the sets of states C α and D (j,X) as classes of states. Let us make some preliminary observations concerning the previous definitions.
The classes C α and D (j,X) are pairwise disjoint: the intersection of any two such classes is empty. Moreover, the nonempty classes Note that 2k + z p = 2 · 3 + 3 2 = 8 of them are nonempty. In Figure 8, the automaton Π (A 24,23,4 × A T ,4 ) is represented without the transitions and the states are colored with respect to these classes. If now we consider r = 0, these classes are  In this case, they are all nonempty and there are 8 of them. In Figure 9, the states of the automaton Π (A 24,0,4 × A T ,4 ) are colored with respect to these classes. Our aim is to prove that the nonempty classes defined above correspond exactly to the left quotients of the language val −1 2 p (mT + r). In particular, note that α = R is the first value for which r 2 pα = 0. Therefore ( r 2 pα , T 0 ) = ( r 2 pα , T ) ∈ C α for every α ∈ [[0, R]]; see Figure 10. then we are done (in this case, the part of Figure 10 below the line is empty).
First, suppose that pα z. Then we have to prove that ( m 2 pα , T 1 ) = ( m 2 pα , B) / ∈ C β for β < α. If α > β then m 2 pα < m 2 pβ r 2 pβ + m 2 pβ . This shows that if the state ( m 2 pα , B) belongs to some C β with β < α, then its first component has to be r 2 pβ . But since it has second component T 1 = B and since T 0 = T , the state ( m 2 pα , B) cannot be the first state of any set C β .
Second, suppose that pα > z. In this case, we have to prove that (k, B) / ∈ C β for β < α. Similarly to what precedes, if (k, B) belongs to some set C β , then we must have r 2 pβ = 0 and β z p . But since z p < α N = z p , we get α = z p . Therefore, any β < α is such that pβ < z. Now suppose that pα < z. We show that the second element (j + k, B) of the set D (j,T ) does not belong to any set C β , and hence indeed belongs to the class D (j,T ) . Let β ∈ [[0, N ]] and suppose to the contrary that (j + k, B) ∈ C β . Since j + k ∈ [[0, 2k − 1]], the state (j +k, B) must be either the first or the second element of the set C β . But since B = T 0 = T , it has to be the second. If pβ < z, then we obtain j + k = r 2 pβ + k2 z−pβ 2k, a contradiction. Thus pβ z and j +k = r 2 pβ +k. But this implies that r 2 pα = j = r 2 pβ . Since β > α, this means that j = 0, a contradiction.

States of the same class are indistinguishable
In order to prove that two states $(j,X)$ and $(j',X')$ of the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ are indistinguishable, we have to prove that $L_{(j,X)}=L_{(j',X')}$. The general procedure that we use for proving that $L_{(j,X)}=L_{(j',X')}$ goes as follows. Pick some word $v\in A_{2^p}^*$ and let $n=|v|$ and $e=\mathrm{val}_{2^p}(v)$. By Lemma 12, the word $v$ is accepted from the state $(j,X)$ if and only if there exists some $d\in\mathbb{N}$ such that $2^{pn}j+e=md+r$ and $X_d=T$.
Similarly, the word $v$ is accepted from the state $(j',X')$ if and only if there exists some $d'\in\mathbb{N}$ such that $2^{pn}j'+e=md'+r$ and $(X')_{d'}=T$.
But then, observe that there is only one possible pair of candidates for $d$ and $d'$: we necessarily have $$d=\frac{2^{pn}j+e-r}{m}\quad\text{and}\quad d'=\frac{2^{pn}j'+e-r}{m}.\qquad(3)$$ Therefore, proving that $L_{(j,X)}=L_{(j',X')}$ is equivalent to proving that for all $n\in\mathbb{N}$ and $e\in[\![0,2^{pn}-1]\!]$, we have $$(d\in\mathbb{N}\text{ and }X_d=T)\iff(d'\in\mathbb{N}\text{ and }(X')_{d'}=T)$$ where $d$ and $d'$ are given by (3). Moreover, note that such $d$ and $d'$ are always greater than or equal to $-\frac{r}{m}$, hence they are greater than $-1$. Thus, provided that $d$ and $d'$ are integers, we know that they are necessarily nonnegative. Similarly, thanks to Remark 13, $d$ and $d'$ must be less than $2^{pn}$. For these reasons, in the forthcoming proofs (namely, in Lemmas 36 and 38), we need to verify that $d,d'\in\mathbb{Z}$ but we don't need to check that $0\le d,d'<2^{pn}$.
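The arithmetic criterion used in this procedure can be compared with a direct simulation of the automaton. The following sketch is an illustration with function names chosen for this example: `accepted_by_formula` computes the single candidate $d=(2^{pn}j+e-r)/m$ and tests whether it is an integer with $X_d=T$, while `accepted_by_simulation` reads the word digit by digit in the projected product automaton.

```python
def is_evil(n):
    return bin(n).count("1") % 2 == 0


def val(word, b):
    v = 0
    for c in word:
        v = v * b + c
    return v


FLIP = {"T": "B", "B": "T"}


def accepted_by_formula(j, x, v, m, r, p):
    """Acceptance of v from state (j, x) via the unique candidate d."""
    n, e = len(v), val(v, 2 ** p)
    num = 2 ** (p * n) * j + e - r
    if num % m != 0:
        return False  # d is not an integer
    d = num // m  # an integer d is automatically nonnegative since r < m
    return (x if is_evil(d) else FLIP[x]) == "T"


def accepted_by_simulation(j, x, v, m, r, p):
    """Acceptance of v from state (j, x) by running the projected product automaton."""
    for e in v:
        d, j = divmod((2 ** p) * j + e, m)
        if not is_evil(d):
            x = FLIP[x]
    return (j, x) == (r, "T")
```

Both functions are expected to agree on every state and every word, which can be checked exhaustively for short words.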
Our first aim is to show that all states in the same class D (j,X) accept the same language. We start with a lemma that will be used several times. Note that this lemma does not only concern the classes D (j,X) since we can have (j, X) = (0, T ) in the statement.
Following the procedure described above, we have to prove that (d ∈ N and X d = T ) ⇐⇒ (d ∈ N and (X ) d = T ). Since d = d + 2 pn k m = d + 2 pn−z and since pn z, d is an integer if and only if so is d . Moreover, ]. Therefore, if pn < z then we get that which contradicts our assumption. So pn z and the conclusion follows from Lemma 36.
Note that the proof of Proposition 37 shows that no word of length less than $\lceil z/p\rceil$ is accepted from a state of a class $D_{(j,X)}$. However, such words may be accepted from a state of one of the classes $C_\alpha$ (see Lemma 42 below). Now we turn to the classes $C_\alpha$. The proof is divided into several technical lemmas.
Proof. Suppose that $N=\lceil z/p\rceil$ and that $v$ is a word over $A_{2^p}$ of length $\beta<N$ that is accepted from a state of the form $(\lfloor r/2^{p\alpha}\rfloor+\ell k,T_\ell)$ with $\ell\in[\![0,2^z-1]\!]$, i.e. from a state in $C_N$. Set $e=\mathrm{val}_{2^p}(v)$ and $d=\frac{2^{p\beta}(\lfloor r/2^{p\alpha}\rfloor+\ell k)+e-r}{m}$.
Then $d\in\mathbb{N}$ and $(T_\ell)_d=T$. We get $\lfloor r/2^{p\alpha}\rfloor+\ell k=\lfloor r/2^{p\beta}\rfloor+\frac{dm}{2^{p\beta}}$ and $T_\ell=T_d$. Since $\beta<\lceil z/p\rceil$, this shows that the state $(\lfloor r/2^{p\alpha}\rfloor+\ell k,T_\ell)=(\lfloor r/2^{p\beta}\rfloor+\frac{dm}{2^{p\beta}},T_d)$ belongs to $C_\beta$, and hence cannot belong to $C_N$.
We are now ready to prove that two states belonging to any given class C α are indistinguishable.
Proof. Let $\alpha\in[\![0,N]\!]$. From Lemma 38, it is enough to consider words of length smaller than $\alpha$ and from the first item of Lemma 39, we may suppose that $p\alpha>z$. If $N=\lceil z/p\rceil$, then we must have $\alpha=N$ and we are done thanks to Lemma 40. Thus, we may also assume that $N=R>\lceil z/p\rceil$. Under these assumptions, $\lfloor r/2^{p\alpha}\rfloor<k$ and the first state of $C_\alpha$ is $(\lfloor r/2^{p\alpha}\rfloor,T)$; see Figure 10. Thus, we have to show that for all $\ell\in[\![0,2^z-1]\!]$ such that the state $(\lfloor r/2^{p\alpha}\rfloor+\ell k,T_\ell)$ indeed belongs to $C_\alpha$ and all $n<\alpha$, the states $(\lfloor r/2^{p\alpha}\rfloor,T)$ and $(\lfloor r/2^{p\alpha}\rfloor+\ell k,T_\ell)$ accept the same words of length $n$. If $pn<z$ then both languages are empty by the second item of Lemma 39. If $pn\ge z$ then the equality follows from Lemma 36.

States of different classes are distinguishable
In this section, we show that, in the projected automaton Π (A m,r,2 p × A T ,2 p ), states belonging to different classes C α or D (j,X) are pairwise distinguishable, that is, for any two such states, there exists a word which is accepted from exactly one of them.
The following lemma shows that the states in a set $C_\alpha$ are exactly those states that lead to states in $C_{\alpha-1}$ by reading the letter $r_{\alpha-1}$, where $0^{N-R}\,\mathrm{rep}_{2^p}(r)=r_{N-1}\cdots r_1r_0$.
It remains to show that the nonempty classes D (j,X) are distinguishable from each other.
Proof. We already know from the previous section that the states of $D_{(i,X)}$ (resp. $D_{(j,Y)}$) are indistinguishable. Therefore, it suffices to show that $L_{(i,X)}\ne L_{(j,Y)}$.
First, suppose that $i=j$. Then $X\ne Y$ by hypothesis and the states $(i,X)$ and $(j,Y)$ are disjoint by Lemma 25. Since $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ is coaccessible by Proposition 26, we obtain that the states $(i,X)$ and $(j,Y)$ are distinguishable. Now suppose that $i\ne j$. By Lemma 18, the word $w_i\,\mathrm{rep}_{2^p}(r)$ is accepted from $i$ in the automaton $\Pi(\mathcal{A}_{m,r,2^p})$ but is not accepted from $j$. Then, there exist a word $u_1$ of length $|w_i|$ and a word $u_2$ of length $R$ such that the word $(u_1,w_i)(u_2,\mathrm{rep}_{2^p}(r))$ is accepted from $i$ in the automaton $\mathcal{A}_{m,r,2^p}$ but is not accepted from $j$. By Lemma 21, this word is accepted either from $(i,T)$ or from $(i,B)$ in the automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$ but is accepted neither from $(j,T)$ nor from $(j,B)$. Now, two cases are possible.
First, suppose that $(u_1,w_i)(u_2,\mathrm{rep}_{2^p}(r))$ is accepted from $(i,X)$ in $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p}$. Then, in the projection $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$, the word $w_i\,\mathrm{rep}_{2^p}(r)$ is accepted from $(i,X)$ but not from $(j,Y)$. Thus, the word $w_i\,\mathrm{rep}_{2^p}(r)$ distinguishes the states $(i,X)$ and $(j,Y)$.
This shows that the word $w_i\,\mathrm{rep}_{2^p}(m)\,\mathrm{rep}_{2^p}(r)$ is accepted from $(i,X)$ in $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$. From Lemmas 18 and 24, this word cannot be accepted from $(j,Y)$, hence it distinguishes the states $(i,X)$ and $(j,Y)$.

6.5 The minimal automaton of $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$
We are ready to construct the minimal automaton of $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$. Since the states of $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ that belong to the same class $C_\alpha$ or $D_{(j,X)}$ are indistinguishable, they can be glued together in order to define a new automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$ that still accepts the same language.
The formal definition of $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is as follows. The alphabet is $A_{2^p}$. The states are the classes $C_\alpha$ for $\alpha\in[\![0,N]\!]$ and the nonempty classes $D_{(j,X)}$ for $(j,X)\in([\![0,k-1]\!]\times\{T,B\})\setminus\{(0,T)\}$. The class $C_R$ is the initial state and the only final state is the class $C_0$. Note that $(0,T)\in C_R$ and that $(r,T)\in C_0$. The transitions of $\mathcal{M}_{m,r,\mathcal{T},2^p}$ are defined as follows: there is a transition labeled by a letter $a$ in $A_{2^p}$ from a class $J_1$ to a class $J_2$ if and only if in the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$, there is a transition labeled by $a$ from a state of $J_1$ to a state of $J_2$.

Proof. By construction and by Propositions 37 and 41, the language accepted by $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$. In order to see that $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is minimal, it suffices to prove that it is complete, reduced and accessible. The fact that $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is reduced follows from Propositions 43 and 44. We know from Proposition 26 that the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ is complete and accessible, which in turn implies that $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is complete and accessible as well.
We are now ready to prove Theorem 3.
Example 47. The minimal automaton of the language $\mathrm{val}_4^{-1}(6\mathcal{T}+2)$ has 7 states; see Figure 12. We can indeed compute that $2\cdot 3+\lceil 1/2\rceil=7$.
As already mentioned, the obtained state complexity of $m\mathcal{T}+r$ is independent of the considered remainder $r$. This observation is all the more surprising since the classes used in our construction do depend on $r$.
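The count of Example 47 can be cross-checked by brute force. The sketch below is an illustration under assumptions rather than the construction of the paper: it builds a complete DFA on the states $(j,X)$ with transitions $(j,X)\to(j',X_d)$ on the letter $a$ whenever $2^pj+a=md+j'$, takes $(0,T)$ as initial state and $(r,T)$ as unique final state, and minimizes it by Moore's partition refinement instead of using the classes $C_\alpha$ and $D_{(j,X)}$. All function names are hypothetical.

```python
def t(n):
    """Thue-Morse bit of n: 0 if n has an even number of 1s in binary."""
    return bin(n).count("1") % 2


def build_dfa(m, r, p):
    """Transition table on states (j, x), where x = 0 stands for T and x = 1 for B."""
    b = 1 << p
    delta = {}
    for j in range(m):
        for x in (0, 1):
            for a in range(b):
                d, j2 = divmod(b * j + a, m)
                delta[(j, x), a] = (j2, x ^ t(d))
    return delta, (0, 0), {(r, 0)}, b


def minimal_state_count(m, r, p):
    """Number of states of the minimal complete DFA (Moore refinement)."""
    delta, init, finals, b = build_dfa(m, r, p)
    # Keep only the states reachable from the initial state.
    reachable, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for a in range(b):
            nxt = delta[s, a]
            if nxt not in reachable:
                reachable.add(nxt)
                stack.append(nxt)
    # Start from the final / non-final split and refine until stable.
    block = {s: int(s in finals) for s in reachable}
    while True:
        ids = {}
        new_block = {}
        for s in reachable:
            signature = (block[s],) + tuple(block[delta[s, a]] for a in range(b))
            new_block[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


def state_complexity_formula(m, p):
    """2k + ceil(z / p), where m = k * 2^z with k odd."""
    z = (m & -m).bit_length() - 1
    k = m >> z
    return 2 * k + -(-z // p)
```

With these definitions, `minimal_state_count(6, 2, 2)` and `state_complexity_formula(6, 2)` can be compared directly, and looping over `r in range(6)` illustrates the independence from the remainder.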
7 A direct description of the classes whenever $r=0$
In the conference paper [14], we described the automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$ in the particular case where $r=0$, i.e. for the exact multiples of $\mathcal{T}$. The construction was similar, but the way we built the classes of states was different. Therefore, we can give another description of the classes $C_\alpha$ and $D_{(j,X)}$ for $r=0$ which is easier than the descriptions from Definitions 29 and 30 in the sense that the classes are built in a direct way, without having to remove some states a posteriori.
Note that if $r=0$ then $R=0$ and $N=\lceil z/p\rceil$.
Corollary 48. Suppose that $r=0$.

Replacing $\mathcal{T}$ by its complement $\overline{\mathcal{T}}$
If we are interested in the set $\overline{\mathcal{T}}=\mathbb{N}\setminus\mathcal{T}$ instead of $\mathcal{T}$, we can use the same construction that we described and studied for $\mathcal{T}$. We only have to exchange the final/non-final status of the states in the automaton $\mathcal{A}_{\mathcal{T}}$. In this section, we show that we may instead directly obtain the minimal automaton of the language $\mathrm{val}_{2^p}^{-1}(m\overline{\mathcal{T}}+r)$ from that of $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$.
Example 49. Let us push further our running example by considering now $\overline{\mathcal{T}}$ instead of $\mathcal{T}$. The classes of states are defined similarly by exchanging $T$ and $B$ everywhere. In Figure 13, we have depicted the classes of the corresponding projected product automaton, which we denote by $\Pi(\mathcal{A}_{6,2,4}\times\mathcal{A}_{\overline{\mathcal{T}},4})$. As Figure 14 shows, the minimal automaton obtained by gluing the states of the same class together is not a symmetric version of the automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$ we obtained starting from the set $\mathcal{T}$; compare Figures 12 and 14. Nevertheless, observe that the automaton of Figure 14 can be obtained from the one of Figure 12 by replacing the initial state (in purple) by the yellow state. Also observe that, in the automaton of Figure 12, the yellow state is reached from the initial state by reading the word $\mathrm{rep}_4(6)=12$. This fact is always true and is proved in Proposition 50.
In the next proposition, we show that the minimal automaton of $\mathrm{val}_{2^p}^{-1}(m\overline{\mathcal{T}}+r)$ can be obtained directly from the minimal automaton of $\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$ by only moving the initial state.
Proposition 50. The minimal automaton of $\mathrm{val}_{2^p}^{-1}(m\overline{\mathcal{T}}+r)$ is obtained by replacing the initial state of the automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$ by the state that is reached by reading $\mathrm{rep}_{2^p}(m)$ from the initial state.
Proof. Consider the automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$. By construction, its states are sets of states (called classes) of the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$. By Lemma 12, for each $X\in\{T,B\}$, there is a path labeled by $\mathrm{rep}_{2^p}(m)$ going from $(0,X)$ to $(0,\overline{X})$ in $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$, and hence the same holds for the corresponding classes of states in $\mathcal{M}_{m,r,\mathcal{T},2^p}$.
First, let us show that the obtained automaton is again minimal. By only changing the initial state of any minimal DFA, we keep a DFA that is complete and reduced. Furthermore, the obtained DFA is still accessible since we have seen in the previous paragraph that there is a path from the class of $(0,B)$ to the class of $(0,T)$, which is precisely the initial state in $\mathcal{M}_{m,r,\mathcal{T},2^p}$.
It remains to show that the language $L$ accepted from the class of $(0,B)$ in the automaton $\mathcal{M}_{m,r,\mathcal{T},2^p}$ is equal to $\mathrm{val}_{2^p}^{-1}(m\overline{\mathcal{T}}+r)$. By construction, $L$ is equal to the language $L_{(0,B)}$ accepted from the state $(0,B)$ in the automaton $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{\mathcal{T},2^p})$ and we already know that $L_{(0,T)}=\mathrm{val}_{2^p}^{-1}(m\mathcal{T}+r)$. Let $w\in A_{2^p}^*$. We know that $w\in L_{(0,B)}\iff\mathrm{rep}_{2^p}(m)\,w\in L_{(0,T)}$. Thus, it is sufficient to prove that $\mathrm{val}_{2^p}(w)\in m\overline{\mathcal{T}}+r\iff m2^{p|w|}+\mathrm{val}_{2^p}(w)\in m\mathcal{T}+r$. In both cases, we must have that $\mathrm{val}_{2^p}(w)=mq+r$ with $q\in\mathbb{N}$. Since $q\le\mathrm{val}_{2^p}(w)<2^{p|w|}$, we have $\mathrm{rep}_2(2^{p|w|}+q)=1\,0^{p|w|-|\mathrm{rep}_2(q)|}\,\mathrm{rep}_2(q)$. This shows that $q\in\overline{\mathcal{T}}\iff 2^{p|w|}+q\in\mathcal{T}$, hence the conclusion.
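The two facts used in this proof can be illustrated numerically. The sketch below rests on the same assumption as the informal description of the projected automaton, namely transitions $(j,X)\to(j',X_d)$ on the letter $a$ whenever $2^pj+a=md+j'$ and unique final state $(r,T)$; the helper names and the encoding of $T$, $B$ as `0`, `1` are illustrative. It checks that prepending a leading 1 flips the Thue-Morse bit, and it lets one compare the language accepted from $(0,B)$ with the set $m\overline{\mathcal{T}}+r$ on an initial segment of $\mathbb{N}$.

```python
def t(n):
    """Thue-Morse bit of n: 0 if n is in the Thue-Morse set, 1 otherwise."""
    return bin(n).count("1") % 2


def digits(n, p, length):
    """Base-2^p digits of n, most significant first, zero-padded to `length`."""
    mask = (1 << p) - 1
    return [(n >> (p * i)) & mask for i in reversed(range(length))]


def run(m, p, start, word):
    """State reached from `start` = (j, x) after reading `word` (x: 0 = T, 1 = B)."""
    b = 1 << p
    j, x = start
    for a in word:
        d, j = divmod(b * j + a, m)
        x ^= t(d)
    return (j, x)


def accepted(m, r, p, start, word):
    """True if `word` leads from `start` to the final state (r, T)."""
    return run(m, p, start, word) == (r, 0)


def in_translate(n, m, r, parity):
    """True if n = m*q + r with q >= 0 and t(q) == parity."""
    q, rem = divmod(n - r, m)
    return n >= r and rem == 0 and t(q) == parity


def leading_one_flips(n_bits):
    """Check t(2^n_bits + q) != t(q) for every q < 2^n_bits."""
    return all(t((1 << n_bits) + q) == 1 - t(q) for q in range(1 << n_bits))
```

For the running example, `run(6, 2, (0, 0), digits(6, 2, 2))` reads $\mathrm{rep}_4(6)=12$ from $(0,T)$, and `accepted(6, 2, 2, (0, 1), digits(n, 2, 5))` can be compared with `in_translate(n, 6, 2, 1)` for each `n` below $4^5$.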
Corollary 51. Let $m,p$ be positive integers and $r\in[\![0,m-1]\!]$. Then the state complexity of $m\overline{\mathcal{T}}+r$ with respect to the base $2^p$ is equal to $2k+\lceil z/p\rceil$ if $m=k2^z$ with $k$ odd.

Conclusion and perspectives
Our method is constructive and in principle, it may be applied to any $b$-recognizable set $X\subseteq\mathbb{N}$. However, in general, it is not the case that the product automaton $\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{X,2^p}$ recognizing the bidimensional set $\{(n,mn+r):n\in X\}$ is minimal. As an example, consider the 2-recognizable set $X$ of powers of 2: $X=\{2^n:n\in\mathbb{N}\}$. The product automaton $\mathcal{A}_{3,0,2}\times\mathcal{A}_{X,2}$ of our construction (for $m=3$, $r=0$ and $b=2$) has 6 states but is clearly not minimal since it is easily checked that the automaton of Figure 15 is the trim minimal automaton recognizing the set $\{(2^n,3\cdot 2^n):n\in\mathbb{N}\}$. This illustrates that, in general, the minimization procedure is not only needed in the final projection $\Pi(\mathcal{A}_{m,r,2^p}\times\mathcal{A}_{X,2^p})$ as is the case in the present work. Nevertheless, we conjecture that the phenomenon described in this work for the Thue-Morse set also appears for all $b$-recognizable sets of the form $X_{b,c,M,R}=\{n\in\mathbb{N}:|\mathrm{rep}_b(n)|_c\equiv R\bmod M\}$ where $b$ is an integer base, $c$ is any digit in $A_b$, $M$ is an integer greater than or equal to 2 and $R$ is any possible remainder in $[\![0,M-1]\!]$. More precisely, we conjecture that whenever the base $b$ is a prime power, i.e. $b=q^p$ for some prime $q$, then the state complexity of $mX_{b,c,M,R}+r$ is given by the formula $Mk+\lceil z/p\rceil$ where $k$ is the part of the multiple $m$ that is prime to the base $b$, i.e. $m=kq^z$ with $\gcd(k,q)=1$. Note that the set $\mathcal{T}$ is of this form: $\mathcal{T}=\{n\in\mathbb{N}:|\mathrm{rep}_2(n)|_1\equiv 0\bmod 2\}$.
Another potential direction for future research, continuing the present work, is to consider automata reading the expansions of numbers with least significant digit first. Both reading directions are relevant to different problems. For example, it is easier to compute addition with an automaton reading expansions from "right to left" than from "left to right". Conversely, if we have in mind to generalize our problems to $b$-recognizable sets of real numbers (see for instance [9,13,15]), then the relevant reading direction is the one with most significant digit first. Further, there is no intrinsic reason why the state complexity from "left to right" should be the same as (or even close to) that obtained from "right to left" since, in general, it is well known that the state complexity of an arbitrary language can greatly differ from that of its reversed language; see for example [29].