On extended boundary sequences of morphic and Sturmian words

Generalizing the notion of the boundary sequence introduced by Chen and Wen, the $n$th term of the $\ell$-boundary sequence of an infinite word is the finite set of pairs $(u,v)$ of prefixes and suffixes of length $\ell$ appearing in factors $uyv$ of length $n+\ell$ ($n\ge \ell\ge 1$). Otherwise stated, for increasing values of $n$, one looks for all pairs of factors of length $\ell$ separated by $n-\ell$ symbols. For the large class of addable abstract numeration systems $S$, we show that if an infinite word is $S$-automatic, then the same holds for its $\ell$-boundary sequence. In particular, they are both morphic (or generated by an HD0L system). To delineate the limits of this result, we discuss examples of non-addable numeration systems and $S$-automatic words for which the boundary sequence is nevertheless $S$-automatic and, conversely, $S$-automatic words with a boundary sequence that is not $S$-automatic. In the second part of the paper, we study the $\ell$-boundary sequence of a Sturmian word. We show that it is obtained through a sliding block code from the characteristic Sturmian word of the same slope. We also show that it is the image under a morphism of some other characteristic Sturmian word.


Introduction
Let x be an infinite word, i.e., a sequence of letters belonging to a finite alphabet. Imagine a window of size n moving along x. Such a reading frame makes it possible to detect all factors of length n occurring in x. For instance, the factor complexity function of x, mapping n ∈ N to the number of distinct factors of length n, is extensively studied in combinatorics on words. Now let n, ℓ be such that n ≥ ℓ. Assume that within the sliding window, we only focus on its first and last ℓ symbols. Otherwise stated, for a factor uyv of length n, we only consider its borders u and v of length ℓ. For any given window length n, we would like to determine which pairs of length-ℓ borders may occur. This leads to the following definition, where, to simplify notation, we consider borders of factors of length n + ℓ rather than n.

Definition 1.1. Let $\ell \in \mathbb{N}_{>0}$ and $x \in A^{\mathbb{N}}$. For n ≥ ℓ, we define the nth boundary set by
$$\partial_{x,\ell}[n] := \{(u,v) \in A^{\ell} \times A^{\ell} \mid uyv \text{ is a factor of } x \text{ for some } y \in A^{n-\ell}\}$$
and call the sequence $\partial_{x,\ell} := (\partial_{x,\ell}[n])_{n \ge \ell}$ the ℓ-boundary sequence of x. When ℓ = 1, we write $\partial_{x,1} = \partial_x$ and simply talk about the boundary sequence.
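Definition 1.1 can be explored on a finite prefix of an infinite word. The following Python sketch is only an illustration: the function name `boundary_set` is ours, and since it sees a finite string it can only return a subset of the true boundary set of the infinite word.

```python
def boundary_set(word, ell, n):
    """Return the pairs (u, v) with |u| = |v| = ell such that some factor
    u y v of length n + ell occurs in the finite string `word`."""
    if not (n >= ell >= 1):
        raise ValueError("expected n >= ell >= 1")
    pairs = set()
    # A factor of length n + ell starting at i carries u at i and v at i + n.
    for i in range(len(word) - (n + ell) + 1):
        u = word[i:i + ell]
        v = word[i + n:i + n + ell]
        pairs.add((u, v))
    return pairs


if __name__ == "__main__":
    prefix = "0110100110010110"  # a prefix of the Thue-Morse word
    print(sorted(boundary_set(prefix, 1, 3)))
```

For an infinite word, the prefix has to be long enough for every factor of length n + ℓ to occur in it; how long that is depends on the word.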

Motivation and related work
In combinatorics on words, borders and boundary sets are related to important concepts. For instance, a word v is bordered if there exist u, x, y such that v = ux = yu and 0 < |u| < |v|. One reason to study bordered words is Duval's theorem: for a sufficiently long word v, the maximum length of unbordered factors of v is equal to the period of v [19]. In formal language theory, a language L is locally ℓ-testable (LT) if the membership of a word w in L only depends on the prefix, suffix and factors of length ℓ of w. In [43], the authors consider the so-called separating problem of languages by LT languages; they utilize ℓ-profiles of a word, which can again be related to boundary sets. Let us also mention that, in bioinformatics and computational biology, one of the aims is to reconstruct sequences from subsequences [33]. To determine DNA segments by bottom-up analysis, paired-end sequencing is used: both ends of DNA fragments of known length are sequenced; see, for instance, [24]. This is quite similar to the theoretical concept we discuss here.
The notion of a (1-)boundary sequence was introduced by Chen and Wen in [12] and was further studied in [25], where it is shown that the boundary sequence of a k-automatic word (in the sense of Allouche and Shallit [2]; see Definition 2.7) is k-automatic. It is well known that a k-automatic word x is morphic, i.e., there exist morphisms $f : A \to A^*$ and $g : A \to B$ and a letter a ∈ A such that $x = g(f^{\omega}(a))$, where $f^{\omega}(a) = \lim_{n\to\infty} f^n(a)$. However, k-automatic words (with k ranging over the integers) do not capture all morphic words: a well-known characterization of k-automatic words is given by Cobham [13] (the generating morphism f maps each letter to a length-k word). This paper is driven by the natural question whether, in general, the ℓ-boundary sequence of a morphic word is morphic. When such generating morphisms can be constructed, we have at our disposal a simple algorithm providing the set of length-ℓ borders in factors of all lengths.
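As an illustration of the definition of a morphic word $g(f^{\omega}(a))$, here is a minimal Python sketch that iterates a morphism until a prefix of the requested length is available. The helper name `morphic_prefix` and the dictionary encoding of morphisms are our own choices, and the sketch assumes that f is prolongable on a (f(a) starts with a and the iterates grow).

```python
def morphic_prefix(f, g, a, length):
    """Prefix of g(f^omega(a)), for a morphism f (dict letter -> word)
    prolongable on the letter a and a coding g (dict letter -> letter)."""
    if not f[a].startswith(a):
        raise ValueError("f must be prolongable on a: f(a) has to start with a")
    w = a
    while len(w) < length:
        nxt = "".join(f[c] for c in w)
        if len(nxt) == len(w):
            raise ValueError("the iterates of f on a do not grow")
        w = nxt  # each iterate is a prefix of the next one
    return "".join(g[c] for c in w[:length])


identity = {"0": "0", "1": "1"}
thue_morse = morphic_prefix({"0": "01", "1": "10"}, identity, "0", 8)
fibonacci = morphic_prefix({"0": "01", "1": "0"}, identity, "0", 8)
print(thue_morse, fibonacci)
```

The Thue-Morse morphism is 2-uniform, in line with Cobham's characterization, whereas the Fibonacci morphism is not uniform.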
We briefly present several situations in which the notion of boundary sets is explicitly or implicitly used. In [16, Thm. 4], the authors study the boundary sequence to exhibit a squarefree word for which each subsequence arising from an arithmetic progression contains a square. Boundary sets play an important role in the study of so-called k-abelian and k-binomial complexities of infinite words (for definitions, see [47]). For instance, computing the 2-binomial complexity of generalized Thue-Morse words [32] requires inspecting pairs of prefixes and suffixes of factors, which is again related to the boundary sequence when these prefixes and suffixes have equal length. The k-binomial complexities of images of binary words under powers of the Thue-Morse morphism are studied in [49]; there some general properties of boundary sequences of binary words are required (see [49, Lem. 4.6]). Moreover, if $\partial_x$ is automatic, then the abelian complexity of the image of x under a so-called Parikh-constant morphism is automatic [12]. Guo, Lü, and Wen combine this result with theirs in [25] to establish a large family of infinite words with automatic abelian complexity.
Let k ≥ 1. We let $\equiv_k$ denote the k-abelian equivalence, i.e., $u \equiv_k v$ if the words u and v share the same set of factors of length at most k with the same multiplicities [28]. For u and v equal-length factors of a Sturmian word s, we have $u \equiv_k v$ if and only if they share a common prefix and a common suffix of length min{|u|, k − 1} and $u \equiv_1 v$ [28, Prop. 2.8]. Under the assumption that the largest power of a letter appearing in s is less than 2k − 2, the requirement $u \equiv_1 v$ in the previous result may be omitted [41, Thm.

Our contributions
To our knowledge, we are the first to propose a systematic study of the ℓ-boundary sequences of infinite words. It is therefore natural to consider the notion on well-known classes of words. In this paper, we consider morphic words and Sturmian words.
Any morphic word is S-automatic for some abstract numeration system S [48]. With Theorem 3.1, we prove that for a large class of numeration systems S, if x is an S-automatic word, then the boundary sequence $\partial_x$ is again S-automatic. Our approach generalizes the arguments of [25]. Considering exotic numeration systems allows a better understanding of underlying mechanisms, which do not arise in the ordinary integer base systems. In particular, we deal with addition within the numeration system; in integer base systems, the carry propagation is easy to handle (by a two-state finite automaton). Our arguments apply to so-called addable numeration systems for which the graph of addition is regular (see Definition 2.4 for details).
As an alternative, we observe that a classical effective procedure (Theorem 2.11) transforming formulae to automata can be extended to addable abstract numeration systems S. The S-automaticity of the ℓ-boundary sequence then follows from the fact that it is definable by a first-order formula of the structure ⟨N, +⟩ extended with comparisons and indexing into an S-automatic sequence.
This alternative proof, however, hides the important details that might help identify the technical limits of the result: not all morphic words allow an addable system to work with. However, the finiteness of a suitable kernel captures all morphic words (see Theorem 2.9). To identify the contours of our result, we also discuss the case where x is S-automatic and $\partial_x$ is not S-automatic. To construct such examples, we have to consider non-addable numeration systems in Section 3.3.
We then turn to the other class of words under study. Letting s be a Sturmian word with slope α, with Theorem 4.1 we show that the ℓ-boundary sequence of s is obtained through a sliding block code from the characteristic Sturmian word of slope α (see Section 4 for a definition) up to the first letter. This result holds even for non-morphic Sturmian words, so for an arbitrary irrational α. Where the techniques used in the first part of the paper have an automata-theoretic flavor, the second part relies on the geometric characterization of Sturmian words as codings of rotations. We provide another description of the ℓ-boundary sequence of a Sturmian word as the morphic image of some characteristic Sturmian word in Proposition 4.10.
This paper is a long version of [50] presented at MFCS 2022. It contains many proofs (omitted there due to space limitations) and, in particular, discussions about Sturmian words. This extended version includes worked examples using Walnut. In Section 2.3 we explicitly compute the 2-boundary sequence of the Thue-Morse and Fibonacci words; see Examples 2.12 and 2.13. In Section 3.2, we present several examples of automatic sequences built on intrinsically non-addable numeration systems for which the boundary sequence is still automatic; see Propositions 3.4 and 3.8. Finally, the proof of Theorem 3.1 has been strengthened to a larger setting to include addable abstract numeration systems. This slightly broadens the presentation of the paper, which is no longer limited to positional numeration systems.

Preliminaries
Throughout this paper we let A denote a finite alphabet. Then $A^n$ denotes the set of length-n words and $A^{\mathbb{N}}$ denotes the set of infinite words. Infinite words will usually, but not always, be indexed starting from 0. They will also be written in bold. For a finite word u, we let $u^{\omega}$ denote the concatenation of infinitely many copies of the word u, i.e., $u^{\omega} = uuu\cdots$. For two words u, v for which w = uv, we let $wv^{-1}$ denote the prefix u and $u^{-1}w$ the suffix v. For a finite or infinite word x, we let x[n] denote the letter at index n (assuming it is well-defined for this value of n, e.g., if x is an ℓ-boundary sequence, n ≥ ℓ). Similarly, for m ≥ n we set $x[n, m] := x[n] \cdots x[m]$. For any integer n ≥ 0, we let $\mathrm{Fac}_n(x)$ denote the set of length-n factors of x; we write $\mathrm{Fac}(x) = \cup_{n\ge 0} \mathrm{Fac}_n(x)$. A factor u of an infinite word $x \in A^{\mathbb{N}}$ is called right special if there exist distinct letters a, b ∈ A such that ua, ub ∈ Fac(x). We note that an infinite word x is aperiodic if and only if it has a right special factor for each length. For general references on numeration systems, see [22] and [7]. We assume that the reader has some knowledge of automata theory; for a reference, see [52] or [46, Chap. 1].

Basic properties of boundary sequences
Recall that in our definition of the boundary sequence, we inspect factors of length n + ℓ with n ≥ ℓ. This implies that the prefix and suffix of length ℓ forming the boundary pair do not overlap. The following observation justifies this choice in a sense.

Proof. Fix an integer m with 0 ≤ m < ℓ. We show that $\partial_{x,\ell}[m] \neq \partial_{x,\ell}[n]$ for any n > m. The claim follows straightforwardly from this observation. We first observe that any boundary pair is right special and x = x n+ℓ−m (here ℓ < n + ℓ − m, so such a choice can be made). Now x defines the boundary pair which shows that this pair cannot appear in $\partial_{x,\ell}[m]$. This concludes the proof.
The above proposition is tight in the sense that there exist aperiodic words for which the boundary set $\partial_{x,\ell}[\ell]$ appears infinitely often in the boundary sequence $\partial_{x,\ell}$. This can be seen, e.g., from Proposition 4.8. Another quick example is the Champernowne word c = 0 1 00 01 10 11 ··· (the concatenation of the radix-ordered binary representations of the naturals), for which $\partial_{c,\ell} = (\{0,1\}^{\ell} \times \{0,1\}^{\ell})^{\omega}$.

Lemma 2.2. For any ℓ ≥ 1, the ℓ-boundary sequence of an eventually periodic word is eventually periodic.

Proof. Write the word as $uv^{\omega}$. We show that the boundary sets at indices n + |v| and n coincide for all n ≥ max{ℓ, |u|}. Indeed, consider a factor z of length n + |v| + ℓ occurring at position i. We may write z = z′s with |z′| = n + |v| and |s| = ℓ. Since n ≥ |u|, there exists a factorization $v = v_1v_2$ such that z′ ends with $vv_1$, and s is a prefix of $(v_2v_1)^{\omega}$. The factor of length n + ℓ occurring at position i is thus $z'(v_2v_1)^{-1}s$. We have shown that the boundary pairs are equal. This suffices for the proof.
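The Champernowne example can be checked experimentally. The sketch below (helper names are ours) builds the prefix of c made of all binary words of length at most `max_len` in radix order, and verifies on that prefix that every pair of length-ℓ words occurs as a boundary pair whenever n + ℓ ≤ `max_len`. This is a finite sanity check, not a proof of the statement for all n.

```python
from itertools import product


def champernowne_prefix(max_len):
    """Concatenate all binary words of length 1..max_len in radix order."""
    return "".join(
        "".join(w) for m in range(1, max_len + 1) for w in product("01", repeat=m)
    )


def boundary_set(word, ell, n):
    """Pairs (u, v) of length-ell blocks whose starting positions differ by n."""
    return {
        (word[i:i + ell], word[i + n:i + n + ell])
        for i in range(len(word) - (n + ell) + 1)
    }


def full_set(ell):
    """The set {0,1}^ell x {0,1}^ell."""
    words = ["".join(w) for w in product("01", repeat=ell)]
    return {(u, v) for u in words for v in words}


c = champernowne_prefix(6)
# Every binary word of length n + ell <= 6 is a block of the prefix,
# hence a factor, so each boundary set below is the full set.
checks = [boundary_set(c, 2, n) == full_set(2) for n in range(2, 5)]
print(checks)
```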

Numeration systems and automatic words
For general references about automatic words and abstract numeration systems, see [2] and [48] or [7, Chap. 3]. An abstract numeration system (ANS) is a triple S = (L, A, <) with L an infinite regular language over the totally ordered alphabet A (with <). We say that L is the numeration language. Genealogically (i.e., radix or length-lexicographic) ordering L gives a one-to-one correspondence $\mathrm{rep}_S$ between N and L; the S-representation of n is the (n + 1)st word of L, and the inverse map, called the (e)valuation map, is denoted by $\mathrm{val}_S$.
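The maps $\mathrm{rep}_S$ and $\mathrm{val}_S$ can be made concrete by brute-force enumeration. The following Python sketch is only an illustration under our own conventions: the numeration language is given as a regular expression for Python's `re` module, the alphabet as a string listing the letters in increasing order, and the letters a, b stand for α, β. It is exponentially slow and meant for small experiments.

```python
import re
from itertools import count, islice, product


def genealogical(pattern, alphabet):
    """Yield the words of the language matched by `pattern` in radix order
    (by length, then lexicographically w.r.t. the order of `alphabet`)."""
    regex = re.compile(pattern)
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if regex.fullmatch(w):
                yield w


def rep(n, pattern, alphabet):
    """S-representation of n: the (n + 1)st word of the language."""
    return next(islice(genealogical(pattern, alphabet), n, None))


def val(w, pattern, alphabet):
    """Value of w: its index in the genealogically ordered language."""
    for n, u in enumerate(genealogical(pattern, alphabet)):
        if u == w:
            return n
        if len(u) > len(w):
            raise ValueError(f"{w!r} is not in the numeration language")


if __name__ == "__main__":
    print(list(islice(genealogical("a*b*", "ab"), 10)))
```

With the language a*b*, the values of a, b and aa are 1, 2 and 3, so that val(a) + val(b) = val(aa).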
In the following, we refer to the terminology introduced in [40] (addable systems are called regular in [53]). It is convenient to introduce a new padding symbol # which does not belong to the alphabet A. We let $A_{\#}$ denote the set A ∪ {#}. We extend the evaluation map to $\#^*L$ by setting $\mathrm{val}_S(\#^n w) = \mathrm{val}_S(w)$ for all w ∈ L and n ∈ N.

Definition 2.4. An abstract numeration system S = (L, A, <) is addable if the following graph of addition, denoted by $L_+$, is regular:
$$L_+ = \{(u, v, w) \in (\#^*L)^3 \mid |u| = |v| = |w| \text{ and } \mathrm{val}_S(u) + \mathrm{val}_S(v) = \mathrm{val}_S(w)\}.$$
Notice that words in the numeration language L do not start with #; however, when dealing with tuples of such words, shorter S-representations are padded with leading #'s to get words of equal length (so they can be processed by an automaton reading tuples of letters). Continuing Example 2.3, for instance, the triplet (#α, #β, αα) belongs to $L_+$.

Remark 2.5. Positional numeration systems (whose numeration language is regular) are special instances of ANS. Let us recall this classical setting. Let $U = (U_n)_{n\ge 0}$ be an increasing sequence of integers such that $U_0 = 1$. Any integer n can be decomposed (not necessarily uniquely) as $n = \sum_{i=0}^{t} c_i U_i$ with non-negative integer coefficients $c_i$. The finite word $c_t \cdots c_0 \in \mathbb{N}^*$ is a U-representation of n. If this representation is computed greedily [22,46], then for all j ≤ t we have $\sum_{i=0}^{j} c_i U_i < U_{j+1}$, and $c_t \cdots c_0$ is said to be the greedy (or normal) U-representation of n. By convention, the greedy representation of 0 is the empty word ε, and the greedy representation of n > 0 starts with a non-zero digit. An extra condition on the boundedness of $\sup_{i\ge 0}(U_{i+1}/U_i)$ implies that the digit-set for greedy representations is finite.
A sequence U satisfying all the above conditions is said to define a positional numeration system. Any such system for which the numeration language $\mathrm{rep}_U(\mathbb{N})$ is regular is an ANS. For a positional numeration system, the digit 0 makes the extra symbol # unnecessary: padding can be achieved using leading zeroes.
Example 2.6. In this example, the numeration system has no digit 0 and has the property of being unambiguous. Consider the ANS S built on the language $L = \{1, 2\}^*$ and the sequence $U = (2^n)_{n\ge 0}$. The first few words in L are ε, 1, 2, 11, 12, 21, 22, .... The nth word $c_t \cdots c_0$ of L satisfies $\sum_{i=0}^{t} c_i 2^i = n$, but the greedy U-representation of n is just its base-2 expansion over {0, 1} and is therefore not equal to $\mathrm{rep}_S(n) \in \{1, 2\}^*$. The ANS S is not, strictly speaking, a positional numeration system. Nevertheless, the graph of addition for triplets of S-representations is regular. Fig. 2 depicts a DFA accepting the corresponding language reading least significant digit first, i.e., digits are processed from right to left. One simply has to deal with a carry 0, 1, 2 stored within the state. Transitions are of the form m −→ n with label

A deterministic finite automaton with output (DFAO) A is a DFA (with state set Q) equipped with a mapping τ : Q → A (with A an alphabet). The output A(w) of A on a word w is τ(q), where q is the state reached by reading w from the initial state.

Definition 2.7. Let S = (L, A, <) be an ANS. An infinite word x is S-automatic if there exists a DFAO A such that $x[n] = A(\mathrm{rep}_S(n))$. In particular, for an integer k ≥ 2, if A is fed with the genealogically ordered language $L = \{\varepsilon\} \cup \{1, \ldots, k-1\}\{0, \ldots, k-1\}^*$, then x is said to be k-automatic. If A is fed with the U-representations of integers, with U a positional numeration system, x is said to be U-automatic.
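The numeration system of Example 2.6 is easy to experiment with. The Python sketch below (all function names are ours) computes $\mathrm{rep}_S$ and $\mathrm{val}_S$ for the language {1, 2}* and tests membership of a padded triple in the graph of addition by propagating a carry from right to left. It mirrors the idea of the DFA of Fig. 2 but does not reproduce that automaton: the carry is kept as an unbounded integer, and the shape of the padded words is not validated.

```python
def rep_S(n):
    """Representation of n over the digits {1, 2} with weights 2^i."""
    digits = []
    while n > 0:
        n, r = divmod(n - 1, 2)
        digits.append("12"[r])
    return "".join(reversed(digits))


def val_S(w):
    """Value of a word over {1, 2}, ignoring leading padding symbols #."""
    value = 0
    for d in w.lstrip("#"):
        value = 2 * value + int(d)
    return value


def in_addition_graph(u, v, w):
    """Check val(u) + val(v) = val(w) for equal-length padded words,
    reading least significant digits first and propagating a carry."""
    if not (len(u) == len(v) == len(w)):
        return False
    digit = {"#": 0, "1": 1, "2": 2}
    carry = 0
    for a, b, c in zip(reversed(u), reversed(v), reversed(w)):
        diff = digit[a] + digit[b] + carry - digit[c]
        if diff % 2:  # parity mismatch: the sums cannot agree
            return False
        carry = diff // 2
    return carry == 0


if __name__ == "__main__":
    print([rep_S(n) for n in range(7)])
    print(in_addition_graph("#1", "#2", "11"))
```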

Theorem 2.8 ([48]). A word x is morphic if and only if it is S-automatic for some abstract numeration system S.
We note that the proof of the above theorem shows that the equivalence is completely effective: given the morphisms producing the word x, one can construct an ANS S and a DFAO generating x, and vice versa.
Fix $s \in A^*$. For a word x, define the subsequence x • s by $(x \bullet s)[n] := x[\mathrm{val}_S(p_{s,n}s)]$, where $p_{s,n}$ is the nth word in the genealogically ordered language $Ls^{-1} = \{u \in A^* \mid us \in L\}$. The S-kernel of the word x is defined as the set of words $\{x \bullet s \mid s \in A^*\}$. The following theorem is critical to our arguments. Details are given in [7].

Theorem 2.9 ([48]). A word x is S-automatic if and only if its S-kernel is finite.
Again, the theorem is completely effective: with the underlying ANS S fixed, given (a Turing machine generating) x, and (the cardinality of) the S-kernel, one can compute the DFAO generating x, and vice versa.
Example 2.10. Consider the Fibonacci numeration system based on the sequence of Fibonacci numbers $(F_n)_{n\ge 0}$ with $F_0 = 1$, $F_1 = 2$, and $F_{n+1} = F_n + F_{n-1}$ for n ≥ 1. The first few terms of the associated subsequences $\mu_s : \mathbb{N} \to \mathbb{N}$, such that $(x \bullet s)[n] = x[\mu_s(n)]$, are given in Table 1. One simply computes the numerical value of all the Fibonacci representations with the suffix s. Notice that some kernel elements x • s may be finite; more precisely, this occurs exactly when the language $Ls^{-1}$ is finite. Our reasoning will not be affected by such particular cases, and we let the reader adapt it to such situations.
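The index maps $\mu_s$ of Example 2.10 can be computed by brute force. The Python sketch below (function names and the search bound `limit` are ours) computes greedy Fibonacci representations and lists, in increasing order, the integers whose representation ends with a given suffix s; since words sharing the suffix s are ordered like their prefixes, this list is $\mu_s(0), \mu_s(1), \ldots$.

```python
def fibonacci_rep(n):
    """Greedy representation of n in the basis F_0 = 1, F_1 = 2,
    F_{k+1} = F_k + F_{k-1}; the representation of 0 is the empty word."""
    if n == 0:
        return ""
    basis = [1, 2]
    while basis[-1] + basis[-2] <= n:
        basis.append(basis[-1] + basis[-2])
    while basis[-1] > n:  # only happens for n = 1
        basis.pop()
    digits = []
    for f in reversed(basis):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits)


def mu(s, how_many, limit=10_000):
    """First values of mu_s: integers below `limit` whose greedy Fibonacci
    representation ends with s. Fewer values are returned when the kernel
    element is finite or when `limit` is too small."""
    values = []
    for n in range(limit):
        if len(values) == how_many:
            break
        if fibonacci_rep(n).endswith(s):
            values.append(n)
    return values


if __name__ == "__main__":
    for s in ["", "0", "1", "00", "10", "01"]:
        print(repr(s), mu(s, 8))
```

A suffix such as 11, which never occurs in a greedy Fibonacci representation, yields an empty list, matching the remark that some kernel elements may be finite.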
In Section 3.1, we require S to be addable. Note that the assumptions of having a regular numeration language and of being addable are shared by many classical systems. For instance, the usual integer base numeration systems or the Fibonacci numeration system have all the assumed properties. For the latter system, the minimal automaton (reading most significant digits first) of $L_+$ has 17 states (its transition table is given in [36]). The one reading least significant digits first has 22 states. The largest known family of positional systems with all these properties (addable with $\mathrm{rep}_U(\mathbb{N})$ being regular) consists of those based on a linear recurrence sequence whose characteristic polynomial is the minimal polynomial of a Pisot number [8,46]. One practical difficulty when using automatic provers (such as Walnut [35]) is to provide the relevant automaton for addition.

Link with first-order logic
The result stated below is at the origin of Walnut. It relies on the effective transformation of formulae to automata. It was first stated for integer base systems. Making use of [11, Lem. 37 and Thm. 55], it was extended to addable systems:

Theorem 2.11 ([53, Thm. 6.4.1]). Let S be an addable numeration system. There is an algorithm that, given a formula ϕ with no free variables, phrased in first-order logic, using only the universal and existential quantifiers, addition and subtraction of variables and constants, logical operations, comparisons, and indexing into a given S-automatic sequence x, will decide the truth of the formula ϕ. Furthermore, if ϕ has t ≥ 1 free variables, the algorithm produces a DFA that recognizes the language of all representations of t-tuples of natural numbers that make ϕ evaluate to true.
As already mentioned in [25], the boundary sequence of a k-automatic word x may be defined by means of a first-order formula and therefore automaticity readily follows. This extends to addable systems S: let $x \in B^{\mathbb{N}}$ be S-automatic for an addable system S. The above theorem implies that, for all b ∈ B, we have a formula $\varphi_b(n)$ which is true if and only if x[n] = b. For $u = u_0 \cdots u_{\ell-1}$ and $v = v_0 \cdots v_{\ell-1}$, we have
$$(u, v) \in \partial_{x,\ell}[m] \iff (\exists i)\ \bigwedge_{j=0}^{\ell-1} \bigl(\varphi_{u_j}(i+j) \wedge \varphi_{v_j}(i+m+j)\bigr).$$
For each subset R of $A^{\ell} \times A^{\ell}$ there is thus a formula $\psi_R(m)$ which is true if and only if $\partial_{x,\ell}[m] = R$. We may now apply Theorem 2.11 to conclude that $\partial_{x,\ell}$ is S-automatic. These arguments appear in [53, §8.1.11] in the case ℓ = 1.
Example 2.12. We use the strategy described in [53, Sec. 8.1.11], where the boundary sequence of the Fibonacci word is computed. Here, we consider the Thue-Morse word t and show how to get its 2-boundary sequence. If t[i]t[i+1] = αβ and t[i+n]t[i+n+1] = γδ for some i ≥ 0, then the pair (αβ, γδ) belongs to ∂_{t,2}[n]. In Walnut, depending on the value of α, β, γ, δ, we provide sixteen definitions which create deterministic automata recognizing base-2 expansions of the sets T_{αβ,γδ}. In particular, $TMboundαβγδ(n) evaluates to TRUE whenever n belongs to T_{αβ,γδ}. A direct inspection shows that only seven different 2-boundary sets occur in ∂_{t,2}:
a := {0, 1}^2 × {0, 1}^2 \ {(00, 00), (00, 01), (01, 11), (10, 00), (11, 10), (11, 11)},
b := {0, 1}^2 × {0, 1}^2 \ {(00, 00), (00, 11), (01, 10), (10, 01), (11, 00), (11, 11)},
c := {0, 1}^2 × {0, 1}^2 \ {(00, 10), (01, 00), (10, 11), (11, 01)},
d := {0, 1}^2 × {0, 1}^2 \ {(00, 00), (00, 11), (11, 00), (11, 11)},
This can be checked as follows. For the set a, we provide a definition where $TMbounda(n) evaluates to TRUE whenever ∂_{t,2}[n] = a. Similar definitions are readily written for b, ..., g. The following expression evaluates to TRUE: eval TM2boundarycheck "An (n>1) => (($TMbounda(n meaning that we are not missing any 2-boundary set. We combine the seven automata produced by Walnut, each accepting the base-2 expansions of the integers n such that ∂_{t,2}[n] is a particular letter in {a, ..., g}, into the DFAO depicted in Fig. 3 using the command combine TM2boundarySequence TMbounda TMboundb TMboundc TMboundd TMbounde TMboundf TMboundg: Inspecting Fig. 3, we notice that a, b and e appear exactly once. On the other hand, we have ∂_{t,2}[n] = c precisely when n is an even power of 2, while ∂_{t,2}[n] = f if and only if n is an odd power of 2 strictly larger than 2. Finally, ∂_{t,2}[n] = d precisely when n is odd and at least 5.
Through the effective conversion of the automaton to a morphic representation of the word, we get the 2-uniform morphism generating the 2-boundary sequence (prepended with two symbols #$, because the 2-boundary sequence is indexed starting at 2):

Example 2.13. For the Fibonacci word f, the strategy is similar to the one given in the previous example. Instead of binary expansions, we simply make use of Fibonacci expansions, which are also available in Walnut. The details and the resulting automaton can be found in [53, Sec. 8.1.11]. Doing a similar job for the 2-boundary sequence, at most 9 (and not sixteen, as in the previous example) boundary pairs may occur because f does not contain 11 as a factor. We can check that ∂_{f,2} is made of five different boundary sets. The resulting DFAO is depicted in Fig. 4 (the sink state reached when reading a factor 11 is not represented). We thus get the morphism generating the 2-boundary sequence of f prepended with two symbols #$:

Finally, we also computed the DFAO for the boundary sequence of the Tribonacci word (reading Tribonacci expansions), the fixed point of the morphism 0 → 01, 1 → 02, 2 → 0. Surprisingly, the (minimal) automaton produced by Walnut has 118 states. It also reveals that there are exactly seven boundary sets appearing in the boundary sequence of the Tribonacci word. One can show, using Walnut, that the first and second sets in the boundary sequence appear exactly once, whereas the other boundary sets appear infinitely often.
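Independently of Walnut, the 2-boundary sets of the Thue-Morse word discussed in Example 2.12 can be approximated by brute force on a long prefix. The Python sketch below is a sanity check only (function names and the prefix length 4096 are our choices, and a finite prefix is assumed to be long enough for the small values of n considered). It groups the indices n by boundary set, so that the result can be compared with the description read off Fig. 3.

```python
def thue_morse(length):
    """Prefix of the Thue-Morse word: t[i] is the parity of the number
    of 1s in the binary expansion of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def boundary_set(word, ell, n):
    """Pairs (u, v) of length-ell blocks whose starting positions differ by n."""
    return frozenset(
        (word[i:i + ell], word[i + n:i + n + ell])
        for i in range(len(word) - (n + ell) + 1)
    )


def tm_boundary_sets(max_n, prefix_len=4096):
    """Approximate 2-boundary sets of the Thue-Morse word for 2 <= n <= max_n."""
    t = thue_morse(prefix_len)
    return {n: boundary_set(t, 2, n) for n in range(2, max_n + 1)}


if __name__ == "__main__":
    sets = tm_boundary_sets(40)
    classes = {}
    for n, pairs in sets.items():
        classes.setdefault(pairs, []).append(n)
    for pairs, indices in classes.items():
        print(len(pairs), "pairs at n =", indices)
```

Since the blocks 00 and 11 only start at odd positions of t, no pair made of two such blocks can occur when n is odd, which agrees with the four pairs excluded from the set d.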

On the boundary sequences of automatic words
In this section we provide the first of our main contributions, an alternative proof (not relying on Theorem 2.11) of the fact that an S-automatic word has an S-automatic boundary sequence whenever S is addable. We then show that this result does not necessarily hold for a non-addable system.

Addable systems: automatic boundary sequences
For the sake of presentation, we only consider the case of the 1-boundary sequence. Our proof provides a precise description of a set containing the S-kernel of $\partial_x$ in terms of three equivalence relations based on the kernel of x, the graph of addition, and the numeration language; see (2). This set is finite, and so Theorem 2.9 gives the claim. In particular, one of these relations is the Myhill-Nerode congruence associated with the graph of addition, since we have to consider the elements x[i] and x[i + m] for some m > 0. For ℓ > 1, the only technical difference is that we have to consider longer factors.

Theorem 3.1. Let S = (L, A, <) be an addable ANS and let x be an S-automatic word. The boundary sequence $\partial_x$ is S-automatic.
Proof. Thanks to Theorem 2.9, the S-kernel of x is finite, say of cardinality m. Moreover, since L and $L_+$ are regular, the set of quotients $Ls^{-1}$ and the set of quotients $L_+(s, t, r)^{-1}$ are finite by the Myhill-Nerode theorem [52, Sec. 3.9], say of cardinality k and ℓ, respectively. Let $\partial_x$ be the boundary sequence of x. An element of the S-kernel of $\partial_x$ is given by $\partial_x \bullet s = (\partial_x[\mathrm{val}_S(p_{s,n}s)])_{n\ge 0}$, where $p_{s,n}$ is the nth word in the language $Ls^{-1}$, n ≥ 0. Let us inspect the nth term of such an element of the kernel: it is precisely the set of pairs of letters
$$\{(x[i], x[i + \mathrm{val}_S(p_{s,n}s)]) \mid i \ge 0\}. \qquad (1)$$
Let t, r be length-|s| suffixes of words in $\#^*L$ for which $L_+(s, t, r)^{-1}$ is non-empty. There exist words w, x, y such that $ws, xt, yr \in \#^*L$ and $\mathrm{val}_S(ws) + \mathrm{val}_S(xt) = \mathrm{val}_S(yr)$. We let P(s) denote the set of such pairs $(t, r) \in (A_{\#} \times A)^{|s|}$. Now partition (1) depending on the suffixes of length |s| of $\mathrm{rep}_S(i)$ and $\mathrm{rep}_S(i + \mathrm{val}_S(p_{s,n}s))$. Roughly speaking, we look at all pairs of positions such that the first one is represented by a word ending with t, the second position is a shift of the first one by $\mathrm{val}_S(p_{s,n}s)$ and is represented by a word ending with r.
For convenience, we set L(s, t, r, n Indeed, the second condition means that $p_{s,n} = p_{s',n}$ for all n.
• Ordering L(s, t, r, n). For each w, x of the same length, there is at most one y not starting with # such that (w, x, y) belongs to $L_+(s, t, r)^{-1}$. Similarly, if y does not start with #, for each w (resp., x) there is at most one x (resp., w) such that (w, x, y) belongs to $L_+(s, t, r)^{-1}$. Now let (w, x, y) and (w′, x′, y′) be in L(s, t, r, n). We will always assume (this is not a restriction) that triplets do not start with (#, #, #); otherwise, different triplets may have the same numerical value. Note that $\mathrm{val}_S(x) < \mathrm{val}_S(x')$ if and only if $\mathrm{val}_S(y) < \mathrm{val}_S(y')$. Indeed, w, w′ both belong to $\#^*p_{s,n}$, thus $\mathrm{val}_S(ws) = \mathrm{val}_S(p_{s,n}s) = \mathrm{val}_S(w's)$. We have $\mathrm{val}_S(yr) - \mathrm{val}_S(xt) = \mathrm{val}_S(ws) = \mathrm{val}_S(w's) = \mathrm{val}_S(y'r) - \mathrm{val}_S(x't)$, so $\mathrm{val}_S(yr) - \mathrm{val}_S(y'r) = \mathrm{val}_S(xt) - \mathrm{val}_S(x't)$.
Since S is an ANS, if $\mathrm{val}_S(xt) > \mathrm{val}_S(x't)$, this means that, discarding the possible leading #'s because they have no effect on the evaluation, xt occurs after x′t in the genealogically ordered language. So x has to be genealogically larger than x′. Again discarding the possible leading #'s, the above equality means that x is genealogically less than x′ if and only if the same holds for y and y′. We can thus order L(s, t, r, n) by listing in increasing genealogical order the second component of the elements, and therefore the jth element of L(s, t, r, n) is well-defined.
• Defining two subsequences by the maps $\lambda_{s,t,r,n} : \mathbb{N} \to \mathbb{N}$ and $\mu_{s,t,r,n} : \mathbb{N} \to \mathbb{N}$.
Let $(w_j, x_j, y_j)$ be the jth element in L(s, t, r, n) with j ≥ 0. After removing the leading #'s, the word $x_j$ belongs to $Lt^{-1} \cup \{\varepsilon\}$, which can also be genealogically ordered. We let $\lambda_{s,t,r,n}(j)$ denote the index (i.e., position counting from 0) of $x_j$ within this language. Similarly, the word $y_j$ belongs to $Lr^{-1} \cup \{\varepsilon\}$ and has an index $\mu_{s,t,r,n}(j)$ within this language. Note that if $L_+(s, t, r)^{-1} = L_+(s', t', r')^{-1}$, $Ls^{-1} = Ls'^{-1}$, and $Lt^{-1} = Lt'^{-1}$ then, for all n, the maps $\lambda_{s,t,r,n}$ and $\lambda_{s',t',r',n}$ are the same. Indeed, the first two conditions imply that L(s, t, r, n) = L(s′, t′, r′, n). Similarly, if $L_+(s, t, r)^{-1} = L_+(s', t', r')^{-1}$, $Ls^{-1} = Ls'^{-1}$, and $Lr^{-1} = Lr'^{-1}$ then, for all n, the maps $\mu_{s,t,r,n}$ and $\mu_{s',t',r',n}$ are the same.
We now obtain $x[\mathrm{val}_S(y_j r)])$ Let us define an equivalence relation ∼ on triplets by (s, t, r) ∼ (s′, t′, r′) if and only if all the following hold: Since we have regular languages and the kernel of x is finite by assumption, this relation has a finite index (bounded by $k^3 \ell m^2$). Given s, the set {(s, t, r) | (t, r) ∈ P(s)} can be replaced by a set Λ(s) of representatives of the equivalence classes for ∼. Since ∼ has a finite index, there are finitely many possible subsets of the form Λ(s). So, we can write

A family of non-addable systems
In this section, we show that the addability assumption on the numeration system is not necessary for the boundary sequence of an automatic word to be itself automatic. With Examples 3.3 and 3.7, we consider S-automatic sequences based on a non-addable ANS S but such that the corresponding boundary sequences are still S-automatic. The first lemma is merely an observation that we will frequently use.

Proof. By assumption, w contains factors of the form $0^{k+1}$, $0^k1$ and $10^k$ for all k ≥ 1. So a boundary set can either be a or b. In w, a window of length k will start with 1 and is followed by a 1 only if there exists n such that w[n] = 1 and w[n + k] = 1.

Let s be an integer. In the next three examples, we consider morphic words $w_s$ from the same family. They are the image under the same coding (up to erasing the first symbol) of a fixed point of $g_s : 0 \mapsto 01,\ 1 \mapsto 12^s,\ 2 \mapsto 2$, for s ≥ 1. We show that the corresponding boundary sequences $\partial_{w_s}$ may exhibit quite different behaviors: for s = 1, it is constant; for s = 2, it is periodic of period 4; and for s ≥ 3, it is aperiodic.
The ANS $S = (\alpha^*\beta^*, \{\alpha, \beta\}, \alpha < \beta)$ is known to be non-addable [29, Thm. 17]. The reason is that multiplication by a constant generally does not preserve S-recognizability, hence addition cannot have this property.

Proof. Let s be a suffix of a word in $\alpha^*\beta^*$. We make use of the same notation as in the proof of Theorem 3.1. Let $p_{s,n}$ be the nth word in $\alpha^*\beta^*s^{-1}$, for n ≥ 0. As in (1), the nth term of an element of the S-kernel of $\partial_{w_1}$ is given by $\{(w_1[i], w_1[i + \mathrm{val}_S(p_{s,n}s)]) \mid i \ge 0\}$. For the numeration language of interest, the admissible suffixes s are of the form $\alpha^{\ell}\beta^k$ for some ℓ, k ≥ 0. If ℓ > 0, then $p_{s,n} = \alpha^n$ and always contains (0, 0), (0, 1), (1, 0). For all n ≥ 0, there exists j such that and again (1, 1) belongs to this set for a convenient choice of t. As a conclusion, the S-kernel contains a unique constant sequence $b^{\omega}$, so the boundary sequence is S-automatic. In particular, we have shown that $\partial_{w_1}$ is constant (for the choice of suffix s = ε).
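The conclusion that the boundary sequence of $w_1$ is constant can be sanity-checked numerically, using the fact (stated in the text that follows) that $w_1$ is the characteristic sequence of the triangular numbers. The Python sketch below uses our own helper names and a finite prefix of length 2000; it assumes that b denotes the full set {0, 1} × {0, 1}, as the discussion above suggests.

```python
def characteristic_word(values, length):
    """Binary word of the given length with 1 exactly at the given positions."""
    marks = set(values)
    return "".join("1" if i in marks else "0" for i in range(length))


def letter_pairs(word, n):
    """The (1-)boundary set at index n, computed on a finite word."""
    return {(word[i], word[i + n]) for i in range(len(word) - n)}


triangular = [k * (k + 1) // 2 for k in range(100)]
w1 = characteristic_word(triangular, 2000)
full = {(a, b) for a in "01" for b in "01"}

# (1, 1) occurs at every distance n because T_n - T_{n-1} = n;
# the other three pairs come from the long runs of zeros.
constant_up_to_40 = all(letter_pairs(w1, n) == full for n in range(1, 41))
print(w1[:16], constant_up_to_40)
```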
For the word $w_1$, we can go further and prove the S-automaticity of its ℓ-boundary sequence.

Proof. Since $w_1$ is the characteristic sequence of the triangular numbers, its prefix of length $T_n + 1$ (n ≥ 0) ends with 1 and contains n + 1 occurrences of 1; more precisely, we have
$$w_1 = \prod_{n\ge 0} 1\,0^{n}. \qquad (5)$$
Now consider the ℓ-boundary sequence $\partial_{w_1,\ell}$ of $w_1$. Let n ≥ ℓ and consider a length-ℓ factor u of $w_1$. Assume first that u contains at most one letter 1. Then u can take two forms.
• If $u = 0^i 1 0^{\ell-i-1}$ for some i ∈ {0, 1, ..., ℓ − 1}, then (5) implies that the pairs $(u, 0^{\ell})$, $(u, 0^j 1 0^{\ell-j-1})$ for j ∈ {0, 1, ..., ℓ − 1} all belong to $\partial_{w_1,\ell}[n]$.
We now examine the contribution to $\partial_{w_1,\ell}[n]$ of a length-ℓ factor u containing at least two occurrences of 1. Due to (5) again, u only appears once in $w_1$, so there is a unique factor v of $w_1$ such that the pair (u, v) belongs to $\partial_{w_1,\ell}[n]$. From (5), one sees that long stretches of letters 0 appear in $w_1$. Notice that $T_{\ell-2}$ is the last position (starting at 0) of a length-ℓ factor containing at least two occurrences of 1 in $w_1$ (this is the factor $10^{\ell-2}1$). Then $T_{\ell-2} + \ell = T_{\ell-1} + 1$. We compute $\partial_{w_1,\ell}[n]$ for n ranging into two intervals: either n belongs to for some $m > T_{\ell-1}$. (For $I_1$ to be non-empty, we must have Let u be a factor of $w_1$ such that $|u|_1 \ge 2$, and let uyv be the unique factor of length n + ℓ, with |y| = n − ℓ, starting with u. We claim that $v = 0^{\ell}$. Notice that the first letter of this particular occurrence of v appears at a position in the interval goes through all positive integers). Therefore, since the length of $I_2$ is constant (and equal to $T_{\ell-1} + 1$), there exists a word w of length $T_{\ell-1} + 1$ such that All in all, we have shown that $\partial_{w_1,\ell}$ is of the form $p \prod_{n\ge 1} c^n w$ for some word p, a letter c, and a word w of length $T_{\ell-1} + 1$. Since there are only finitely many values of i for which the corresponding boundary sets are distinct, $\partial_{w_1,\ell}$ is S-automatic.
In the following example, we illustrate the proof of the previous result for several values of $\ell$.
For instance, we see that the sets $\partial_{w_1,5}[n]$ for $n \in [74, 78]$ differ on the pair $(u_0, v)$. However, the sets $\partial_{w_1,5}[n]$ for $n \in [67, 73]$ all contain $(u_0, 0^5)$, so $u_0$ cannot tell them apart. To that aim, one has to go through all columns in the previous table, therefore covering all possible values of $u_i$.
Example 3.7.Let S = (α * β * ∪ β * γ * , {α, β, γ}, α < β < γ) be an abstract numeration system whose language has exactly 2n + 1 words of length n.For a construction of regular languages with a specific polynomial growth, see [45].Consider the S-automatic word given by the characteristic sequences of the words from the sublanguage α * within α * β * ∪β * γ * : w 2 = 1100100001000000 • • • .This is exactly the characteristic sequence of the set of squares.This word is also obtained using the morphisms We provide two proofs, the first one is generic.It aims to show the finiteness of the S-kernel of ∂ w 2 without explicitly determining the boundary sequence.The second one is less systematic but directly shows periodicity.
Proof sketch. To prove that the $S$-kernel is finite, we first guess that it contains 14 elements. We have computed prefixes of elements of the $S$-kernel with different suffixes given in Table 2. For the system $S$, since values are given by positions within the genealogically ordered language, we easily get $\mathrm{val}_S(\alpha^i \beta^j) = (i + j)^2 + j$ and $\mathrm{val}_S(\beta^j \gamma^k) = (j + k)^2 + j + 2k$.
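The two closed forms for $\mathrm{val}_S$ can be checked by listing the language in genealogical order. The following is only a small sketch: it writes a, b, c for α, β, γ (so that Python's string order matches α < β < γ), and the helper names are ours, not the paper's.

```python
def words_of_length(n):
    """Words of length n in a*b* ∪ b*c*, in lexicographic order (a < b < c)."""
    in_ab = ["a" * (n - j) + "b" * j for j in range(n + 1)]
    # k >= 1 so that b^n is not listed twice
    in_bc = ["b" * (n - k) + "c" * k for k in range(1, n + 1)]
    return sorted(in_ab + in_bc)


def genealogical_values(max_length):
    """Map each word of length <= max_length to its rank, i.e. its value val_S."""
    values = {}
    for n in range(max_length + 1):
        for word in words_of_length(n):
            values[word] = len(values)
    return values


VALUES = genealogical_values(8)
```

In this enumeration the words $\alpha^n$ receive the values $n^2$, which is why the characteristic sequence of $\alpha^*$ is the characteristic sequence of the squares.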
Proving the finiteness of the $S$-kernel amounts to proving relations between such subsequences. Here is a shorter proof, which works because, in our particular example, the boundary sequence is periodic.
Proof.Let us show that ∂ w 2 = (babb) ω .We make use of Lemma As a side comment, if an addable numeration system is such that the graph of n → T n is also regular (i.e., the set of pairs (rep(n), rep(T n )), where the shortest representation is conveniently padded, is a regular language), then the first order theory of N, +, x 2 would be decidable.But this structure is equivalent to N, +, • which is well known to have an undecidable theory.
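The periodicity claim $\partial_{w_2} = (babb)^\omega$ can be sanity-checked numerically. The sketch below assumes the characterization used in this proof, namely that $\partial_{w_2}[k] = b$ exactly when $k$ is a difference of two squares $m^2 - n^2$ with $m > n \ge 0$; the function names are ours.

```python
def is_difference_of_squares(k):
    """True if k = m^2 - n^2 for some integers m > n >= 0 (brute force)."""
    # (m - n)(m + n) = k with m - n >= 1 forces m + n <= k, so m <= k suffices.
    squares = {n * n for n in range(k + 1)}
    return any(m * m - k in squares for m in range(1, k + 1))


def boundary_prefix_w2(length):
    """Letters at indices 1..length, assuming 'b' <=> difference of two squares."""
    return "".join(
        "b" if is_difference_of_squares(k) else "a" for k in range(1, length + 1)
    )
```

On the tested range, the integers that are not differences of two squares are exactly those congruent to 2 modulo 4, which gives the period babb.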
To end this short section, we consider a third example, which is a small variation of the previous one.
In the remainder of this part, we fix $s \ge 3$ and write $w = w_s$ for short. Applying Lemma 3.2, the boundary sequence is such that $\partial_w[k] = b$ if and only if $k$ can be written as
$$k = P_m - P_n \tag{6}$$
for some integers $m > n \ge 0$. We say that an integer $k$ is representable if there exist $m, n \in \mathbb{N}$ with $m > n$ such that the above equation holds.
Proof.Assume first that s is odd.We make an observation about representable integers of a certain form.

Claim 1. Let $p$ be a prime number congruent to 1 (mod $s$). For any $i, j \ge 0$, $s^i p^j$ is representable if and only if $p^j \ge s\frac{s^i - 1}{2} + 1$.

Proof of claim: Notice that $p$ is an odd prime number. Assume that $p^j \ge s\frac{s^i - 1}{2} + 1$. Then there exists $n \ge 0$ such that $p^j = s\frac{s^i - 1}{2} + 1 + ns$. Setting $m = n + s^i$, we find
$$P_m - P_n = \frac{(m - n)\bigl(s(m + n - 1) + 2\bigr)}{2} = s^i\Bigl(s\frac{s^i - 1}{2} + 1 + ns\Bigr) = s^i p^j.$$
Thus $s^i p^j$ is representable. Assume then that $p^j < s\frac{s^i - 1}{2} + 1$, but towards a contradiction, that $s^i p^j = P_m - P_n$ for some integers $m > n \ge 0$. We thus have
$$2 s^i p^j = (m - n)\bigl(s(m + n - 1) + 2\bigr).$$
Notice that $s(m + n - 1) + 2 \equiv 2 \pmod{s}$. Consequently, as $s \ge 3$ is odd, we must have $s^i \mid m - n$. Furthermore, since $p \equiv 1 \pmod{s}$, we must have that $2 \mid s(m + n - 1) + 2$ due to the same observation. Therefore, we have $s(m + n - 1) + 2 = 2p^{j_1}$ and $m = n + p^{j_2} s^i$ with $j_1 + j_2 = j$. Plugging the latter into the former, we find
$$p^{j_1} = s\,\frac{2n + p^{j_2} s^i - 1}{2} + 1 \ge s\frac{s^i - 1}{2} + 1 > p^j,$$
where in the last inequality, we have used the assumption. This is a contradiction. Thus $s^i p^j$ is not representable, as claimed.
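Claim 1 lends itself to a brute-force check for small parameters. The sketch below assumes $P_n = s\,n(n-1)/2 + n$, which is the closed form consistent with the factorization $P_m - P_n = (m-n)(s(m+n-1)+2)/2$ used in the proof; the function names are ours and the search is exhaustive rather than clever.

```python
def P(n, s):
    """Assumed closed form P_n = s*n*(n-1)/2 + n (n*(n-1) is always even)."""
    return s * n * (n - 1) // 2 + n


def is_representable(k, s):
    """True if k = P_m - P_n for some integers m > n >= 0 (brute force)."""
    # P_m - P_{m-1} = s(m-1) + 1 must not exceed k, which bounds m.
    m_max = (k - 1) // s + 1
    values = {P(n, s) for n in range(m_max + 1)}
    return any(P(m, s) - k in values for m in range(1, m_max + 1))


def claim1_predicts(s, p, i, j):
    """Right-hand side of Claim 1 (s odd): p^j >= s(s^i - 1)/2 + 1."""
    return p ** j >= s * (s ** i - 1) // 2 + 1


def check_claim1(s, p, max_i, max_j):
    """Compare brute force with Claim 1 for all 0 <= i <= max_i, 0 <= j <= max_j."""
    return all(
        is_representable(s ** i * p ** j, s) == claim1_predicts(s, p, i, j)
        for i in range(max_i + 1)
        for j in range(max_j + 1)
    )
```

For instance, with $s = 3$ and $p = 7$, the integer $21 = 3 \cdot 7$ is representable ($P_4 - P_1 = 22 - 1$), while $63 = 3^2 \cdot 7$ is not, since $7 < 13$.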
Assume towards a contradiction that $\partial_w$ is eventually periodic, i.e., $\partial_w = uv^\omega$ for some finite words $u, v$. Let $i \ge 1$ be such that $s^i \ge |u|$. Then the previous claim and (6) imply $\partial_w[s^i] = a$, and by assumption, $\partial_w[s^i + n|v|] = a$ for all $n \ge 0$. Let however $p$ be a prime congruent to 1 (mod $|v|s$) (and thus $p \equiv 1 \pmod{s}$) with $p \ge s\frac{s^i - 1}{2} + 1$. Note that there exist infinitely many primes of this form by Dirichlet's theorem for primes in arithmetic progressions (see, e.g., [3, Thm. 7.9]). Write $p = q \cdot |v|s + 1$. Take $n = s^{i+1} q$; then we have
$$s^i + n|v| = s^i + s^{i+1} q |v| = s^i (1 + q|v|s) = s^i p.$$
This implies that $\partial_w[s^i + n|v|] = b$ by the above claim together with (6). This contradiction shows that $\partial_w$ is aperiodic when $s$ is odd.
Assume then that $s$ is even, say $s = 2t$ with $t \ge 2$. Then we have that $k$ is representable if and only if $k = (m - n)(t(m + n - 1) + 1)$ for some integers $m > n \ge 0$.

Claim 2. Let $p$ be a prime number congruent to 1 (mod $s$). Let $q = 1$ if $t$ is odd, otherwise let $q = t + 1$. Then, for all $i, j \ge 0$, we have that $t^i p^j q$ is representable if and only if $p^j q \ge t(t^i - 1) + 1$.

Proof of claim: If $p^j q \ge t(t^i - 1) + 1$, then there exists $n \ge 0$ such that $p^j q = t(t^i - 1) + ns + 1$: indeed, if $t$ is odd, we have $t(t^i - 1) \equiv 0 \pmod{s}$ and $p^j q = p^j \equiv 1 \pmod{s}$. If $t$ is even, then $t(t^i - 1) \equiv t \pmod{s}$ and we have $p^j q = p^j (t + 1) \equiv t + 1 \pmod{s}$. Now set $m = n + t^i$. We thus find
$$P_m - P_n = (m - n)\bigl(t(m + n - 1) + 1\bigr) = t^i\bigl(t(2n + t^i - 1) + 1\bigr) = t^i p^j q,$$
showing that $t^i p^j q$ is representable.

For the converse, assume again that $t^i p^j q = P_m - P_n$ but that $p^j q < t(t^i - 1) + 1$. We thus have $t^i p^j q = (m - n)(t(m + n - 1) + 1)$.

By inspection modulo $t$, we must have that $m - n = t^i p^{j_1} q_1$ and $t(m + n - 1) + 1 = p^{j_2} q_2$, where $j_1 + j_2 = j$ and $q_1 q_2 = q$. We plug in $m = t^i p^{j_1} q_1 + n$ into the second term to obtain
$$p^{j_2} q_2 = t\bigl(2n + t^i p^{j_1} q_1 - 1\bigr) + 1 \ge t(t^i - 1) + 1 > p^j q,$$
where the last inequality is obtained by using the assumption. This is a contradiction. Therefore $t^i p^j q$ is not representable, as was claimed.
To conclude the proof of the proposition, assume again towards a contradiction that $\partial_w = uv^\omega$. Let $q = 1$ if $t$ is odd, and otherwise let $q = t + 1$. Let $i \ge 1$ be such that $t(t^i - 1) + 1 > q$ and $t^i \ge |u|$. Then by the above claim $t^i q$ is not representable. In fact, by periodicity, we have that $t^i q + n|v|$ is not representable for all $n \ge 0$. Let however $p$ be a prime with $p \equiv 1 \pmod{s|v|}$ (in particular $p \equiv 1 \pmod{s}$), and such that $pq \ge t(t^i - 1) + 1$ (again Dirichlet's theorem implies the existence of such a prime). Write $p = r \cdot s|v| + 1$ and let $n = t^i q s r$. We then have $t^i q + n|v| = t^i q + t^i q r s |v| = t^i q (1 + r s |v|) = t^i q p$, which is a representable number by the above claim. This contradiction shows that $\partial_w$ is aperiodic.

Non-addable systems: counterexamples
Our aim is to show that the boundary sequence of a U-automatic word is not always U-automatic.Here, we have special instances of abstract numeration systems which are, in particular, positional.So we refer to the sequence U defining the system.We give two such examples.The numeration system defined first is a variant of the base-2 system.Example 3.12.Take the numeration system (U n ) n≥0 defined by U n = 2 n+1 − 1 for all n ≥ 0. We have 0 * rep U (N) = (0 + 1) * (ε + 20 * ).Consider the characteristic word u of U, i.e. , u[n Proof.The word u is trivially U-automatic.By Lemma 3.2, we have R 3 because of the following three observations.The U-representations of the elements in X for m = 0 and r > 0 are given by the words in R 1 because, in that case, val U (20 For m > 0 and r < U m , i.e., | rep U (r)| ≤ m − 1, the U-representations of the elements in X are given by the words in Finally, the case m > 0 and r ≥ U m is handled by the words in An application of the pumping lemma shows that R is not regular.By contradiction, if R is regular, then R ∩ 1 * 20 * is regular and accepted by a DFA with t states.We conclude that there exist infinitely many integers As a consequence of the previous proposition and Theorem 3.1, U is non-addable.Remark 3.14.One may notice that both u and ∂ u are 2-automatic: this follows by the Büchi-Bruyère theorem [9] from the set where V 2 (y) is the smallest power of 2 occurring with a non-zero coefficient in the binary expansion of y.
In view of the above remark, Example 3.12 could be considered as unsatisfactory.We now make use of a similar strategy but with a more complicated numeration system, for which we do not know any analogue of Remark 3.14.To this end, consider the non-addable numeration system from [23,Ex. 3] or [34,Ex. 2] defined by This word is trivially V-automatic.The boundary sequence ∂ v starts with a a b a a a a a a a b a a b a a a a a a a a a a a a a a a a a a a a a a a a b a a a a a a a a a a b • • • where again a := {(0, 0), (0, 1), (1, 0)} and b := {0, 1} × {0, 1}.Similar to the above, {rep V (n) : ∂ v [n] = b} is not regular, whence Proposition 3.16.Let V be the numeration system given by (7).The word v from Example 3.15 is V-automatic but its boundary sequence ∂ v is not V-automatic.
Before diving into the proof, we set the stage with some remarks on the numeration system given in (7). We assume that the reader has some knowledge of β-numeration systems; see, for instance, [46].

By using the recurrence relation defining
Hence rep V (V m+4j+3 − V m ) has the expected form (9).
We now show that $d$ is not ultimately periodic. We apply Lemma 3.17 for $x = 1 - 1/\beta^3$. The left-hand side in (8) is approximately 1.75. Since $|\gamma| > 1$, the right-hand side converges (absolutely) and the first few digits of its limit are −3.57. Hence $d$ is not ultimately periodic.
To conclude the proof, we apply the pumping lemma to show that the language $R := \mathrm{rep}_V(\{V_{m+r} - V_m \mid m \ge 0, r > 0\})$ is not regular. Proceed by contradiction. Suppose that $R$ is accepted by a DFA with $\ell$ states. Then there exist words $u, v, w$ with $0 < |v| \le \ell$ and $d$ has $uv$ as prefix such that, for all $n$, $uv^n w$ belongs to $R$. This is a contradiction because $d$ is not periodic.

Remark 3.18. In the above proof, it is interesting to note that the non-regularity of the language $R$ is really associated with $V_{m+r} - V_m$ for $r$ congruent to 3 modulo 4. Indeed, we have used the fact that

Remark 3.19. We do not know whether $v$ and $\partial_v$ are both V-automatic for some numeration system V.

The extended boundary sequences of Sturmian words
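We give two descriptions of the $\ell$-boundary sequences of Sturmian words (Theorem 4.1 and Proposition 4.10) and discuss some of their word combinatorial properties. We first recap minimal background on Sturmian words seen as codings of rotations. For a general reference, see [31, §2]. Let $\alpha, \rho \in \mathbb{T} := [0, 1)$ with $\alpha$ irrational. Define the rotation of the 1-dimensional torus $R_\alpha \colon \mathbb{T} \to \mathbb{T}$ by $R_\alpha(x) = \{x + \alpha\}$, where $\{x\}$ denotes the fractional part of $x$. Partition $\mathbb{T}$ into $I_0 = [0, 1 - \alpha)$ and $I_1 = \mathbb{T} \setminus I_0$. (The endpoints of $I_0$ will not matter in the forthcoming arguments.) Define the coding $\nu \colon \mathbb{T} \to \{0, 1\}$ by $\nu(x) = 0$ if $x \in I_0$, otherwise $\nu(x) = 1$. We define the word $s_{\alpha,\rho}$ by $s_{\alpha,\rho}[n] = \nu(R^n_\alpha(\rho))$, for all $n \ge 0$. We call $\alpha$ the slope and $\rho$ the intercept of $s_{\alpha,\rho}$. The characteristic Sturmian word of slope $\alpha$ is $s_{\alpha,\alpha}$.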
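The coding of a rotation is easy to experiment with. The sketch below is an illustration under two assumptions of ours: the partition is taken as $I_0 = [0, 1-\alpha)$, and the slope $(3 - \sqrt{5})/2$ is used because, under this convention, the characteristic word starts like the Fibonacci word of Example 1.2. Floating-point arithmetic is used, so only moderately long prefixes should be trusted.

```python
import math


def sturmian_word(alpha, rho, length):
    """Prefix of s_{alpha,rho}: letter n is 0 iff {rho + n*alpha} lies in [0, 1 - alpha)."""
    letters = []
    for n in range(length):
        x = (rho + n * alpha) % 1.0  # R_alpha^n(rho), in floating point
        letters.append("0" if x < 1.0 - alpha else "1")
    return "".join(letters)


def factor_count(word, n):
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


GOLDEN_SLOPE = (3 - math.sqrt(5)) / 2  # assumed slope for the Fibonacci word
CHARACTERISTIC = sturmian_word(GOLDEN_SLOPE, GOLDEN_SLOPE, 2000)
```

On this prefix one can observe the defining property of Sturmian words: there are exactly $n + 1$ factors of each small length $n$.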

A description of the extended boundary sequence
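In the following, a sliding block code of length $r$ is a mapping $B$ that replaces each length-$r$ factor $x[n] \cdots x[n + r - 1]$ of an infinite word $x$ by a single letter determined by that factor.

Theorem 4.1. For a Sturmian word $s$ of slope $\alpha$ (and intercept $\rho$) and $\ell \ge 1$, the (shifted) $\ell$-boundary sequence $T\partial_{s,\ell}$ is obtained by a sliding block code of length $2\ell$ applied to the characteristic Sturmian word of slope $\alpha$.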
To prove the theorem we develop the required machinery.For a word u It is well known that u occurs at position i in s α,ρ if and only if R i α (ρ) ∈ I u .These intervals of factors of length can also be described as follows: order the set {−jα} j=0 as 0 = i 0 < i 1 < i 2 < • • • < i .For convenience, we set i +1 = 1.If the + 1 factors of length of the Sturmian word s α,ρ are lexicographically ordered as w 0 < w 1 < • • • < w , then I w j = [i j , i j+1 ) for each j ∈ {0, . . ., }. From the following claim it is evident that the intercept ρ plays no further role in our considerations.(This also follows from the fact that two Sturmian words have the same set of factors if and only if they have the same slope.) Notice that the intersection is a finite union of (possibly empty) intervals.Since the set (R i α (ρ)) i∈N is dense in T, it follows that there exists i The claim follows by applying the isomorphism R n α to the intersection.The endpoints of I u are of the form i j and i j+1 for some j ∈ {0, . . ., }.Hence, for n ≥ , the set of pairs belonging to ∂ x, [n] is determined by the positions of the rotated endpoints R n α (i j ) within the intervals I w k .Notice that each rotated endpoint R n α (i j ) always lies in the interior of some I w k whenever n > .When n = , we have R n α ({− α}) = 0, which is an endpoint of one of the intervals I w k .For the time being we assume n > , and return to the case n = in Proposition 4.8.Now, for example, if R n α (i j ) ∈ I w k then we have (w j , w k ), (w j−1 , w k ) ∈ ∂ x, [n] (if j = 0, w j−1 is replaced with w ).Determining the boundary sets can be quite an intricate exercise; see Example 4.3.
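The description of the intervals $I_{w_j} = [i_j, i_{j+1})$ can be tested on a prefix of a characteristic word. The sketch below assumes the coding with $I_0 = [0, 1-\alpha)$ and intercept $\rho = \alpha$; it sorts the points $\{-j\alpha\}$, sorts the observed length-$\ell$ factors lexicographically, and checks that the factor at position $n$ has rank $j$ exactly when $R^n_\alpha(\rho)$ falls in the $j$th interval. Names and parameters are ours, and floating point is used throughout.

```python
from bisect import bisect_right


def sturmian_word(alpha, rho, length):
    """Prefix of s_{alpha,rho} for the coding with I_0 = [0, 1 - alpha)."""
    return "".join(
        "0" if (rho + n * alpha) % 1.0 < 1.0 - alpha else "1" for n in range(length)
    )


def interval_index_matches_rank(alpha, ell, length):
    """Check I_{w_j} = [i_j, i_{j+1}) on the first `length` positions of s_{alpha,alpha}."""
    rho = alpha
    word = sturmian_word(alpha, rho, length + ell)
    # i_0 = 0 < i_1 < ... < i_ell: the sorted points {-j*alpha}, j = 0..ell
    endpoints = sorted((-j * alpha) % 1.0 for j in range(ell + 1))
    # w_0 < w_1 < ... < w_ell: the factors of length ell in lexicographic order
    factors = sorted({word[n:n + ell] for n in range(length)})
    if len(factors) != ell + 1:
        return False
    for n in range(length):
        point = (rho + n * alpha) % 1.0
        j = bisect_right(endpoints, point) - 1  # index of the interval containing the point
        if word[n:n + ell] != factors[j]:
            return False
    return True
```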
An alternative to considering the positions of the points R n α (i j ) within the intervals I w k is to consider the positions of the points R n α ({−jα}) within the intervals I w k -the only difference is the order of enumeration.For each n > , there is a map σ = σ n ∈ T , where T is the set of mappings from {0, . . ., } to itself, such that The realizable such configurations in (10) are called constellations.These points, when ordered according to the i j 's, determine the boundary set ∂ s, [n] as described above.See Example 4.3 (and Example 4.4) for an illustration of the construction.Definition 4.2.Let σ ∈ T be such that (10) holds for some n ∈ N. We define ∂ σ ∈ 2 A ×A as the boundary set corresponding to any constellation inducing σ.The corresponding words w 0 , . . ., w are written next to their interval.Here σ n is defined by (0, 1, 2, 3, 4) → (2, 0, 3, 1, 4).For any constellation inducing σ n , we see the pairs belonging to ∂ σn = ∂ f,4 [17] from Fig. 5 Coming back to the introductory Example 1.2, the five sets a 1 , . . ., a 5 correspond to the situations depicted from left to right in Fig. 6.For instance, in the fourth picture, we understand why 10 is a prefix belonging to three pairs in a 4 : the red inner interval intersects the three outer intervals of the partition.The situation is similar in the fifth picture where 01 is the prefix of three pairs in a 5 .It is however not the case with the first three sets/pictures.We give an accompanying example to Example 4.3 for the reader to clarify the notion on constellations.α (I u ) ⊃ I v , whence σ n is neither injective nor surjective.For instance, this is the case for the last two constellations in Fig. 6 (we have σ n equals (0, 1, 2) → (1, 2, 1), and (0, 1, 2) → (2, 1, 2), respectively).With α = (π − 3)/2 0.0708 and = 5, the partition of T is made of 5 short intervals of length α and one large interval of length 1 − 5α > 0.5.In Fig. 
7, we see that five or four "short" rotated intervals are included in the same large interval (for n equal to 21 and 10 respectively).In particular, counting the number of matching pairs of colors around the circle, we see that ∂ x,5 [5] = ∂ x,5 [21] with cardinality 11 and |∂ x,5 [10]| = 12.Contrarily to Example 4.3 and Fig. 5 where each prefix and suffix belong to two pairs, here one prefix (corresponding to the large interval) belongs to six pairs of the boundary and the other prefixes belong to one pair (or two for one short interval in the constellation on the right of Fig. 7).
We have that $r[n] = j$ if and only if the characteristic Sturmian word $s_{\alpha,\alpha}$ has the length-$\ell$ factor $w_j$ occurring at position $n$.

Proof of Theorem 4.1. Notice that by definition, the word $r$ defined in Definition 4.2 is obtained by a sliding block code of length $\ell$ of the characteristic Sturmian word $s_{\alpha,\alpha}$. We show that $T\partial_{s,\ell}$ is obtained from $r$ by a sliding block code of length $\ell + 1$. The claim then follows since the composition of sliding block codes of length $r$ and $r'$, respectively, is a sliding block code of length $r + r' - 1$.
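The length bookkeeping for composed sliding block codes can be illustrated on finite words. In the sketch below, the two local rules are toy examples of ours (they are not the codes of the proof); the point is only that applying a code of length $r_1$ and then one of length $r_2$ agrees with a single code of length $r_1 + r_2 - 1$.

```python
def apply_block_code(word, r, rule):
    """Apply a sliding block code of length r: each window of r symbols is mapped by `rule`."""
    return [rule(tuple(word[i:i + r])) for i in range(len(word) - r + 1)]


def compose(r1, rule1, r2, rule2):
    """Local rule of the composition (rule1 first, then rule2), of length r1 + r2 - 1."""
    def rule(window):
        # The r2 inner windows of length r1 cover exactly r1 + r2 - 1 symbols.
        inner = tuple(rule1(window[i:i + r1]) for i in range(r2))
        return rule2(inner)
    return r1 + r2 - 1, rule


def count_ones(window):
    """Toy code of length 3: number of 1s in the window."""
    return sum(window)


def is_increasing(window):
    """Toy code of length 2 on the resulting integers."""
    return int(window[0] < window[1])
```

With $r_1 = \ell$ and $r_2 = \ell + 1$ this gives the length $2\ell$ stated in Theorem 4.1.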
Let n > .Consider the factor of length + 1 of r occurring at position m = n − − 1 ≥ 0: by definition we have r , for each j ∈ {0, . . ., }.This is equivalent to R m+ +1 ({−jα}) ∈ I u −j for each j ∈ {0, . . ., }.There thus exists a mapping We conclude that the factor of length + 1 appearing at position m in r determines the boundary set ∂ s, [n].Letting the mapping B : {0, . . ., } +1 → T capture this relation, we may define an associated sliding block code B of length + 1 such that B(r) = T∂ s, .
Proof of Proposition 4.8.Assume first that α > 1/2.Consider the set R n α (I u ) ∩ I v for some length-factors u, v, and n ≥ .We claim that it is an interval whenever it is non-empty.If it is not, then the intersection is a union of two intervals: without loss of generality |I u | > 1 − |I v |, and R n α (I u ) intersects I v from both ends, but does not contain I v entirely.Notice that the intervals corresponding to length-factors have length at most α whenever α > 1.Since in that case we get the contradiction |I u | > 1 − |I v | ≥ 1 − α > α , we must have α < 1.But now we know that the intervals have two admissible lengths, namely α and 1 − α (compare to the non-rotated points in Fig. 7 for an illustration).Now if 1 − |α is the largest of the two, we have a We conclude that for any lengthfactors u, v of s, the set R n α (I u ) ∩ I v is an interval or is empty.This implies that the boundary set ∂ x, [n] contains 2 + 2 elements whenever n > .Thus ∂ x, [ ] occurs only once in the -boundary sequence due to a cardinality argument.
Notice that either 00 or 11 appears in a Sturmian word $s$, so the above implies that the first letter of the (1-)boundary sequence $\partial_s$ always appears infinitely often in the sequence. Returning to Example 1.2, since $0^4$ does not appear in the Fibonacci word, the letter $a_0$ appears only once in $\partial_{f,2}$.

We conclude with the immediate corollary of Theorem 4.1 and Proposition 4.8; here we say that a word $w$ is uniformly recurrent if each of its factors occurs infinitely often within bounded gaps (the distance between two consecutive occurrences depends on the factor). It is known that, e.g., Sturmian words are uniformly recurrent.

Corollary 4.9. For any Sturmian word $s$, the shifted sequence $T\partial_{s,\ell}$ is uniformly recurrent. The sequence $\partial_{s,\ell}$ is uniformly recurrent if and only if $0^{2\ell}$ or $1^{2\ell}$ appears in $s$.

Another description of the extended boundary sequence
We give another description of the -boundary sequences of Sturmian words when ≥ 2. For any irrational number α ∈ (0, 1) there is a unique infinite continued fraction expansion Proof.Let (S j ) j≥−1 be the sequence associated to the slope α.Let then k be an integer such that |S k S k−1 | ≥ 2 + 1. Hence s α,α is a product of S k and S k−1 .It is now evident that with β = [0; a k+1 + 1, a k+2 , . ..], we have that g(s β,β ) = s α,α , where g is defined by g : ).We also have that X = pref 2 −1 (S k−1 S k ), as S k S k−1 and S k−1 S k are known to differ in only the last two letters [2, Thm.9.1.11].We have that X is a prefix of both S k X and S k−1 X: We define: h : 0 → B(S k X), 1 → B(S k−1 X), where B is the sliding block code of length 2 from Theorem 4.1 such that B(s α,α ) = T∂ s, (and T is the shift operator).Notice now that B(uv) = B(u pref 2 −1 (v))B(v) for any sufficiently long word v (and u non-empty).Therefore We illustrate the above construction with a couple of examples for the benefit of the interested reader.Now for any ≥ 2, the above proposition thus gives that ∂ f, is the morphic image of the characteristic Sturmian word of slope β = α.In other words, the -boundary sequence is always a morphic image of f.We generalize the last observation made in the above example.
Corollary 4.14. Let $s$ be a Sturmian word with quadratic slope. Then $\partial_{s,\ell}$ is morphic. In particular, the $\ell$-boundary sequence of a Sturmian word fixed by a non-trivial morphism is morphic.

Proof. A remarkable result of Yasutomi [55] (see also [5]), characterizing those Sturmian words that are fixed by some non-trivial morphism, implies that if a Sturmian word of slope $\alpha$ is fixed by a non-trivial morphism, then so is the characteristic Sturmian word of slope $\alpha$. Furthermore, the slope is characterized by the property that $\alpha = [0; 1, a_2, \overline{a_3, \ldots, a_r}]$ with $a_r \ge a_2$ or $\alpha = [0; 1 + a_1, \overline{a_2, \ldots, a_r}]$ with $a_r \ge a_1 \ge 1$ [15,38] (see also [31, Thm. 2.3.25]). Here $\overline{x_1, \ldots, x_t}$ indicates the periodic tail of the infinite continued fraction expansion. As $\alpha$ is quadratic, it has an eventually periodic continued fraction expansion. There thus exist arbitrarily large $k$ for which $\beta = [0; a_k + 1, a_{k+1}, \ldots]$ gives a characteristic Sturmian word of slope $\beta$ which is the fixed point of a non-trivial morphism (it is of the latter form). Proposition 4.10 then shows that $T\partial_{s,\ell}$ is the morphic image of this word, and the claim follows (because prepending the letter $\partial_{s,\ell}[\ell]$ preserves morphicity [2, Thm. 7.6.3]).
Notice that given the morphism fixing a Sturmian word s, one can compute (the continued fraction expansion of) the quadratic slope (and intercept) of s [54,42,30].Furthermore, any (not necessarily pure) morphic Sturmian word has quadratic slope [1,6], so in particular the boundary sequence of such a word is morphic.
The above corollary has an alternative proof via the logical approach as well. For the definitions of notions that follow, we refer to the cited papers. From the work of Hieronymi and Terry [26], it is known that addition in the Ostrowski numeration system based on an irrational quadratic number $\alpha$ is recognizable by a finite automaton. This motivated Baranwal, Schaeffer, and Shallit to introduce Ostrowski-automatic sequences in [4]. For example, they showed that the characteristic Sturmian word of slope $\alpha$ is Ostrowski $\alpha$-automatic. Since the numeration system is addable, the above corollary follows by the same arguments as in Section 2.3.
We remark that it is unclear to us whether some of the results proved in this section could be proved automatically using the very recent tool Pecan developed in [37,27].Minimal complexity words can be seen as a generalization of Sturmian words to larger alphabets: if a word (containing all letters of A) has less than n + |A| − 1 factors of length n for some n, then it is ultimately periodic.Otherwise it is aperiodic (a consequence of the Morse-Hedlund theorem).See [39,14,21,10,17] for characterizations and generalizations.

Factor complexities of the extended boundary sequences
The following proposition is almost immediate after the key Lemma 4.17.

Proof. Recall that $\partial_{s,\ell}$ is obtained by a coding of the $2\ell$-block coding of $s_{\alpha,\alpha}$. The following lemma says that the coding is actually a bijection; in other words, a length-$2\ell$ factor of $s$ uniquely determines a boundary set, or a letter, in the boundary sequence. We conclude that the factors of length $n + 2\ell - 1$ of $s$ uniquely determine a factor of length $n$ in the $\ell$-boundary sequence. Since there are $n + 2\ell$ such factors of $s$, the claim follows as the number of factors of length $2\ell$ of $s$, that is, the number of letters in $\partial_{s,\ell}$, is $2\ell + 1$.
Proof.Let σ (resp., σ ) satisfy (10) with n (resp., m in place of n, m = n).Since σ = σ , there exist j ∈ {0, . . ., }, and distinct factors v, v ∈ Fac (s) such that R n α ({−jα}) ∈ I v and R m α ({−jα}) ∈ I v .To fix a rotation direction, assume without loss of generality that v is lexicographically less than v, so I v appears before I v in clockwise order, starting from 0, in the 1-dimensional torus T. The situation is depicted in Fig. 8: the interval I v (resp.,I v ) is colored in orange (resp., dark red).Say that {−jα} is the starting point (in clockwise direction) of the interval I u , and is the ending point of the interval I w (again in clockwise direction); in particular, I u and I w are adjacent intervals.In particular, in Fig. 8, the interval R n α (I u ) in light turquoise (resp., R m α (I u ) in pink) appears after the interval R n α (I w ) in dark turquoise (resp., R m α (I w ) in purple) in clockwise order.We now have that ∂ σ contains (u, v) and (w, v), while ∂ σ contains (u, v ) and (w, v ).Assume towards a contradiction, that ∂ σ = ∂ σ .Then we must have (u, v ) and (w, v ) ∈ ∂ σ as well as (u, v) and (w, v) ∈ ∂ σ .We have the following: R n α (I u ) ∩ I v = ∅ = R n α (I u ) ∩ I v (this is shown in Fig. 8 where the light turquoise interval intersects both the orange and dark red interval) and similarly R n α (I w ) ∩ I v = ∅ = R n α (I w ) ∩ I v (this is shown in Fig. 8 where the dark turquoise interval intersects both the orange and dark red interval).Since I u and I w are intervals, we see that R n (I u ) covers all intervals I z between I v and I v in clockwise order starting from the point R n α ({−jα}).Again, this is illustrated in Fig. 
8 where an interval I z is depicted in green.Similarly R n α (I w ) contains all intervals I z between I v and I v in anticlockwise order starting from the point R n α ({−jα}).The total number of the intermediate intervals I z is + 1 − 2 = − 1 ≥ 1, so assume without loss of generality that R n α (I u ) covers the interval I z .In particular, this means that (u, z) ∈ ∂ σ .But, we have a symmetric situation as follows: the interval R m α (I u ) covers all intervals between I v and I v in clockwise order starting from R m α ({−jα}): these are the same intervals covered by R n (I w ).Since (w, z) / ∈ ∂ σ , we get the contradiction that (u, z) / ∈ ∂ σ .This suffices for the claim.
We conclude with a formula for the factor complexity of the 1-boundary sequence of Sturmian words.

Proof. Without loss of generality, we assume that 00 appears in $s$ and 11 does not. Let $B$ be the length-2 sliding block code from Theorem 4.1; it is not hard to show that $B$ is defined by $00 \mapsto 0$ and $01, 10 \mapsto 1$. To prove the claim, we show that $B(u) = B(v)$ with $u \ne v$ if and only if $u$ is a prefix of $(01)^r$ and $v$ is a prefix of $(10)^r$ (assuming $|u|, |v| \ge 2$). This is enough since, as in the proof of Proposition 4.16, a factor of length $n + 1$ of $s$ corresponds to a factor of length $n$ of $\partial_s$.

Observe that if $u$ is a prefix of $(01)^r$ and $v$ is a prefix of $(10)^r$, then $B(u) = B(v) = 1^{|u|-1}$. Let us show the converse by induction on the length of $u, v$, and hence assume that $B(u) = B(v)$ with $u \ne v$. If $|u| = 2 = |v|$, the claim is clear. Assume then that $|u|, |v| > 2$. If $u$ and $v$ begin with the same letter, then their second letter must be equal, because otherwise $B(u)$ begins with 0 and $B(v)$ with 1 or vice versa. So write $u = abu'$ and $v = abv'$ for some letters $a, b \in \{0, 1\}$ and some binary words $u', v'$. Since the words $bu'$ and $bv'$ are shorter and distinct, and have equal $B$-images, the induction hypothesis implies that one is a prefix of $(01)^r$ and the other a prefix of $(10)^r$. This is, of course, impossible. We conclude that the words $u$ and $v$ begin with distinct letters. Without loss of generality, suppose that $u$ begins with 0 and $v$ with 1. Since 11 does not appear in $s$, we deduce that $v$ begins with 10, hence $B(v)$ begins with 1. Therefore $u$ must begin with 01 for $B(u)$ to begin with 1. Removing the first letter of $u$ and $v$ allows us to use induction to complete the claim.
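The complexity formula of this proof can be observed on the Fibonacci word. The sketch below applies the block code $00 \mapsto 0$, $01, 10 \mapsto 1$ to a long prefix and counts factors of the image; it assumes that the prefix is long enough to contain every short factor, and all function names are ours.

```python
def fibonacci_word(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def block_code_B(s):
    """Length-2 sliding block code 00 -> 0, 01 and 10 -> 1 (assumes 11 does not occur)."""
    return "".join("0" if s[i:i + 2] == "00" else "1" for i in range(len(s) - 1))


def complexity(word, n):
    """Number of distinct factors of length n in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def max_exponent_01(s):
    """Largest r such that (01)^r occurs in s."""
    r = 0
    while "01" * (r + 1) in s:
        r += 1
    return r


def predicted_complexity(n, r):
    """n + 1 if n < 2r, and n + 2 otherwise."""
    return n + 1 if n < 2 * r else n + 2
```

For the Fibonacci word one finds $r = 2$, so the image has $n + 1$ factors of length $n$ for $n \le 3$ and $n + 2$ afterwards.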
As an immediate corollary, we see that the $\ell$-boundary sequence is aperiodic for all $\ell \ge 1$.

Conclusions
There is no particular reason to consider boundary pairs of equal length. One may just as well define the $(k, \ell)$-boundary sequence in an analogous manner. All the results appearing in Sections 2.3 and 3 extend straightforwardly to this seemingly more general notion, and the methods used in Section 4 can likewise be adapted to $(k, \ell)$-boundary sequences.

Figure 1: A sliding window where we focus on two regions of a fixed length.

Example 1.2. Consider the Fibonacci word $f = 0100101001 \cdots$, the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 0$. We have $\partial_f = a\,b\,b\,a\,b\,b\,b\,b\,a\,b\,b\,a\,b\,b\,b\,b\,a\,b\,b\,b\,b\,a\,b\,b\,a\,b\,b\,b\,b \cdots$, where $a := \{(0, 0), (0, 1), (1, 0)\}$ and $b := \{0, 1\} \times \{0, 1\}$. For instance, $\partial_f[1] = a$ because the length-2 factors of $f$ are 00, 01, 10, while $\partial_f[2] = b$ because its length-3 factors are of the form 0_0, 0_1, 1_0, 1_1 (they are in fact 010, 001, 100, 101). The 2-boundary sequence starts with $\partial_{f,2} = a\,b\,c\,d\,e\,f\,b\,c\,d\,b\,c\,d\,e\,f\,b\,c\,d\,e\,f\,b\,c\,d\,b\,c\,d \cdots$

3.6] (compare to Proposition 4.8). Thus the quotient of the set of factors of length $n$ occurring in a Sturmian word by the relation $\equiv_k$ is completely determined by $\partial_{s,k-1}[n - k + 1]$ for large enough $k$ (depending on $s$). Other families of words with $k$-abelian equivalence determined by the boundary sets are given in [41, Prop. 4.2].
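The first terms of $\partial_f$ and $\partial_{f,2}$ in Example 1.2 can be recomputed directly from the definition. The sketch below works on a finite prefix of $f$, which is an approximation: it assumes that the prefix is long enough to contain all the short factors involved. Boundary sets are labelled a, b, c, ... in order of first appearance; the function names are ours.

```python
def fibonacci_word(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def boundary_set(x, ell, n):
    """Pairs (u, v) of length-ell words such that u y v occurs in x with |u y| = n."""
    return frozenset(
        (x[i:i + ell], x[i + n:i + n + ell]) for i in range(len(x) - n - ell + 1)
    )


def label_sequence(x, ell, count):
    """Label the sets for n = ell, ell + 1, ... by order of first appearance."""
    labels = {}
    out = []
    for n in range(ell, ell + count):
        current = boundary_set(x, ell, n)
        labels.setdefault(current, chr(ord("a") + len(labels)))
        out.append(labels[current])
    return "".join(out)


FIB = fibonacci_word(3000)
```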

Proposition 2.1. Let $x$ be an aperiodic word and $\ell \ge 1$ be an integer. Then the boundary set $\partial_{x,\ell}[m]$, with $0 \le m < \ell$, appears exactly once in the sequence $(\partial_{x,\ell}[n])_{n \ge 0}$.

Figure 4: A DFAO producing the 2-boundary sequence of f.
−1 a factor of length n + , we must have |v| ≥ 2(i + 1) − − j and such a factor v clearly exists when this condition is satisfied.Taking i = − 1 and j = 0 gives the maximum length requirement |v| ≥ , so the claim follows for n ≥ 2 .Therefore, the length-factors containing at most one letter 1 have the same contribution towards every boundary set in ∂ w 1 , [n] for n ≥ 2 .In other words, since w 1 has length-factors containing at least two letters 1, two boundary sets ∂ w 1 , [n] and ∂ w 1 , [p] may differ on the length-factors containing at least two letters 1.

ε b a b b b a b b b a b b b a b b b a b b α b b b b b b b b b b b b b b b b b b b b β a b a a b b b a b b a b b b a b a b b b β 2 a b b a b b b b b a a b b b a b b b a b β 3 b b b b b a b b b a b b a b b b b b a b β 4 b b a b b a b a b b b b a b b b a b b b γ b b b b a b b a b b b b b a b b b b a b γ 2 b a b a b b b b a b b b a b b a b b b a ab b a b a b a b a b a b a b a b a b a b a α 2
b a b a b a b a b a b a b a b a b a b a b bγ 2 a a b b a a b b a a b b a a b b a a b b β 2 γ 2 a b b a a b b a a b b a a b b a a b b a β 3 γ 2 b b a a b b a a b b a a b b a a b b a a β 4 γ 2 b a a b b a a b b a a b b a a b b a a bTable 2: Prefixes of elements of the form ∂ w 2 • s.

$u[n] = 1$ if and only if $n \in \{U_j \mid j \ge 0\}$. The boundary sequence $\partial_u$ starts with

a b a b a b a b a a a b a b a b a a a a a a a b a a a b a b a b a a a a a a a a a a a a a a a b a a a ···

where $a := \{(0, 0), (0, 1), (1, 0)\}$ and $b := \{0, 1\} \times \{0, 1\}$. One can show that the language $\{\mathrm{rep}_U(n) : \partial_u[n] = b\}$ is not regular, hence:

Proposition 3.13. Let $U = (2^{n+1} - 1)_{n \ge 0}$. The word $u$ from Example 3.12 is $U$-automatic but its boundary sequence $\partial_u$ is not $U$-automatic.
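The displayed prefix of $\partial_u$ can be reproduced from the definition. The sketch below builds a finite prefix of the characteristic word of $\{2^{j+1} - 1 \mid j \ge 0\}$ and reads off, for each $n$, whether the pair $(1, 1)$ occurs at distance $n$. It is only an illustration on a finite prefix (chosen long enough for the indices tested), and the names are ours.

```python
def characteristic_word_U(length):
    """u[n] = 1 iff n is of the form 2^(j+1) - 1, for 0 <= n < length."""
    marked = set()
    j = 0
    while 2 ** (j + 1) - 1 < length:
        marked.add(2 ** (j + 1) - 1)
        j += 1
    return [1 if n in marked else 0 for n in range(length)]


def boundary_pairs(word, n):
    """Pairs (word[i], word[i + n]) seen in the finite word."""
    return {(word[i], word[i + n]) for i in range(len(word) - n)}


def boundary_letters(word, count):
    """Letters at indices 1..count: 'b' if all four pairs occur, 'a' if (1, 1) is missing."""
    return "".join(
        "b" if (1, 1) in boundary_pairs(word, n) else "a" for n in range(1, count + 1)
    )
```

The letter b appears exactly at the indices of the form $2^m - 2^j$ with $m > j \ge 1$, which is the set whose $U$-representations form the non-regular language mentioned above.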

Example 4 . 4 .
What matters to determine the pairs belonging to the -boundary sequence are the non-empty intersections of the form R n α (I u ) ∩ I v .There are situations where R n α (I u ) ⊂ I v or R n
(01) • • • ) = b b a b b b b a b b a b b b b a b b b • • • , which indeed gives back Example 1.2 after prepending the letter a.

Proposition 4.8. For a Sturmian word $s$, the boundary set $\partial_{s,\ell}[\ell]$ appears infinitely often in $\partial_{s,\ell}$ if and only if $0^{2\ell}$ or $1^{2\ell}$ appears in $s$. Otherwise it appears exactly once.

Definition 4.15. A word over an alphabet $A$ is of minimal complexity if its factor complexity is $n + |A| - 1$ for all $n \ge 1$.

Figure 8: The situation depicted in the proof of Lemma 4.17.

Proposition 4.18. Let $r$ be the maximal integer such that $(01)^r$ appears in the Sturmian word $s$. The boundary sequence $\partial_s$ has factor complexity $n \mapsto n + 1$ if $n < 2r$, and $n \mapsto n + 2$ otherwise.

Table 1: The first few terms of some subsequences $\mu_s$ for the Fibonacci numeration system.
$\partial_{w_2}[k] = b$ if and only if $k$ can be written as the difference of two squares $m^2 - n^2$ with $m > n \ge 0$. With the same argument as in the previous proof, this holds if and only if $k$ is not congruent to 2 modulo 4.

Remark 3.9. With Examples 3.3 and 3.7, we have exhibited sequences that are $S$-automatic for some non-addable numeration system $S$. One can naturally wonder if these sequences could also be $T$-automatic for another numeration system $T$ that is addable. Since the considered numeration systems have a polynomial growth, a Cobham-like result implies that if $w_1$ (resp., $w_2$) is $T$-automatic for some $T$, then $T$ must have a polynomial growth [18, Cor. 27]. As a consequence of [44, Thm. 15], ANS with a polynomial growth are not addable. This means that Examples 3.3 and 3.7 highlight words that are $S$-automatic only for some non-addable numeration systems $S$.