Output sum of transducers: Limiting distribution and periodic fluctuation

As a generalization of the sum of digits function and other digital sequences, sequences defined as the sum of the output of a transducer are asymptotically analyzed. The input of the transducer is a random integer in $[0, N)$. Analogues in higher dimensions are also considered. Sequences defined by a certain class of recursions can be written in this framework. Depending on properties of the transducer, the main term, the periodic fluctuation and an error term of the expected value and the variance of this sequence are established. The periodic fluctuation of the expected value is Hölder continuous and, in many cases, nowhere differentiable. A general formula for the Fourier coefficients of this periodic function is derived. Furthermore, it turns out that the sequence is asymptotically normally distributed for many transducers. As an example, the abelian complexity function of the paperfolding sequence is analyzed. This sequence has recently been studied by Madill and Rampersad.

The purpose of this article is to use finite state machines as a uniform framework to derive such asymptotic results. The results mentioned above will follow as corollaries from our main results, see the end of the introduction for more details. As an example of a new result fitting into this framework, we study the abelian complexity function of the paperfolding sequence (cf. [27]), see Example 2.8.
Our main focus lies on transducers: these finite state machines transform input words to output words using a finite memory (see Section 2 for a more precise definition). In our case, the input is the q-ary digit expansion of a random integer in the interval [0, N ). We then asymptotically study the sum of the output of the transducer for N → ∞. This is also extended to higher dimensions.
While some of the examples can easily be formulated by transducers, other examples are more readily expressed in terms of recursions of the shape

(1) $a(q^{\kappa} n + \lambda) = a(q^{\kappa_\lambda} n + r_\lambda) + t_\lambda$ for $0 \le \lambda < q^{\kappa}$

with fixed $\kappa, \kappa_\lambda, r_\lambda \in \mathbb{Z}$, $t_\lambda \in \mathbb{R}$ and $\kappa_\lambda < \kappa$. We transform such a recursion into a transducer in Theorem 4 in Section 2.6.
Several notions abstracting the sum-of-digits and related problems have been studied. One of them is the notion of completely q-additive functions $a : \mathbb{N}_0 \to \mathbb{R}$ with $a(qn + \lambda) = a(n) + a(\lambda)$ for $0 \le \lambda < q$ (cf. [4]). These have been generalized to digital sequences as defined in [1,6]: A sequence $a(n)$ is a digital sequence if it can be represented as a sum $\sum_w f(w)$ where $f$ is a given function and $w$ runs over all windows of a fixed length $\kappa$ of the q-ary digit representation of $n$. These digital sequences can easily be formulated by a recursion as in (1).
For a transducer T , let T (n) be the sum of the output labels of T when reading the q-ary expansion of n. For a positive integer N , we study the behavior of T (n) for a uniformly chosen random n in {0, . . . , N − 1}. Assuming suitable connectivity properties of the underlying graph of the transducer, we obtain the following results.
• The expected value is given by $\mathbb{E}(T(n)) = e_T \log_q N + \Psi_1(\log_q N) + o(1)$ for a constant $e_T$ and a periodic, continuous function $\Psi_1$ (Theorem 1).
• The variance is $\mathbb{V}(T(n)) = v_T \log_q N - \Psi_1^2(\log_q N) + \Psi_2(\log_q N) + o(1)$ with constant $v_T$ and a periodic, continuous function $\Psi_2(x)$ (Theorem 1).
• After suitable renormalization, $T(n)$ is asymptotically normally distributed (Theorem 1).
• The Fourier coefficients of $\Psi_1$ are given explicitly in Theorem 2 and the Fourier series converges absolutely and uniformly.
• The function $\Psi_1$ is nowhere differentiable provided that $e_T$ is not an integer (Theorem 3).
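The shape of the expected value can be checked numerically in the simplest case. The following Python sketch uses the q-ary sum of digits as $T(n)$ and the standard value $e_T = (q-1)/2$ for it; the function names are illustrative and not part of the article. It computes the empirical fluctuation $\mathbb{E}(T(n)) - e_T \log_q N$, which for $q = 2$ takes the same value at $N$ and $2N$.

```python
from math import log


def sum_of_digits(n: int, q: int = 2) -> int:
    """Sum of the digits of the standard q-ary expansion of n."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


def mean_output(N: int, q: int = 2) -> float:
    """Expected value of the sum of digits for n uniform in {0, ..., N-1}."""
    return sum(sum_of_digits(n, q) for n in range(N)) / N


def fluctuation(N: int, q: int = 2) -> float:
    """Empirical value of E(T(n)) - e_T * log_q(N) with e_T = (q - 1) / 2."""
    return mean_output(N, q) - (q - 1) / 2 * log(N) / log(q)


if __name__ == "__main__":
    for N in (48, 96, 192):
        print(N, fluctuation(N))
```

For the sum of digits there is no error term (see Remark 3.4), so the printed values agree up to floating point rounding.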
The exact assumptions for the various results are given in detail in the respective theorems. Results for higher dimensional input are available for expectation, variance, normal distribution and Fourier coefficients. Our theorems are generalizations of the following known results.
• For the sum of digits of the standard q-ary digit representations (cf. [7]), we obtain an asymptotic normal distribution, the Fourier coefficients and the non-differentiability (for even q). The error term vanishes, as stated in Remark 3.4. Therefore, the formula is not only asymptotic but also exact. The formulas for the Fourier coefficients by Delange [7] also follow from our Theorem 2.
• The occurrence of subblocks in standard and non-standard digit representations is defined by a strongly connected, aperiodic transducer. Thus we obtain the expected value, the variance, the limit law and the Fourier coefficients (cf. [25,26,14] for the expected value). For one dimensional digit representations, we also obtain the non-differentiability (assuming $e_T \neq 0, 1$) of the fluctuation in the expectation.
• The Hamming weight is a special case of the occurrence of subblocks. Thus, Theorem 1 is a generalization of the results about the width-w non-adjacent form [20], the simple joint sparse form [15] and the asymmetric joint sparse form [20].
• A transducer defining a completely q-additive function consists of only one state. Therefore, we obtain an asymptotic normal distribution (as in [4]), the Fourier coefficients and the non-differentiability (assuming $e_T \notin \mathbb{Z}$ and integer output). Here, the error term vanishes, too.
• A digital sequence is defined by a strongly connected, aperiodic transducer. Thus, digital sequences are asymptotically normally distributed or degenerate. Assuming $e_T \notin \mathbb{Z}$ and integer output, the periodic fluctuation $\Psi_1(x)$ is non-differentiable. The Fourier coefficients can be computed by Theorem 2. See also [6] for results on the expected value.
• Automatic sequences [1] are also defined by transducers: The output labels of all transitions are 0 and the final output labels are as in the definition of such sequences. Theorem 1 gives the expected value with $e_T = 0$ (see also [29]) and, depending on the transducer, also the variance with $v_T = 0$. The Fourier coefficients of the periodic fluctuation of the expected value are given explicitly in Theorem 2.
• In [17], Grabner and Thuswaldner investigate the sum of digits function $s_{-q}(n)$ for negative bases. They give a transducer to compute the function $s_{-q}(n) - s_{-q}(-n)$. Their result about the limit law follows directly from our Theorem 1.

As an example of a new result obtained by Theorem 1, we give an asymptotic estimate of the abelian complexity function of the paperfolding sequence in Example 2.8. In [27], the authors prove that this sequence satisfies a recursion of type (1). As consequences of Theorem 1, the expected value is $\sim \frac{8}{13} \log_2 N$, the variance is $\sim \frac{432}{2197} \log_2 N$ and the sequence is asymptotically normally distributed.
In the sequel, we discuss the relation of our setting and our results with the notion of q-regular sequences introduced in [1].
The concept of q-regular sequences is more general than our setting, but a broader variety of asymptotic behavior is observed which precludes any generalization of our results to general q-regular sequences.
While $T(n)$ is a q-regular sequence for any transducer $T$ (see Remark 3.10), the converse is not necessarily true: Obviously, the sum of the output of a transducer reading the input $n$ is always bounded by $O(\log n)$. However, the 2-regular sequence

$a(n) = \begin{cases} n & \text{if } n \text{ is a power of } 2, \\ 0 & \text{otherwise} \end{cases}$

can clearly not be bounded by $O(\log n)$.
Asymptotic estimates for q-regular sequences are given by Dumas [10,11]. By restricting our attention to sequences defined by transducers, we obtain an asymptotic estimate of the variance, explicit expressions for the Fourier coefficients of the fluctuation in the second term of the expected value, non-differentiability of this fluctuation as well as a central limit theorem.
Section 2 contains all the theorems and the required notions. In Section 2.2, Theorem 1, formulas for the first and second moment of the output sum of a transducer and its limiting distribution are presented. In Theorem 2 in Section 2.4, the Fourier coefficients of the periodic fluctuation Ψ 1 (x) of the expected value are stated. We discuss the non-differentiability of Ψ 1 (x) in Theorem 3 in Section 2.5. Section 2.6 deals with sequences satisfying the recursion (1) and higher dimensional analogues. We construct a transducer computing this sequence in Theorem 4. Thus, from Theorem 1, the expected value, the variance and the limit distribution follow in many cases.
This construction and the computations for the constants $e_T$, $v_T$ and the Fourier coefficients can be done algorithmically by the mathematical software system Sage [31]: The general framework is included in Sage version 6.4.1 using its finite state machine package described in [19]. The code for the Fourier coefficients and the construction from a recursion is submitted for inclusion in future versions of Sage, see http://trac.sagemath.org/17222 and http://trac.sagemath.org/17221, respectively.
In Sections 3 to 6, we give the proofs of all the theorems from Section 2.

2. Results
This section starts with the definition of some notions about the connectivity of a transducer. Then we will state the theorems about the moments and the limiting distribution, the Fourier coefficients, the non-differentiability, and the construction of a transducer computing a sequence given by a recursion as in (1).

2.1. Notions. We consider complete, deterministic and subsequential transducers (cf. [5,Chapter 1]). In our case, the input alphabet is $\{0, \dots, q-1\}^d$ for a positive integer $d$ and the output alphabet is $\mathbb{R}$. A transducer is said to be deterministic and complete if for every state and every digit of the input alphabet, there is exactly one transition starting in this state with this input label. A subsequential transducer $T$ (cf. [30]) is defined to be a finite deterministic automaton with one initial state, an output label for every transition and a final output label for every state. Figure 1 presents an example of a complete, deterministic, subsequential transducer. The label of a transition with input $\varepsilon$ and output $\delta$ is written as $\varepsilon \mid \delta$.
The input of the transducer is the standard q-ary joint digit representation of an integer vector $n \in \mathbb{N}_0^d$, i.e. the standard q-ary digit representation at each coordinate of the vector $n$. The input is read from right (least significant digit) to left (most significant digit), without leading zeros. Then the output of the transducer is the sequence of the outputs of the transitions along the unique path starting in the initial state with the given input and the final output of the last state of this path. The element $T(n)$ of the sequence defined by the transducer $T$ is the sum of this output sequence.
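The definition of $T(n)$ can be made concrete with a small simulation. The following Python sketch covers only the case $d = 1$; the class `Transducer` and the two example machines (binary sum of digits, and the number of blocks 11 in the binary expansion) are illustrative assumptions and are not taken from the figures of this article.

```python
class Transducer:
    """Complete, deterministic, subsequential transducer for d = 1.

    transitions maps (state, digit) to (next_state, output);
    final_output maps each state to its final output label.
    """

    def __init__(self, q, initial, transitions, final_output):
        self.q = q
        self.initial = initial
        self.transitions = transitions
        self.final_output = final_output

    def output_sum(self, n):
        """Return T(n): read the q-ary digits of n from the least
        significant digit on, without leading zeros, and add up all
        transition outputs plus the final output of the last state."""
        state, total = self.initial, 0
        while n > 0:
            n, digit = divmod(n, self.q)
            state, output = self.transitions[(state, digit)]
            total += output
        return total + self.final_output[state]


# One state: the output of each transition is the digit itself.
digit_sum = Transducer(2, 0, {(0, 0): (0, 0), (0, 1): (0, 1)}, {0: 0})

# Two states: state 1 remembers that the previous digit was 1.
block_11 = Transducer(
    2,
    0,
    {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 0), (1, 1): (1, 1)},
    {0: 0, 1: 0},
)

if __name__ == "__main__":
    print([digit_sum.output_sum(n) for n in range(8)])
    print([block_11.output_sum(n) for n in range(8)])
```

For $n = 0$ the input is empty, so the result is the final output of the initial state.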
Using final output labels is convenient for our purposes. Clearly, it would also be possible to model the final output labels by using an "end-of-input" marker and additional transitions. In the context of digital expansions, the behavior can usually also be obtained by reading a sufficient number of leading zeros. But the approach using final outputs is more general as it is not required that the final outputs are compatible with the output generated by leading zeros.
For the various results, different properties of the complete, deterministic, subsequential transducer and its underlying digraph are needed. All states of the underlying digraph are assumed to be accessible from the initial state. Contracting each strongly connected component of the underlying digraph gives an acyclic digraph, the so-called condensation. A strongly connected component is said to be final strongly connected if it corresponds to a leaf (i.e., a vertex with outdegree 0) in the condensation. Let $c$ be the number of final strongly connected components. We call a transducer or a digraph finally connected if $c = 1$.
For the asymptotic expressions, only the final strongly connected components are important. All other strongly connected components only influence the error term. Thus, we are not interested in the periodicity of the whole underlying digraph, but in the periodicity of the final strongly connected components. The period of a digraph is defined as the greatest common divisor of all lengths of directed cycles of the digraph. For $j = 1, \dots, c$, let $p_j$ be the period of the final strongly connected component $C_j$. Define the final period of the digraph as

$p = \operatorname{lcm}\{p_1, \dots, p_c\}$.

We call a digraph finally aperiodic if $p = 1$. If the underlying digraph is strongly connected, its final period is equal to its period.
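These notions are easy to compute for small digraphs. The following Python sketch determines the final strongly connected components, their periods and the final period, assuming that the final period is the least common multiple of the periods $p_j$ (which agrees with Example 2.4, where periods 2 and 3 give final period 6). The digraph `example` is a hypothetical one and is not the transducer of Figure 3.

```python
from collections import deque
from functools import reduce
from math import gcd


def reachable(graph, start):
    """Set of vertices reachable from start (start itself included)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def final_components(graph):
    """Strongly connected components without edges leaving them."""
    reach = {s: reachable(graph, s) for s in graph}
    components = {
        frozenset(t for t in reach[s] if s in reach[t]) for s in graph
    }
    return [
        comp for comp in components
        if all(v in comp for u in comp for v in graph[u])
    ]


def period(graph, comp):
    """gcd of all cycle lengths of a final strongly connected component."""
    root = next(iter(comp))
    dist, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u in comp:
        for v in graph[u]:
            # Every edge closes walks whose length differences are
            # multiples of the period.
            g = gcd(g, dist[u] + 1 - dist[v])
    return g


def final_period(graph):
    """Least common multiple of the periods of the final components."""
    periods = [period(graph, comp) for comp in final_components(graph)]
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


# Hypothetical digraph: state 0 leads to a 2-cycle and to a 3-cycle.
example = {0: [1, 3], 1: [2, 2], 2: [1, 1], 3: [4, 4], 4: [5, 5], 5: [3, 3]}

if __name__ == "__main__":
    print(final_period(example))
```

The reachability-based component search is quadratic in the number of states, which is sufficient for the small transducers considered here.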
For proving the non-differentiability of the fluctuation, we not only need a finally aperiodic, finally connected digraph (p = c = 1), but also a reset sequence. A reset sequence is an input sequence such that starting at any state and reading this sequence leads to a specific state s. If the transducer is not finally aperiodic and finally connected, then there cannot exist a reset sequence.
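Whether a given word is a reset sequence can be tested directly from the definition. The following Python sketch does this for $d = 1$; the function name and both example machines are illustrative assumptions. The second machine is a cycle of two states, so it is not finally aperiodic and no word can be a reset sequence.

```python
def is_reset_sequence(next_state, states, word):
    """Check whether reading word (given in reading order) from every
    state ends in one and the same state.

    next_state maps (state, digit) to the target state.
    """
    ends = set()
    for state in states:
        for digit in word:
            state = next_state[(state, digit)]
        ends.add(state)
    return len(ends) == 1


# State 1 remembers that the previous digit was 1 (aperiodic).
aperiodic = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}

# Every transition switches the state (period 2).
two_cycle = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}

if __name__ == "__main__":
    print(is_reset_sequence(aperiodic, [0, 1], [0]))
    print(is_reset_sequence(two_cycle, [0, 1], [0, 0]))
```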

2.2. Moments and Limiting Distribution. This section contains the theorem about the moments of the output sum $T(n)$ and the limiting distribution. Further results about the periodic fluctuation can be found in Theorems 2 and 3.
Denote by $\Phi_{\mu,\sigma^2}$ the cumulative distribution function of the normal distribution with mean $\mu$ and variance $\sigma^2 \neq 0$. Thus, where the constants $e_T$ and $\xi > 0$ are given in (5) in Section 2.3 and $\Psi_1(x)$ is a $p$-periodic, Hölder continuous function. If all $b_j$ given in (5) are positive, the distribution function of $T(n)$ can be approximated by a mixture of $c$ Gaussian distributions with weights $\lambda_j$, means $a_j \log_q N$ and variances $b_j \log_q N$ for some constants $a_j$ and $\lambda_j > 0$ with $\sum_{j=1}^{c} \lambda_j = 1$, given in (5). In particular, If all $a_j$ are equal, then $T(n)$ has the variance (5)) and a $p$-periodic, continuous function $\Psi_2(x)$. Otherwise, the variance is $\mathbb{V}(T(n)) = \Theta(\log^2 N)$.
If all $a_j$ are equal, $T(n)$ converges in distribution to a mixture of Gaussian (or degenerate) distributions with means 0 and variances $b_j$, weighted by $\lambda_j$. In particular, if all $b_j > 0$, If furthermore $c = 1$ and $v_T \neq 0$, then $T(n)$ is asymptotically normally distributed.
We give the proof of this theorem in Section 3.
Remark 2.1. The assumption that $b_j > 0$ is essential for obtaining uniform convergence of the distribution function and the speed of convergence in particular. To see this, consider the transducer in Figure 2.
It is easily seen that $T(n) = (-1)^n$. For even $N$, the distribution function of $T(n)/\log_2 N$ is given by which does not converge uniformly.

2.3. Eigenvalues and Eigenvectors of the Transition Matrix.
For the constants in Theorem 1 and the Fourier coefficients in Theorem 2, we need the notion of a transition matrix of the transducer and properties of its eigenvalues and eigenvectors. We label the states of the transducer with contiguous positive integers starting with 1. We denote the indicator vector of the initial state by $e_1$.
Definition 2.2. Let $t \in \mathbb{R}$ be in a neighborhood of 0.
The transition matrix $M_\varepsilon$ for $\varepsilon \in \{0, \dots, q-1\}^d$ is the matrix whose $(s_1, s_2)$-th entry is $e^{it\delta}$ if there is a transition from state $s_1$ to state $s_2$ with input label $\varepsilon$ and output label $\delta$, and 0 otherwise.
Let $M$ be the sum of all these transition matrices.
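Definition 2.2 translates directly into code. The following Python sketch builds the matrices $M_\varepsilon$ and their sum $M$ for $d = 1$ as nested lists of complex numbers; the function names and the example transducer (counting blocks 11 in the binary expansion) are illustrative assumptions. For $t = 0$ every row of $M$ sums to $q$, because the transducer is complete and deterministic.

```python
import cmath


def transition_matrices(q, states, transitions, t):
    """Return the list [M_0, ..., M_{q-1}] of Definition 2.2 for d = 1.

    transitions maps (state, digit) to (next_state, output).
    """
    index = {s: k for k, s in enumerate(states)}
    matrices = []
    for eps in range(q):
        matrix = [[0j] * len(states) for _ in states]
        for s in states:
            target, delta = transitions[(s, eps)]
            matrix[index[s]][index[target]] = cmath.exp(1j * t * delta)
        matrices.append(matrix)
    return matrices


def total_matrix(matrices):
    """Entrywise sum M of all transition matrices."""
    size = len(matrices[0])
    return [
        [sum(m[r][c] for m in matrices) for c in range(size)]
        for r in range(size)
    ]


# Hypothetical example: state 1 remembers that the previous digit was 1.
block_11 = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 0), (1, 1): (1, 1)}

if __name__ == "__main__":
    M = total_matrix(transition_matrices(2, [0, 1], block_11, 0.0))
    print(M)
```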
Lemma 2.3. There are differentiable functions $\mu_j(t)$ in a neighborhood of $t = 0$ for $j = 1, \dots, c$ such that the dominant eigenvalues of $M$ are $\mu_j(t) \exp(2\pi i l/p)$ in this neighborhood of $t = 0$ for some of the $l \in P = \{k \in \mathbb{Z} \mid -p/2 < k \le p/2\}$. For each of these dominant eigenvalues, the algebraic and geometric multiplicities coincide. For $t = 0$, $\mu_j(0) = q^d$.
The proof of this lemma is given in Section 3. Let $l \in \mathbb{Z}$. Consider the (not necessarily orthogonal) projection onto the direct sum of the left eigenspaces of $M$ corresponding to the eigenvalues $\mu_j(t) \exp(2\pi i l/p)$ for $j = 1, \dots, c$ such that the kernel is the direct sum of the remaining generalized left eigenspaces. Let $w_l^\top(t)$ be the image of $e_1^\top$ under this projection, where $^\top$ denotes transposition. The definition of $w_l(t)$ only depends on $l$ modulo $p$.
We write $w_l$ for $w_l(0)$ and $w_l'$ for the derivative of $w_l(t)$ at $t = 0$. Furthermore, $w_l^\top$ is either the null vector or a left eigenvector of $M$ corresponding to the eigenvalue $q^d \exp(2\pi i l/p)$. Let $C_j$ be a final component with corresponding indicator vector $c_j$. Define the constants $\lambda_j = w_0^\top c_j$.
In Section 3.1, we will show that $\lambda_j > 0$ and $\sum_{j=1}^{c} \lambda_j = 1$. With these definitions, the constants in Theorem 1 can be expressed as Finally, $\xi > 0$ is chosen such that all non-dominant eigenvalues of $M$ have modulus strictly less than $q^{d-\xi}$ at $t = 0$. These constants can be interpreted as follows: $a_j \log_q N$ and $b_j \log_q N$ are the main terms of the mean and the variance, respectively, of the output sum of the final component $C_j$. These expressions including the derivatives of the eigenvalues correspond to the formulas for mean and variance given in [12,Theorem IX.9]. The constants $e_T$ and $v_T$ are convex combinations of the corresponding constants of the final components $C_j$.
The positive weight λ j in these convex combinations turns out to be the asymptotic probability of reaching the final component C j . This is connected to the following interpretation of the left eigenvector w 0 : If the final period p is 1, the entries of w 0 will be shown to be the asymptotic probabilities of reaching the corresponding states. This corresponds to the left eigenvector used in a steady-state analysis. If p > 1, these probabilities depend on the length of the input modulo p. Then, we will prove that w 0 gives the average of these probabilities taken over all residues modulo p. These interpretations are justified in Section 3.1.

2.4. Fourier Coefficients. This section contains the formulas for the Fourier coefficients of the periodic fluctuation $\Psi_1(x)$. For this purpose, we need the following definitions.
Let $\chi_k = \frac{2\pi i k}{p \log q}$ for $k \in \mathbb{Z}$ and $\mathbf{1}$ be a vector whose entries are all one. The $s$-th coordinate of the vector $b(n)$ is the sum of the output of the transducer $T$ (including the final output) if starting in state $s$ with input the q-ary joint expansion of $n$. In particular, the first coordinate of $b(n)$ is $T(n)$, and $b(0)$ is the vector of final outputs. Furthermore, define the vector-valued function $H(z)$ by the Dirichlet series where the inequality in the summation index is considered coordinatewise and $\|\cdot\|_\infty$ is the maximum norm.
Theorem 2. Let $T$ be a subsequential, complete, deterministic transducer. Then the Fourier coefficients of the $p$-periodic fluctuation $\Psi_1(x)$ are The Fourier series $\sum_{k \in \mathbb{Z}} c_k \exp\bigl(\frac{2\pi i k}{p} x\bigr)$ converges absolutely and uniformly.
The function $w_k^\top H(z)$ is meromorphic in $\Re z > d - 1$. It has a possible double pole at $z = d$ for $k = 0$ and possible simple poles at $z = d + \chi_k$ for $k \neq 0$.
The proof of this theorem is in Section 4. The infinite recursion given in Lemma 4.5 can be used to numerically evaluate the Dirichlet series H(z) with arbitrary precision and to compute its residues at z = d + χ l (see Lemma 4.7 and [16]). For d = 1, the computation of the Fourier coefficients can be done by the mathematical software system Sage [31] (using the code submitted at http://trac.sagemath.org/17222).
Example 2.4. The (artificial) transducer in Figure 3 has two final components with periods 2 and 3, respectively. Thus the final period is 6 and the function $\Psi_1(x)$ is 6-periodic. The constant $e_T$ of the expected value is $\frac{11}{8}$. In Figure 4, the partial Fourier series with 2550 Fourier coefficients is compared with the empirical values of the periodic fluctuation $\Psi_1$, i.e., with integers $N$ and $4 \le \log_2 N \le 16$.
The computation of these 2550 Fourier coefficients took less than 6 minutes using a standard dual-core PC.  In Example 2.8 we compute the first 2550 Fourier coefficients of the abelian complexity function of the paperfolding sequence.
As a corollary of Theorem 2, we obtain the following result which was already proved by Delange [7].
Corollary 2.5. The Fourier coefficients of the periodic fluctuation for the q-ary sum-of-digits function $s_q(n)$ are for $k \neq 0$ and $\chi_k = \frac{2\pi i k}{\log q}$ where $\zeta$ denotes the Riemann $\zeta$-function.
We prove this corollary in Section 4.
2.5. Non-differentiability. In this section, we prove that for certain transducers, the periodic fluctuation Ψ 1 (x) of the expected value is nowhere differentiable.
Theorem 3. Let $d = 1$. Assume that $e_T \notin \mathbb{Z}$ and that the transducer $T$ has a reset sequence and output alphabet $\mathbb{Z}$. Then the function $\Psi_1(x)$ is non-differentiable for any $x \in \mathbb{R}$.
The proof can be found in Section 5. There, we follow the method presented by Tenenbaum [32], see also Grabner and Thuswaldner [17].
In [32,17], the reset sequence consists only of 0's. If working with digit expansions, it is often possible to choose such a reset sequence. However, in the context of recursions, this is not always possible, see Example 2.8. There the reset sequence is (00001).
For a general finally aperiodic, finally connected transducer, the existence of a reset sequence cannot be guaranteed.

2.6. Recursions. In this section, we describe how to reduce a recursion to a transducer computing the given sequence. All inequalities in this section are considered coordinate-wise.
Consider the sequence $a(n)$, $n \in \mathbb{N}_0^d$, defined by the recursion

(10) $a(q^{\kappa} n + \lambda) = a(q^{\kappa_\lambda} n + r_\lambda) + t_\lambda$ for $0 \le \lambda < q^{\kappa} \mathbf{1}$

and for all integer vectors $n$ such that the arguments on both sides are non-negative. Furthermore, initial values $a(n)$ for $n \in I$ have to be given for a suitable finite set $I \subset \mathbb{N}_0^d$. It must be ensured that the recursion (10) does not lead to conflicts and that the set $I$ is appropriate. Additionally, we require that $I$ is minimal (with respect to inclusion). In that case, we say that the recursion is well-posed.
In Section 6, we construct a subsequential, complete, deterministic transducer T (also when the recursion is not well-posed) reading the q-ary joint expansion of integer vectors without leading zeros. We will define a distinguished subset of its states, called simple states. Furthermore, disjoint classes F 1 , . . . , F K of integer vectors will be defined.

Theorem 4. The recursion (10) is well-posed if and only if
(1) for each cycle consisting of simple states with transitions with zero input label, the sum of its output transitions vanishes and
(2) the set $I$ consists of one representative of each $F_j$, $1 \le j \le K$.
In that case, the sum of the output of $T$ is the sequence $a$, i.e., $T(n) = a(n)$ for all $n \ge 0$.
The proof of this theorem is in Section 6. Combining this result with Theorem 1 yields an asymptotic analysis of the sequence $a(n)$, as in Example 2.8. Moreover, this asymptotic analysis can be performed algorithmically in Sage for $d = 1$ (using the code submitted at http://trac.sagemath.org/17221). A combinatorial description of the sets $F_i$ involving an auxiliary transducer is given in Remark 6.1.
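Independently of the transducer construction, a recursion of the shape (10) can be evaluated directly for $d = 1$. The following Python sketch does this with memoization. It is a simplification and not the construction of Theorem 4: the helper `make_sequence` is hypothetical, and it assumes that every application of the recursion strictly decreases the argument unless an initial value is reached. The example encodes the binary sum of digits via $a(2n + \lambda) = a(n) + \lambda$, i.e. $\kappa = 1$, $\kappa_\lambda = 0$, $r_\lambda = 0$, $t_\lambda = \lambda$ and $a(0) = 0$.

```python
from functools import lru_cache


def make_sequence(q, kappa, rules, initial):
    """Return a function a(n) defined by a recursion of shape (10), d = 1.

    rules[lam] = (kappa_lam, r_lam, t_lam) for 0 <= lam < q**kappa;
    initial maps the elements of the set I to their values.
    """
    base = q ** kappa

    @lru_cache(maxsize=None)
    def a(n):
        if n in initial:
            return initial[n]
        m, lam = divmod(n, base)
        kappa_lam, r_lam, t_lam = rules[lam]
        argument = q ** kappa_lam * m + r_lam
        # Simplifying assumption of this sketch: the recursion must lead
        # to a smaller non-negative argument.
        if not 0 <= argument < n:
            raise ValueError(f"an initial value for a({n}) is required")
        return a(argument) + t_lam

    return a


# Binary sum of digits: a(2n) = a(n), a(2n + 1) = a(n) + 1, a(0) = 0.
binary_digit_sum = make_sequence(2, 1, {0: (0, 0, 0), 1: (0, 0, 1)}, {0: 0})

if __name__ == "__main__":
    print([binary_digit_sum(n) for n in range(16)])
```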
Remark 2.6. For $d \ge 2$ and $r_\lambda \not\ge 0$, the sequence cannot be computed by a finite transducer: For every $j \ge 0$, there are non-zero integer vectors $n \ge 0$, $n' \ge 0$ with $n \equiv n' \pmod{q^j}$ (i.e., a finite deterministic transducer cannot distinguish between $n$ and $n'$) such that the recursion (10) can be applied for the argument $q^{\kappa} n + \lambda$ but cannot be applied for $q^{\kappa} n' + \lambda$.
This problem does not arise in the case of dimension $d = 1$: if the end of the input is not yet reached (this is something the transducer knows), there is a guaranteed forthcoming digit $\ge 1$ (instead of $\neq 0$ in the higher dimensional case). This information is enough to decide whether the recursion can be used.
Example 2.8. Consider the abelian complexity function $\rho(n)$ of the paperfolding sequence. The paperfolding sequence is obtained by repeatedly folding a strip of paper in half in the same direction. Then we open the strip and encode a right turn by 1 and a left turn by 0. The abelian complexity function $\rho(n)$ gives the number of abelian equivalence classes of subwords of length $n$ of the paperfolding sequence. Two subwords of length $n$ are equivalent if they are permutations of each other. In [27], the authors prove that this sequence satisfies the recursion with $\rho(1) = 2$ and $\rho(0) = 0$. The constructed transducer is shown in Figure 5. For simplicity, we do not state the final output labels in this figure. The expected value and the variance are , as the second largest eigenvalues of the transition matrix are $-0.7718445063 \pm 1.1151425080\,i$. The sequence $\rho(n)$ is asymptotically normally distributed. The functions $\Psi_1(x)$ and $\Psi_2(x)$ are 1-periodic and continuous. The reset sequence of the transducer is (00001) (reading from right to left). The function $\Psi_1(x)$ is nowhere differentiable and its Fourier series converges absolutely and uniformly. The first 24 Fourier coefficients of $\Psi_1(x)$ are listed in Table 1. In Figure 6, the trigonometric polynomial formed with the first 2550 Fourier coefficients is compared with the empirical values of the function $\Psi_1(x)$ (see (8)).

Table 1. First 24 Fourier coefficients of the abelian complexity function $\rho(n)$ of the paperfolding sequence.
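Small values of $\rho(n)$ can be computed by brute force, which gives a sanity check for the initial value $\rho(1) = 2$. The following Python sketch rests on two assumptions that are not part of this article: the paperfolding sequence is generated by the standard characterization (write $n = 2^k m$ with $m$ odd; the $n$-th term is 1 if $m \equiv 1 \pmod 4$ and 0 otherwise), and only a finite prefix is inspected, so in general the result is a lower bound for $\rho(n)$. Since the alphabet is binary, two subwords of the same length are abelian equivalent exactly if they contain the same number of ones.

```python
def paperfolding(n):
    """n-th term (n >= 1) of the regular paperfolding sequence."""
    while n % 2 == 0:
        n //= 2
    return 1 if n % 4 == 1 else 0


def abelian_complexity(n, prefix_length=1 << 12):
    """Number of distinct counts of ones among all subwords of length n
    (n >= 1) of a prefix of the paperfolding sequence."""
    word = [paperfolding(k) for k in range(1, prefix_length + 1)]
    ones = sum(word[:n])
    counts = {ones}
    # Slide a window of length n over the prefix.
    for i in range(n, len(word)):
        ones += word[i] - word[i - n]
        counts.add(ones)
    return len(counts)


if __name__ == "__main__":
    print([paperfolding(k) for k in range(1, 16)])
    print([abelian_complexity(n) for n in range(1, 17)])
```

Exchanging the roles of 0 and 1 does not change the number of abelian equivalence classes, so the encoding convention for left and right turns is irrelevant here.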

3. Asymptotic Distribution - Proof of Theorem 1
This section contains some lemmas which will together imply Theorem 1. Our plan is as follows: First, we give auxiliary lemmas about the eigenvalues and eigenvectors of the transition matrix M in Section 3.1. Section 3.2 contains an asymptotic formula for the characteristic function of the random variable T (n). We use this characteristic function to give formulas for the expected value and the variance in Section 3.3, and prove the continuity of the periodic fluctuations in Section 3.4. Finally, we prove the central limit theorem in Section 3.5.
We use the notation $(\varepsilon_L \dots \varepsilon_0)_q$ for the standard q-ary joint digit representation of an integer vector with $\varepsilon_L \neq 0$. For a real number in the interval $[0, q)$, we write $(\varepsilon_0 \varepsilon_1 \dots)_q$ for the q-ary digit representation choosing the representation ending on $0^\omega$ in the case of ambiguity. Furthermore, we use Iverson's notation [18]: [expression] is 1 if expression is true and 0 otherwise. All $O$-constants depend only on $q$, $d$ and the number of states.
3.1. Transition Matrix and its Eigenvectors. This section contains the proofs of some results on the eigenvalues, eigenvectors and eigenprojections of the transition matrix M .
For the proof of Theorem 1, we use the following lemma which describes the eigenvalues of a matrix in a similar way as the Perron-Frobenius theorem (cf. [13]).
Lemma 3.1. Let $M$ be a matrix with complex entries whose underlying directed graph is $p$-periodic and strongly connected. Then the set of nonzero eigenvalues of $M$ can be partitioned into disjoint sets of cardinality $p$ where each set is invariant under multiplication by $e^{2\pi i/p}$ and all eigenvalues in one set have the same algebraic multiplicities.
Proof. Since the underlying directed graph of $M \in \mathbb{C}^{n \times n}$ is a strongly connected, $p$-periodic graph, we can write $M$ as with block matrices $A_i$ by reordering the vertices. Then $M - xI$ is the product of the matrices Let $h(x)$ be the characteristic polynomial of $\prod_{j=1}^{p} A_j \in \mathbb{C}^{m \times m}$. Thus the characteristic polynomial of $M$ is $x^{n-m-(p-1)m} h(x^p)$. Therefore, the eigenvalues of $M$ are either 0 or any $p$-th root of a non-zero eigenvalue of $\prod_{j=1}^{p} A_j$.
With this lemma, we can prove Lemma 2.3 about the eigenvalues of the matrix $M$:
Proof of Lemma 2.3. First, consider the case $t = 0$. By construction, $q^d$ is an eigenvalue with right eigenvector $\mathbf{1}$ of $M$. As $\|M\|_\infty \le q^d$, where $\|\cdot\|_\infty$ denotes the row sum norm, $q^d$ is a dominant eigenvalue.
Consider the strongly connected components of the underlying graph of $T$. Each final strongly connected component $C_j$ induces a final transducer $T_j$ which is strongly connected, complete, deterministic and $p_j$-periodic. Thus, the adjacency matrix at $t = 0$ of this final transducer has a dominant eigenvalue $q^d$ with right eigenvector $\mathbf{1}$. By the Perron-Frobenius theorem (cf. [13, Theorem 8.8.1]), all dominant eigenvalues of this final transducer are $\{q^d e^{2\pi i l/p} \mid l \in P \text{ with } p \mid l p_j\}$, each with algebraic and geometric multiplicity one.
A non-final strongly connected component induces a transducer $\mathcal{S}$ with the adjacency matrix $S$. This transducer is not complete. Let $\mathcal{S}^+$ be the complete transducer where loops are added to states of $\mathcal{S}$ where necessary. The adjacency matrix of $\mathcal{S}^+$ is $S^+$. Since $\mathcal{S}^+$ is complete, deterministic and strongly connected, $\rho(S^+) = q^d$. As $S \le S^+$ but $S \neq S^+$, Theorem 8.8.1 in [13] implies $\rho(S) < \rho(S^+) = q^d$.
Thus, the dominant eigenvalues are $q^d e^{2\pi i l/p}$ with an $l \in P$ such that there exists a $j \in \{1, \dots, c\}$ with $p \mid l p_j$. We determine the geometric multiplicities of these dominant eigenvalues of $M$ in Lemma 3.2. Now, fix a final strongly connected component $C_j$ and some $l \in P$ with $p \mid l p_j$. In a small neighborhood of $t = 0$, let $\mu_{lj}(t)$ be the eigenvalue of the submatrix of $M$ corresponding to the complete transducer $T_j$ with $\mu_{lj}(0) = q^d e^{2\pi i l/p}$. Because of Lemma 3.1 applied to the final component $C_j$ separately, we have $\mu_{lj}(t) = e^{2\pi i l/p} \mu_j(t)$ where $\mu_j(t)$ is defined to be $\mu_{0j}(t)$.
All other moduli of eigenvalues of $M$ are less than $\min_{l,j} |\mu_{lj}(t)|$ because of the continuity of eigenvalues.
We prove the differentiability of the eigenvalues in Lemma 3.2. At t = 0, the algebraic and geometric multiplicities of q d exp( 2πil p ) coincide.
Furthermore the eigenvalues and the eigenprojection corresponding to the eigenvalues µ j exp( 2πil p ) are analytic at t = 0.
Proof. Let $q^d \exp(2\pi i l/p)$ be a dominant eigenvalue of $M$. Its algebraic multiplicity at $t = 0$ is $|\{j : p \mid l p_j\}|$. We construct exactly one left eigenvector in the neighborhood of $t = 0$ for each final component $C_j$ with $p \mid l p_j$: Let $T_j$ be the induced transducer of the final component $C_j$. Let $\tilde{v}^\top(t)$ be a left eigenvector of the adjacency matrix of $T_j$ corresponding to the eigenvalue $\mu_j(t) \exp(2\pi i l/p)$. As the algebraic multiplicity is 1 in this final component, the choice of $\tilde{v}^\top(t)$ is unique up to multiplication with a scalar function in $t$. Then, we construct the left eigenvector $v^\top(t)$ by padding $\tilde{v}^\top(t)$ with zeros.
These left eigenvectors are linearly independent because of the block structure induced by the final components. Thus the geometric and the algebraic multiplicities of q d exp( 2πil p ) coincide. Furthermore, µ j (t) exp( 2πil p ) is a simple eigenvalue of the adjacency matrix of T j . Therefore, [24,Chapter II] implies the differentiability of the eigenvalues and eigenprojections.
From now on, we use the convention that the eigenspace correspond- is not an eigenvalue. Then its eigenprojection is the constant null function.
As an abbreviation, we write w lj , w , w lj and w for these projections and their derivatives at t = 0.
Remark 3.4. If there are only dominant eigenvalues, then w (t) = 0. This will imply that there is no error term in the asymptotic expansion of the expected value and the variance. This occurs in the case of the sum of digits of the standard q-ary digit representation and other completely q-additive functions because the transducer has only one state.
Lemma 3.5. In a fixed neighborhood of t = 0, let ξ > 0 be as defined in (5), i.e., all non-dominant eigenvalues have modulus less than q d−ξ .
k . Proof. Let P be the matrix such that x → x P is the sum of the eigenprojections onto the left eigenspaces corresponding to µ j exp( 2πil p ) for j = 1, . . . , c and l ∈ P. Then w = e 1 (I − P ) and As the spectral radius of (I − P )M is less than q d−ξ , we obtain the stated estimates.
With w l defined in Section 2.3, we have Note that left and right eigenvectors corresponding to different eigenvalues annihilate each other. Because of the block structure of the eigenvectors in Lemma 3.2 and because 1 is a right eigenvector to q d , we have (12) [ Denote by δ the vector whose s-th component is the sum of the outputs of all transitions leaving the state s. By the definition of the transition matrix M (t), δ can be expressed as .
We now establish a relation between δ, the left eigenvector w l and its derivative at t = 0. By definition of the left eigenvectors w lj (t) and (11), Differentiation, (12), (5) and (11) yield To establish the interpretation of w 0 given at the end of Section 2.3, we considerŵ the stationary distribution on the state space of all states of the transducer under the assumption that the input length is congruent to k modulo p. Using (11) and Lemma 3.5 yieldŝ Summation leads to 1 p p−1 k=0ŵ k = w 0 . Thus, λ j is the hitting probability of the final component C j when starting in the initial state. As every state is accessible from the initial state, λ j is positive.
Finally, for $l = 0$, (14) reads $q^{-d} w_0 \delta = e_T$, which can be interpreted as the steady-state analysis of the expectation: the probability distribution $w_0$ is multiplied by the expected output $q^{-d}\delta$.
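The steady-state reading of this identity can be checked numerically on small examples. The following Python sketch assumes dimension d = 1 and a strongly connected, aperiodic transducer, so that $w_0$ is simply the stationary distribution of the underlying Markov chain. The dictionary encoding and the names `stationary_distribution` and `expected_output_constant` are illustrative assumptions and do not appear in the text.

```python
def stationary_distribution(transitions, q, iterations=200):
    """Approximate the stationary distribution by power iteration.

    transitions[s][eps] = (next_state, output); each digit eps in range(q)
    is read with probability 1/q. Assumption of this sketch: the transducer
    is strongly connected and aperiodic.
    """
    states = sorted(transitions)
    w = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        new_w = {s: 0.0 for s in states}
        for s in states:
            for eps in range(q):
                target, _ = transitions[s][eps]
                new_w[target] += w[s] / q
        w = new_w
    return w


def expected_output_constant(transitions, q):
    """Return q^{-1} * w_0 * delta for dimension d = 1."""
    w = stationary_distribution(transitions, q)
    # delta[s] is the sum of the outputs of all transitions leaving state s.
    delta = {s: sum(out for _, out in transitions[s].values())
             for s in transitions}
    return sum(w[s] * delta[s] for s in transitions) / q


# Binary sum of digits: one state, output equals the digit read.
SUM_OF_DIGITS = {0: {0: (0, 0), 1: (0, 1)}}

# Hypothetical two-state example: the state stores the previous digit and
# the output is 1 exactly when a digit 1 follows a digit 1.
COUNT_11 = {
    0: {0: (0, 0), 1: (1, 0)},
    1: {0: (0, 0), 1: (1, 1)},
}
```

For the one-state sum-of-digits transducer this gives 1/2, and for the two-state example it gives 1/4, matching the intuition that a fixed pair of adjacent binary digits equals 11 with probability 1/4.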
3.2. Characteristic function. To obtain a central limit law in Section 3.5, we compute an asymptotic formula for the characteristic function in this section.
The next lemma can be proved by induction on L. It is a generalization of Lemma 3 in [20].
Lemma 3.6. Let A ε , ε = 0, . . . , q − 1 be matrices in C n×n , H ε : N 0 → C n×n be known functions with H 0 (0) = 0. Let G : N 0 → C n×n be a function which satisfies the recurrence relation The solution of this recursion finally leads to an asymptotic formula for the characteristic function.
We choose the branch $-\pi + \frac{\pi}{p} < \arg z \le \pi + \frac{\pi}{p}$ of the complex logarithm. After setting $t = 0$, we use only the logarithm of complex numbers for which our branch coincides with the principal branch $-\pi < \arg z \le \pi$.
Lemma 3.7. The characteristic function of the random variable T (n) is (24)), which are arbitrarily often differentiable in t and 1-periodic in x, and an error term R(N, t). This error term R(N, t) is arbitrarily often differentiable, too, and satisfies The function g(n) satisfies the recursion (17) $g(qn + \varepsilon) = M_\varepsilon g(n)$ for $\varepsilon \in \{0, 1, \ldots, q-1\}^d$, $n \ge 0$ with $qn + \varepsilon \ne 0$. We define further functions where the coordinates $n_1, \ldots, n_d$ of $n$ with indices in the set $C \subseteq \{1, \ldots, d\}$ are fixed to $N$. This yields $G(N) = G_\emptyset(N)$. Furthermore, we define the matrices for disjoint sets $C, D \subseteq \{1, \ldots, d\}$ and $\varepsilon \in \{0, 1, \ldots, q-1\}$. In this definition, we restrict the i-th coordinate $\beta_i$ of $\beta$ to be $\varepsilon$ or less than $\varepsilon$ if $i \in C$ or $i \in D$, respectively. Otherwise, the i-th coordinate can be arbitrary. Then, $M = M^{\varepsilon}_{\emptyset,\emptyset}$ holds independently of $\varepsilon$.
Then, (17) yields the following recursions for G C (N ), ε = 0, . . . , q − 1, N ≥ 0 and C = {1, . . . , d}: (20) This recursion for G C only depends on G C for C C. As we can recursively determine G C using Lemma 3.6. In particular, for G(N ), this yields the recursion formula Thus by Lemma 3.6, we get By construction, M ε ∞ = 1 for every ε ∈ {0, . . . , q − 1} d . We conclude that M ε C,D ∞ ≤ q d−|C|−|D| ε |D| . By the definition of G C (N ), the growth rates of the functions G C (N ) and H ε (N which constitutes an explicit expression for the error term contributed by the non-dominant eigenvalues. By Lemma 3.5, its derivatives satisfy for k ≥ 0. Because u(0) = 1 and left and right eigenvectors corresponding to different eigenvalues annihilate each other, we have R(N, 0) = 0.
By (16), (23) and e 1 = l∈P c j=1 w lj + w , and q {x} = (x 0 x 1 . . .) q , choosing the representation ending on 0 ω in the case of ambiguity. The functions Ψ lj (x, t) are periodic in x with period 1 and well defined for all x ∈ R since they are dominated by geometric series. Furthermore, they are arbitrarily often differentiable in t.

3.3. Moments. In this section we give the moments of the output sum T (n). Proof. The derivative of E(exp(itT (n))) with respect to t at t = 0 gives the expected value of the sum of the output of the transducer with p-periodic functions and constants a j defined in (5). Here, Ψ lj denotes the derivative with respect to t. We now compute Ψ 0 (x) for some x with q {x} = (x 0 x 1 . . .) q . To compute H ε (N ), we use (21) and the definition of G(N ) to obtain (26) H for t = 0, because 1 is a right eigenvector of M ε for every ε. Together with (24), this results in By (12), we have Ψ lj (x, 0) = 0 for l = 0.
To compute D(q d ), observe that We conclude that and therefore a j λ j = e T by (5). This completes the proof of the expectation as given in (3). Using Lemma 3.7 and (27), the second derivative of E(exp(itT (n))) with v T given in (5) and Here, Ψ lj denotes the second derivative with respect to t. Thus, by (3), the variance is (29) . By Jensen's inequality, the coefficient of log 2 q N is zero if and only if all a j are equal. If all a j are equal, then the coefficient of log q N in (29) simplifies by (25), too, and we obtain (4).
For the computation of the Fourier coefficients and the proof of the Hölder condition, we need an explicit expression for Ψ 1 .
The estimate $f_l(r) = O(r^{d-1} \log r)$ holds.
From the combinatorial interpretation of b(n) and g(n)u(t), we obtain (33) ib , in analogy to (13). As the range of summation of G C and B C coincides, we immediately get .
By (26) and by differentiating H ε (N )u(t) using (21), (34) and (13), The fact that w l is a left eigenvector of M and (14)  To formulate T (n) as a q-regular sequence, we first define output vectors. The s-th entry of the vector δ ε is the output label of the transition from state s with input label ε. By (17), (33), and Remark 3.10. We can use the matrices ) in the definition of a q-regular sequence (2) to realize that the output sum of a transducer is q-regular. If d > 1, then this is a multidimensional q-regular sequence (cf. [1]).
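To illustrate what a q-regular representation of an output sum looks like in the simplest case, the following Python sketch uses a standard linear representation of the sum-of-digits function $s_q$ (the output sum of a one-state transducer). The vector $v(n) = (s_q(n), 1)$ and the matrices $A_\varepsilon$ are assumptions chosen for this example; they are not the matrices of Remark 3.10.

```python
def sum_of_digits(n, q):
    """Directly computed q-ary sum of digits, used as a reference."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


def linear_representation_value(n, q):
    """Evaluate s_q(n) via v(q*n + eps) = A_eps v(n) with v(n) = (s_q(n), 1).

    Here A_eps = [[1, eps], [0, 1]], so each digit contributes one
    matrix-vector product.
    """
    digits = []
    while n > 0:
        n, digit = divmod(n, q)
        digits.append(digit)
    v = (0, 1)  # v(0) = (s_q(0), 1)
    for eps in reversed(digits):  # most significant digit first
        v = (v[0] + eps * v[1], v[1])  # apply A_eps to the column vector v
    return v[0]
```

Both functions agree for all inputs, which is the defining feature of a linear representation: the value at $qn + \varepsilon$ depends linearly on a fixed finite vector of values at $n$.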

3.4. Hölder Continuity. In this section, we prove the continuity of the fluctuations $\Psi_1$ and $\Psi_2$ as well as the Hölder continuity of $\Psi_1$. This will be used to establish the convergence of the Fourier series.
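As a concrete picture of such a fluctuation, the following Python sketch evaluates it empirically for the binary sum of digits. It assumes that the expected value has the form $e_T \log_q N + \Psi_1(\log_q N)$ with $e_T = (q-1)/2$ and no error term (the one-state case of Remark 3.4). The function names are hypothetical.

```python
import math


def digit_sum_prefix_totals(limit, q=2):
    """Return totals with totals[N] = sum of s_q(n) over 0 <= n < N."""
    s = [0] * (limit + 1)        # s[n] = s_q(n)
    totals = [0] * (limit + 1)
    for n in range(1, limit + 1):
        s[n] = s[n // q] + n % q             # s_q(q*m + eps) = s_q(m) + eps
        totals[n] = totals[n - 1] + s[n - 1]
    return totals


def fluctuation(N, totals, q=2):
    """Empirical value S(N)/N - (q-1)/2 * log_q N of the fluctuation."""
    return totals[N] / N - (q - 1) / 2 * math.log(N, q)
```

For q = 2 the computed values vanish at powers of 2 (up to rounding) and stay in a narrow bounded band, consistent with the classical Trollope-Delange description of this sum as a continuous periodic fluctuation.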
Lemma 3.11. The functions Ψ 1 (x) and, if all a j are equal, Ψ 2 (x) are continuous for x ∈ R.
Proof. First note that continuity of $\Psi_1$ for $x \in \mathbb{R}$ with $x = \log_q y$ where $y$ has no finite q-ary expansion follows from the definitions (24) and (25). To prove it for $x = \log_q y$ with $0 \le x < p$ where $y$ has a finite q-ary expansion, observe that the two one-sided limits exist due to the definition. Next, we prove that they are the same. Consider the two integer sequences $N_k = y q^{pk}$ and $\tilde{N}_k = N_k - 1$ for $k$ large enough such that $N_k$ is an integer. For a real number $z$, we write $\{z\}_p = p\{z/p\}$ for the unique real number in the interval $[0, p)$ such that $z - \{z\}_p$ is an integer multiple of $p$. This yields (3)) and take the difference, we get Because $\Psi_1(x)$ is bounded by a geometric series by definition, we have Therefore, $\Psi_1$ is continuous in $x$.
The continuity of $\Psi_2(x)$ at $x = \log_q y$ for $y$ with infinite q-ary expansion again follows from the definition of $\Psi_2$. If all $a_j$ are equal, the continuity of the fluctuation $-\Psi_1^2 + \Psi_2$ of the variance (4) follows as above, where $\log N_k$ has to be replaced by $\log^2 N_k$ in the error terms. Thus $\Psi_2$ is also continuous in this case.
Proof. Let 0 < α < 1 be any constant. We want to prove that there exists a positive constant C such that holds for all x, y ∈ R. For x = y, the left-hand side of (37) is 0 and the inequality is obviously satisfied. From now on, assume that x < y. By the periodicity of Ψ 1 , it is sufficient to prove (37) for 0 ≤ x < p.
First, we prove (37) for the case 0 ≤ x < y and sufficiently small y − x < 1.
Fix such x and y and choose the integer k such that Note that the continuous differentiability of z → q z on the compact interval [0, p + 1] implies that q y − q x = O(|y − x|) and therefore We prove (37) in three steps.
Statement 3.13. Let $a, b \in \mathbb{R}$ with $x \le a < b \le y$ and $\lfloor a \rfloor = \lfloor b \rfloor$ such that the first $k + 1$ digits of the expansions Proof. Lemma 3.9 yields because the summands for $m \le k$ cancel in the first sum as the first $k + 1$ digits coincide. By using the estimates (see Lemma 3.9 for the last estimate), we obtain Here, (38) has been used in the penultimate step.
We now use the continuity of $\Psi_1$ and Statement 3.13 to remove the condition on coinciding digits from Statement 3.13. Statement 3.14. Let $a, b \in \mathbb{R}$ with $x \le a < b \le y$ and $\lfloor a \rfloor = \lfloor b \rfloor$. Then Proof. We write the expansions of $q^{\{a\}}$ and $q^{\{b\}}$ as This yields . . a k ) q , the result follows immediately from Statement 3.13. Otherwise, we have For $m \ge 0$, define $z$ and $z_m$ by $\lfloor z \rfloor = \lfloor z_m \rfloor = \lfloor a \rfloor = \lfloor b \rfloor$ and Then $\lim_{m\to\infty} z_m = z$ because of (39). By construction of $z$ and $z_m$, we have $a < z_m < z \le b$ for sufficiently large $m$.
By continuity of Ψ 1 , holds for sufficiently large m. This yields The third summand can be bounded by Statement 3.13 (for a and z m ) and the second by (40). The first summand is either 0 or can be bounded by Statement 3.13 (for z and b).
To finally prove (37) for sufficiently small $y - x < 1$, we only have to remove the assumption $\lfloor a \rfloor = \lfloor b \rfloor$ from Statement 3.14. We use the idea of the proof of Statement 3.14 once more.
Assume that $\lfloor y \rfloor > \lfloor x \rfloor$. By our assumption $y < x + 1$, this amounts to $\lfloor y \rfloor = \lfloor x \rfloor + 1$. For $m \ge 0$, define $z$ and $z_m$ by $z = \lfloor y \rfloor$, $\lfloor z_m \rfloor = \lfloor x \rfloor$ and $q^{\{z_m\}} = ((q-1)\,(q-1)^m)_q$. Then $\lim_{m\to\infty} z_m = z$. By continuity of $\Psi_1$, we have (41) $|\Psi_1(z) - \Psi_1(z_m)| \le |y - x|^\alpha$ and $x < z_m < z \le y$ for sufficiently large $m$. Then, this yields The third summand can be bounded by Statement 3.14 for $x$ and $z_m$ and the second by (41). The first vanishes or can be bounded by Statement 3.14 for $z$ and $y$. This yields Therefore, (37) is satisfied with a suitable positive constant $C$ for $y - x < \varepsilon$ for some $\varepsilon > 0$. Assume $y - x \ge \varepsilon$. As $\Psi_1$ is continuous and periodic, $|\Psi_1(y) - \Psi_1(x)|$ is bounded. Thus, (37) holds for a suitable positive constant $C$ for $|y - x| \ge \varepsilon$.

3.5. Limiting distribution. Finally, we can prove the parts of Theorem 1 concerning the approximation of the distribution function and the central limit theorem.
Proof. To prove that the distribution function can be approximated by a Gaussian mixture, we use the Berry-Esseen inequality (cf., for instance, [12, Theorem IX.5]) to estimate the difference between distribution functions. The proof follows the proof of Hwang's Quasi-Power Theorem [23]. First, we describe the two corresponding characteristic functions. Let $\hat{g}_N(t)$ be the characteristic function of a mixture of Gaussian or degenerate distributions with weights $\lambda_j$, means $a_j \log_q N$ and variances $b_j$ for $j = 1, \ldots, c$, that is with $a_j$, $b_j$ and $\lambda_j$ defined in (5). By Lemma 3.7, the characteristic function $\hat{f}_N(t)$ of T (n)/ log q N is Because of (27) and $R(N, 0) = 0$ (see Lemma 3.7), we have Now we use the inequality $|e^w - 1| \le |w| e^{|w|}$, valid for all complex numbers $w$, to obtain holds. This yields Now, the Berry-Esseen inequality with $T = c \log_q N$ for a small constant $c > 0$ (cf., for instance, [12, Theorem IX.5]) implies that where $F_N$ is the cumulative distribution function of $T(n)$ and $G_N$ is the cumulative distribution function of the mixture of Gaussian distributions. If all $a_j$ are equal and $b_j \ge 0$, $G_N$ is the distribution function of a mixture of normal (or degenerate) distributions with mean $e_T \log_q N$ and variances $b_j \ge 0$. After subtracting the mean, (42) converges to 0. Thus, T (n) − E(T (n)) log q N converges in distribution. If all $b_j > 0$, then the same estimates as above yield the speed of convergence.
This completes the proof of Theorem 1.
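The simplest instance of this limit law can be verified by brute force. The following Python sketch is only a sanity check for the binary sum of digits with $N = 2^k$, where the output sum is exactly binomially distributed; it is not a test of the general theorem. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def digit_sum_distribution(k):
    """Count how often each value of s_2(n) occurs for 0 <= n < 2**k."""
    return Counter(bin(n).count("1") for n in range(2 ** k))


def mean_and_variance(counts):
    """Mean and variance of the empirical distribution given by counts."""
    total = sum(counts.values())
    mean = sum(value * c for value, c in counts.items()) / total
    variance = sum((value - mean) ** 2 * c
                   for value, c in counts.items()) / total
    return mean, variance


def is_binomial(counts, k):
    """Check that the value j occurs exactly comb(k, j) times."""
    return all(counts[j] == comb(k, j) for j in range(k + 1))
```

For $N = 2^k$ the mean is $k/2$ and the variance is $k/4$, both linear in $\log_2 N$, and the normal limit is the classical de Moivre-Laplace theorem.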

4. Fourier Coefficients - Proof of Theorem 2
This section contains the proof of the theorem about the Fourier coefficients. First, we investigate some Dirichlet series which we will use later. Then, we prove the formulas given in Theorem 2. We use the Hölder condition for Ψ 1 to prove that its Fourier series converges.
and, for $l \ne 0$, the residue at $z = d + \frac{2\pi i l}{\log q}$ is d 2πil . Proof. First, we use the binomial theorem to obtain with $L_1(z) = \sum_{r \ge 1} \lfloor \log_q r \rfloor r^{-z}$. The Dirichlet series $L_1(z)$ is holomorphic for $\Re z > 1$. Thus, the second summand in (43) is holomorphic for $\Re z > d - 1$. To obtain the expansion of $L(z)$ at $z$ with $\Re z = d$, we investigate the Dirichlet series $L_1(z)$ at $\Re z = 1$.
Let k ≥ 0 be an integer. We use Euler-Maclaurin summation with f (x) = kx −z to obtain where B 1 (x) is the first Bernoulli polynomial. For z > 1, summation over k ≥ 0 yields The second summand and the integral are clearly holomorphic for z > 0. Thus, L 1 (z) can be continued meromorphically to z > 0 with poles coming from the first summand. The expansion around z = 1 is Thus, by (43), we obtain the main part and the residues of L(z) at z = d + 2πil log q for l ∈ Z as stated in the lemma. Lemma 4.2. The Dirichlet series where ζ is the Riemann ζ-function. The result follows from the unique pole of ζ(z) at z = 1 with residue 1.
Next, we investigate the Dirichlet series H C . In particular, we determine its behavior at z = d + χ k and provide an infinite functional equation to compute its residues at these points. This will finally give us the residues of H in (7). We use a similar method as Grabner and Hwang in [16].
For this infinite recursion, define in analogy to the definition of M ε C,D . As before, the s-th entry of δ ε is the output label of the transition starting in s with input label ε. Then, δ = δ ε ∅,∅ holds independently of ε. Furthermore, δ ε C,D = d dt M ε C,D 1 t=0 by (35).
It is analytic for $\Re z > d - |C| + 1$. For $|C| = 1$ and $k \ne 0$, $w_k H_C$ has a possible simple pole at $z = d + \chi_k$ whose residue is the right-hand side of (49) evaluated at $z = d + \chi_k$ and divided by $\log q$. For $|C| = 1$, $w_0 H_C$ has a possible double pole with main part Remark 4.6. The infinite recursion (49) can be used to compute the values of $H_C$ and its residues at $z = d + \chi_k$ numerically with arbitrary precision. The computation converges quickly if the first terms of the Dirichlet series $H_C$ are computed explicitly.
Proof. As $B_C(r) = O(r^{d-|C|} \log r)$, the Dirichlet series $H_C$ is analytic for $\Re z > d - |C| + 1$.
To compute the residues of $w_k H_C$ for $|C| = 1$ at $z = d + \chi_k$, note that $\sum_{\varepsilon=0}^{q-1} M^{\varepsilon}_{C,\emptyset} = M$ holds independently of $C$.
We multiply (49) with the left eigenvector $w_k$ which results in (51) As $|C \cup D| \ge 2$ or $\Re z + m > d$, all $H_{C \cup D}$ used on the right-hand side of (51) are well defined for $\Re z > d - 1$. The Dirichlet series J have simple poles at $z = d$ for $|C| = 1$ and $D = \emptyset$ (Lemma 4.3). Thus the right-hand side of (51) is meromorphic for $\Re z > d - 1$ with a simple pole at $z = d$.
The factor $1 - q^{d-z} \exp(\frac{2\pi i k}{p})$ has a zero exactly for $z = d + \chi_k$, $k \in \mathbb{Z}$. Thus for $k \ne 0$, $w_k H_C$ has a possible simple pole at $z = d + \chi_k$. Its residue is the right-hand side of (51) evaluated at $z = d + \chi_k$ divided by $\log q$.
If k = 0, we have z = d. In this case the expansion of the right-hand side of (51) is where we used the expansion of J in Lemma 4.3, δ = q−1 ε=0 δ ε C,∅ and (14).  The residue at z = d + χ k , k = 0 is The main part at z = d is Now we can prove the formulas for the Fourier coefficients.
Proof of Theorem 2. The periodic fluctuation Ψ 1 of the expected value is a p-periodic function. We use the explicit expression of Ψ 1 given in Lemma 3.9.
Due to absolute convergence, the k-th Fourier coefficient of The value of the first integral is given by − e T 2 for k = 0, and [k ≡ 0 mod p] e T χ k log q otherwise. Thus, we focus on the second integral I l,m .

Next, consider the function
We know that $f_l(r) = O(r^{d-1} \log r)$. Thus, $A(z)$ is analytic for $\Re z > d - 1$.
By summation by parts, we can rearrange the series for $\Re z > d$ and obtain a sum of Dirichlet series with coefficients $s_1(r)$, $s_2(r)$, $s_3(r)$ and $s_4(r)$, respectively. These coefficients are differences of the four summands in $f_{k \bmod p}(r)$ and $f_{k \bmod p}(r-1)$ in (32), respectively, e.g., After some simplifications using $\lfloor \frac{r-1}{q} \rfloor = \lfloor \frac{r}{q} \rfloor - [q \mid r]$ and $\lfloor \log_q(r-1) \rfloor = \lfloor \log_q r \rfloor - [r \text{ is a power of } q]$ (for $r \ge 2$), we obtain (54) For $\Re z > d$, we can split up the summation into the different cases in (54). This yields where we used (45) As the second order poles of $w_0 H$ and $L$ cancel, the right-hand side of (55) is well defined for the limit $z \to d + \chi_k$. After computing the limit and simplifying the summation, we obtain (7).
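The two floor identities used in this simplification, $\lfloor (r-1)/q \rfloor = \lfloor r/q \rfloor - [q \mid r]$ and $\lfloor \log_q (r-1) \rfloor = \lfloor \log_q r \rfloor - [r \text{ is a power of } q]$ for $r \ge 2$, are easy to confirm with exact integer arithmetic. The following Python sketch does so; the helper names are hypothetical.

```python
def floor_log(r, q):
    """Return floor(log_q r) for integers r >= 1 and q >= 2, without floats."""
    exponent = 0
    while r >= q:
        r //= q
        exponent += 1
    return exponent


def is_power_of(r, q):
    """True if r = q**k for some integer k >= 0 (requires r >= 1)."""
    while r % q == 0:
        r //= q
    return r == 1


def identities_hold(r, q):
    """Check both floor identities for a single pair (r, q) with r >= 2."""
    first = (r - 1) // q == r // q - (1 if r % q == 0 else 0)
    second = (floor_log(r - 1, q)
              == floor_log(r, q) - (1 if is_power_of(r, q) else 0))
    return first and second
```

Both identities express the same observation: passing from $r - 1$ to $r$ changes the floor only when $r$ crosses a multiple of $q$ or a power of $q$, respectively.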
Then Lemma 3.12 and Bernstein's theorem (cf. [35, p. 240]) imply the absolute and uniform convergence of the Fourier series. Now we use Theorem 2 to prove Corollary 2.5.
Proof of Corollary 2.5. The transducer in Figure 7 computes the q-ary sum-of-digits function s q (n) and we can use Theorem 2.
We transform the Dirichlet series in two different ways. This series is absolutely convergent for $\Re z > 1$. First, we can rearrange the summation of the Dirichlet series $D(z)$ such that the Dirichlet series $H(z) = \sum_{m \ge 1} s_q(m) m^{-z}$, defined in (46), appears. We have (56) for $\Re z > 1$. By partial summation, we obtain Expanding the binomial series yields (57) By (57), we have which is equivalent to for $\Re z > 1$. The sum on the right-hand side is holomorphic at z = 0 because of (56). By meromorphic continuation, this equation also holds for z = 0. This yields On the other hand, we split up the summation in the definition of $D(z)$ into the $q$ equivalence classes modulo $q$ and we use the recursions⁴ $s_q(qm + \varepsilon) = s_q(m) + \varepsilon$ for $0 \le \varepsilon < q$. This results in Thus, we obtain⁵ For k = 0, we further use the expansion
⁴ Actually, these recursions are (36).
⁵ Note that this well-known identity can also be derived from $s_q(m) - s_q(m-1) = 1 - (q-1) v_q(m)$, where $v_q(m)$ is the q-adic valuation of $m$.
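The two elementary facts about $s_q$ used here, the recursion $s_q(qm + \varepsilon) = s_q(m) + \varepsilon$ and the difference identity $s_q(m) - s_q(m-1) = 1 - (q-1) v_q(m)$, can be confirmed directly. The following Python sketch is a brute-force check with hypothetical helper names.

```python
def sum_of_digits(n, q):
    """q-ary sum of digits s_q(n) for n >= 0."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


def valuation(m, q):
    """Largest k such that q**k divides m, for m >= 1."""
    k = 0
    while m % q == 0:
        m //= q
        k += 1
    return k


def recursion_holds(m, q):
    """Check s_q(q*m + eps) = s_q(m) + eps for all digits eps."""
    return all(sum_of_digits(q * m + eps, q) == sum_of_digits(m, q) + eps
               for eps in range(q))


def difference_identity_holds(m, q):
    """Check s_q(m) - s_q(m-1) = 1 - (q-1) * v_q(m) for m >= 1."""
    return (sum_of_digits(m, q) - sum_of_digits(m - 1, q)
            == 1 - (q - 1) * valuation(m, q))
```

The difference identity reflects the carry mechanism: incrementing $m - 1$ turns each trailing digit $q - 1$ into 0 and raises the next digit by 1.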

5. Non-Differentiability - Proof of Theorem 3
In this section, we give the proof of the non-differentiability of Ψ 1 (x). We follow the method presented by Tenenbaum [32], see also Grabner and Thuswaldner [17].
Proof of Theorem 3. Let r = (r m−1 . . . r 0 ) q be the value of the reset sequence (r m−1 . . . r 0 ) leading to state ν.
Assume that Ψ 1 is differentiable at x ∈ [0, 1). Let q x = (ε 0 ε 1 . . .) q be the standard q-ary digit expansion choosing the representation ending on 0 ω in the case of ambiguity. Further, let x k be such that q x k = (ε 0 ε 1 . . . ε k ) q . Thus, we have lim k→∞ x k = x. For f ∈ {0, 1}, the function L f : Z → Z is defined as L f (k) = ck + f with c a positive integer such that c > 1 ξ − 1. Define N k = q x k +k+L f (k) and h(k) = q ck+ c c+1 x k −m−2 . Let y k and z k be such that N k + q ck−m−1 r = q y k +k+L f (k) and N k + q ck−m−1 r + h(k) = q z k +k+L f (k) .
From these definitions, we know that h(k) N k = Θ(q −k ), for k → ∞. Apart from x k , also, y k and z k converge to x and satisfy the following bounds: First, observe that q ck−1 | N k and h(k) < q ck−m−1 . Thus, the digit representations of the three summands in N k + q ck−m−1 r + n are not overlapping at non-zero digits for n < h(k). Since the digit expansion of r is a reset sequence, we have where e ν b(N ) is the output of the transducer when starting in state ν with input N and b(ν) is the final output at state ν.
Thus, we have where only the first summand depends on L f (k) and hence on f .
Taking the difference in (3), there is a second way of computing the sum in (62). Using the periodicity and continuity of Ψ 1 (x) yields (63) Next, we use our assumption that Ψ 1 is differentiable at x to replace the difference by the derivative Now, we insert this into (63), divide by h(k) and obtain Thus, we have the following equality twice, for f ∈ {0, 1}. Subtracting these two from each other yields Since the left-hand side is an integer, but the right-hand side is not for k large enough, this contradicts our assumption that Ψ 1 is differentiable at x.

6. Recursions - Proof of Theorem 4
In this section, we construct a transducer associated to the sequence defined by the recursion in (10). All inequalities, maxima and minima in this section are considered coordinate-wise.
Define the function A : for 0 ≤ λ < q κ 1 and n ≥ 0. So, if A(n) < ∞, then the recursion (10) can be used for this argument because the argument on the right-hand side is non-negative, i.e., a(n) = a(A(n)) + t n mod q κ . First, we construct a non-deterministic transducer T . A priori, it has an infinite number of states; later, we will prove that only finitely many of them are accessible. We then simplify it to obtain a finite, deterministic, subsequential, complete transducer T .
The set of states of T is The initial state is (0, 0) F ; all states (l, j) F are final states with final output a(l) if l ≥ 0 and final output 0 otherwise 6 . As an abbreviation, we will frequently speak about "a state (l, j)" if we do not want to distinguish between (l, j) F and (l, j) N . We call l the carry and j the level of the state (l, j). A state (l, j) F is called simple, if it is final, l ≥ 0 and j ≤ κ.
There are two types of transitions in T , recursion transitions and storing transitions. Each state is either the origin of one recursion transition or of q d storing transitions.
There is a recursion transition leaving (l, j) if
• $j \ge \kappa$ and
• $A(q^j n + l) < \infty$ for all $n \ge 0$ with $n \ne 0$.
In that case, we write $l = q^\kappa s + \lambda$ for some $0 \le \lambda < q^\kappa \mathbf{1}$ and the transition leads to the state $(l', j')_N$ with $j' = \kappa_\lambda + j - \kappa$ and $l' = q^{\kappa_\lambda} s + r_\lambda$. The input label is empty, the output label is $t_\lambda$. Thus (64) $A(q^j n + l) = q^{j'} n + l'$ for $n \ge 0$ with $n \ne 0$. Note that (64) holds for $n = 0$ if and only if $l \ge 0$ and $l' \ge 0$.
We now define the classes F 1 , . . . , F K announced in Section 2.6. For each accessible cycle in T with simple states and input 0, the carries of its states form one of these classes. The other classes are the singletons of those carries l ≥ 0 in the accessible part of T with A(l) = ∞. These sets will turn out to be disjoint by Lemma 6.6 and the finiteness of K will follow from the finiteness of the accessible part of T (Lemma 6.4).
Remark 6.1. We also give a combinatorial description of those classes F 1 , . . . , F K which do not come from cycles in T : Let l ≥ 0 be a carry of an accessible state of T . Then A(l) = ∞ if and only if there is a recursion transition from some (l, j) to some (l , j ) with l ≥ 0.
Proof. Let (l, j 0 ) be any accessible state with carry l. We use the longest path with input 0 using storing transitions only to arrive in some state (l, j)-again, finiteness of this process will follow from the finiteness of the accessible part and the fact that the levels increase along storing transitions. As there is no storing transition leaving (l, j) by construction, there is a recursion transition from (l, j) to some (l , j ). By the remark following (64), l = A(l) or l ≥ 0.
As usual, if reaching a state which is the origin of a transition with empty input, the process may stay in that state or may continue to the destination state writing the output of the transition without reading an input. This is the reason why the transducer is non-deterministic.
Note that in our case, transitions with empty input (i.e., recursion transitions) lead to non-final states and transitions with non-empty input (i.e., storing transitions) lead to final states. Combined with the fact that each state is either the origin of one recursion transition or of q d storing transitions, processing an input is in fact deterministic: For every admissible input-we do not allow leading zeros-, there exists exactly one path leading from the initial state to a final state with the given input. This will enable us to simplify the transducer T to a deterministic transducer T later on.
We need the property that the carries of accessible states are not "too negative": (1) If (l, j) is an accessible state, then Proof. The first assertion is easily shown by induction and (64). The second assertion follows by induction and from the assumption that r λ ≥ 0 holds for all λ. To prove the third assertion, we use (65) on the originating state of the transition.
The last assertion is shown by induction. It is clearly valid in the initial state. For storing transitions, the value of $l$ is non-decreasing. If there is a recursion transition from some $(l, j)$ to some $(l', j')_N$, we have As leading zeros are not allowed, the last transition in the computation path of any valid input has input $\varepsilon \ne 0$ and thus leads to a state with a non-negative carry.
For our further investigations and finally the correctness proof, we need a suitable invariant: Lemma 6.3. Consider a path from $(l, j)$ to $(l', j')$ with input label $\varepsilon_{m-1} \ldots \varepsilon_0$, output label $\delta_{m'-1} \ldots \delta_0$ using $L$ recursion transitions and $n \ge 0$. Thus $m'$ is the number of transitions and $m = m' - L$ is the number of storing transitions.
If n = 0 or if the last transition is a storing transition with non-zero input ε m−1 , then and, if the recursion (10) is well-posed, Proof. First consider the case that the path consists of a single transition. If it is a storing transition, then L = 0, m = 1, and all assertions follow from the definition and Lemma 6.2. On the other hand, if the transition is a recursion transition, we have L = 1, m = 0, and all assertions again follow from the definition, Lemma 6.2 and (64).
By induction on the length of the path, we obtain (66) and (67).
We are now able to prove the finiteness of the accessible part.
Lemma 6.4. The transducer has a finite number of accessible states.
Proof. For a recursion transition from (l, j) to (l , j ) N , we have j > j . Thus, there are no infinite paths consisting only of recursion transitions. In particular, there exist no cycles of recursion transitions. For d = 1, let J ≥ κ be minimal such that q J−κ ≥ − l min q κ − min λ q −κ λ r λ . Then A(q j + l) < ∞ holds for all accessible states (l, j) with j ≥ J. This implies j ≤ J for all accessible states (l, j). For d ≥ 2, we have j ≤ κ =: J for all accessible states (l, j). Thus there are at most J consecutive recursion transitions.
To prove that only finitely many states are accessible, we introduce the notion of heights of states: The height of a state (l, j) is defined to be h = lq −j . If there exists a storing transition from (l, j) of height h to (l , j ) F of height h , we have 1 Assume that there is a path from (l, j) of height h to (l , j ) of height h with L ≤ J recursion transitions and one storing transition (in this order). Then we have We can subdivide every path in the transducer starting with the initial state into a sequence of such paths and a final path consisting of only recursion transitions. Let h m be the sequence of heights of the states where the subpaths starts. Then, we have Iteration leads to J(s − − 1) q − 1 ≤ h m ≤ Js + + q1 q − 1 for all m. Therefore, the height h of an accessible state is bounded. Since 0 ≤ j ≤ J is also bounded, the integer carry l = q j h of an accessible state (l, j) can only take finitely many different values. The accessible part of the transducer is thus finite. Lemma 6.5. Let P be an infinite path with input zero starting at some state of level j such that all of its states have non-negative carries. Then, after at most j transitions, it reaches a state (l 0 , κ). From that point on, it only passes through simple states, namely (l 0 , κ), (l 1 , j 1 ) N , (l 1 , j 1 + 1) F , . . . , (l 1 , κ) F , (l 2 , j 2 ) N , (l 2 , j 2 + 1) F , . . . , (l 2 , κ) F , (l 3 , j 3 ) N , (l 3 , j 3 + 1) F , . . . , (l 3 , κ) F , . . .
where $l_i = A(l_{i-1})$ and $j_i = \kappa_{l_{i-1} \bmod q^\kappa}$ for $i \ge 1$.
Proof. Denote the first state of P by (l, j).
First, assume that j ≥ κ. As storing transitions always increase the level and the levels are bounded by Lemma 6.4, the path has to contain at least one recursion transition. Thus the path starts with $k \ge 0$ storing transitions leading from $(l, j)$ to $(l, j + k)$, followed by a recursion transition from $(l, j + k)$ to $(l', j')$. By assumption, we have $l \ge 0$ and $l' \ge 0$. Thus $A(l) = l' \ne \infty$ by (64). Therefore, there is a recursion transition leaving $(l, j)$, i.e., there were no leading storing transitions. Recall that $j' < j$ holds for any recursion transition. We repeat the argument at most $j - \kappa$ times until we reach a simple state.
If we are in a simple state $(l', j')$ with $j' < \kappa$, the next $\kappa - j'$ steps will be storing transitions, leading to $(l', \kappa)$. This means that after at most $j$ steps, we reach a state $(l_0, \kappa)$.
We now apply the argument of the second paragraph again. Thus a recursion transition leads to $(l_1, j_1)$ with $l_1 = A(l_0)$ and $j_1 = \kappa_{l_0 \bmod q^\kappa}$.
The remainder of the lemma follows by induction.
As an auxiliary structure for deciding the well-posedness of the recursion, we introduce the recursion digraph R. It has set of vertices N d 0 and arcs (n, A(n)) with label t n mod q κ for all n ∈ N d 0 with A(n) < ∞. Thus a(n) can be computed from the successor of n in R using the recursion (10). By definition, each vertex of R has out-degree 1 or 0. Each component of R is a functional digraph or a rooted tree (oriented towards the root). If we have q κ n ∞ − λ ∞ > q κ λ n ∞ + r λ ∞ and therefore q κ n + λ ∞ > q κ λ + r λ ∞ for all 0 ≤ λ < q κ 1. Thus we have n ∞ < n ∞ for all but finitely many arcs (n, n ) of R.
Thus for every vertex of R, there is a unique path starting in this vertex and leading to a vertex with out-degree 0 or a finite cycle.
From this description, it is clear that the recursion is well-posed if and only if
• the sum of the labels of each cycle in R is 0 and
• the set I consists of one element for every cycle in R as well as of the vertices with out-degree 0 in R.
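The description via the recursion digraph translates directly into an evaluation procedure: follow the arcs $n \to A(n)$, add up the labels, and stop at an initial value. The following Python sketch does this for dimension d = 1 only. The encoding of the recursion by the lists `kappas`, `r`, `t`, the use of `None` for $\infty$, and both example recursions are assumptions made for illustration.

```python
def make_A(q, kappa, kappas, r):
    """Build A with A(q**kappa * n + lam) = q**kappas[lam] * n + r[lam].

    Returns None (standing in for infinity) if the image is negative.
    """
    def A(n):
        s, lam = divmod(n, q ** kappa)
        image = q ** kappas[lam] * s + r[lam]
        return image if image >= 0 else None
    return A


def evaluate(n, A, q, kappa, t, initial_values):
    """Follow arcs n -> A(n) of the recursion digraph and sum the labels."""
    total = 0
    seen = set()
    while n not in initial_values:
        if n in seen:
            raise ValueError("cycle without an initial value")
        seen.add(n)
        successor = A(n)
        if successor is None:
            raise ValueError("vertex of out-degree 0 without an initial value")
        total += t[n % q ** kappa]  # label of the arc (n, A(n))
        n = successor
    return total + initial_values[n]


# Binary sum of digits: a(2n + lam) = a(n) + lam, with a(0) = 0.
A_DIGITS = make_A(2, 1, [0, 0], [0, 0])
T_DIGITS = [0, 1]

# Hypothetical example with kappa = 2: number of blocks 11 in binary,
# a(4n + lam) = a(2n + (lam // 2)) + [lam == 3], with a(0) = 0.
A_BLOCKS = make_A(2, 2, [1, 1, 1, 1], [0, 0, 1, 1])
T_BLOCKS = [0, 0, 0, 1]
```

In both examples the only cycle of the digraph is the loop at 0 with label 0, so a single initial value $a(0) = 0$ makes the recursion well-posed in the sense of the two conditions above.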
We now prove the essential connection between the recursive digraph and the transducer T . This also implies that the classes F 1 , . . . , F K are disjoint. Lemma 6.6. There exists a bijection between cycles in the recursive digraph R and accessible cycles in the transducer T with input 0 and simple states. Corresponding cycles under this bijection have the same output sum and sum of labels.
Proof. Let n 0 , . . . , n L = n 0 be a cycle in the recursive digraph with n R ≥ 0 for all 0 ≤ R < L.
Let k 0 be the length of the path P 0 in T starting in the initial state and reading the q-ary expansion of n 0 .
We determine the destinations of certain paths in the transducer associated with the cycle in the recursive digraph. Statement 6.7. Let k ≥ k 0 and P be the path from the initial state (0, 0) to (l, j) of length k whose input label is the q-ary expansion of n 0 , padded with leading zeros. Assume that the number of recursion transitions in this path is LQ + R for some Q ≥ 0 and 0 ≤ R < L. Then l = n R ≥ 0.
Proof of Statement 6.7. Let $k' = k - (LQ + R)$ be the number of storing transitions of P. By (66), we have (68) $A^{LQ+R}(q^{k'} n + n_0) = q^j n + l$ for $n \ge 0$, $n \ne 0$. Note that for $M \ge \kappa$ and $n \equiv n' \pmod{q^M}$ with $A(n) < \infty$ and $A(n') < \infty$, the definition of $A$ implies $A(n) \equiv A(n') \pmod{q^{M-\kappa}}$.
Together with the definitions of n R and the recursive digraph R as well as (68), this implies n R = A LQ+R (n 0 ) ≡ A LQ+R (q k +M 1 + n 0 ) = q j+M 1 + l (mod q k +M −(LQ+R)κ ) for sufficiently large M . Coarsening yields n R ≡ l (mod q M −(LQ+R)κ ), still valid for sufficiently large M . As l is bounded by Lemma 6.4, this implies n R = l. Now, we conclude the proof of Lemma 6.6. Let P be the infinite path in T starting at the destination of P 0 and reading zeros. By Lemma 6.5 applied to P together with Statement 6.7 applied to P 0 concatenated with prefixes of P, P leads to a cycle in T . Its states are simple and have carries n 0 , . . . , n L−1 and levels determined by n 0 , . . . , n L−1 as in Lemma 6.5.
This construction defines a map from the cycles of the recursive digraph R to the accessible cycles with input 0 in the transducer with simple states. This map is injective by construction. Under this map, the sum of the labels of the cycle in R equals the sum of output labels of the cycle in T by construction.
To use Theorem 1, we simplify the non-deterministic transducer to obtain the deterministic transducer, that is, one without transitions with empty input. As a first step, we remove all non-accessible states. By Lemma 6.4, this leaves us with finitely many states.
By Lemma 6.4 and the fact that recursion transitions decrease the level, the length of paths consisting of recursion transitions only is bounded. As a recursion transition always leads to a non-final state, processing an input never ends with a recursion transition.
Consider a recursion transition from $(l, j)$ to $(l', j')_N$ with output $t$ such that no recursion transition originates in $(l', j')_N$. For each transition originating in $(l', j')_N$, say to some $(l'', j'')_F$ with input $\varepsilon$ and output $t'$, we insert a storing transition from $(l, j)$ to $(l'', j'')_F$ with input $\varepsilon$ and output $t + t'$. Then, the recursion transition from $(l, j)$ to $(l', j')_N$ is removed. The number of recursion transitions decreases by one and the new transducer generates the same output as the old transducer. We repeat this process until there are no more recursion transitions. Then, all non-final states are inaccessible and are removed.
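The elimination step just described can be sketched in a few lines of Python. The sketch assumes that states are arbitrary hashable labels, that `rec` maps a state to its single recursion transition, and that `store` maps a state to its storing transitions; this encoding and the function name are hypothetical. It also relies on the absence of cycles of recursion transitions (Lemma 6.4) and does not perform the final removal of inaccessible states.

```python
def eliminate_recursion_transitions(rec, store):
    """Replace transitions with empty input by storing transitions.

    rec[s] = (target, output) is the recursion transition leaving s;
    store[s][eps] = (target, output) are the storing transitions leaving s.
    Returns a new dictionary of storing transitions.
    """
    rec = dict(rec)
    store = {s: dict(transitions) for s, transitions in store.items()}
    while rec:
        # Pick a recursion transition whose target has no recursion transition.
        s = next(state for state, (target, _) in rec.items()
                 if target not in rec)
        target, t = rec.pop(s)
        # Copy the storing transitions of the target, adding the output t.
        store[s] = {eps: (dest, t + t_prime)
                    for eps, (dest, t_prime) in store[target].items()}
    return store


# Hypothetical example: A --(empty, 2)--> B --(empty, 3)--> C,
# and C has two storing transitions.
REC = {"A": ("B", 2), "B": ("C", 3)}
STORE = {"C": {0: ("C", 0), 1: ("A", 1)}}
```

In the example, the state B inherits the transitions of C with outputs increased by 3, and afterwards A inherits those of B with outputs increased by 2, so every input digit is still read exactly once and the total output is unchanged.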
Proof of Theorem 4. By Lemma 6.6 and the characterization of well-posedness via the recursive digraph, the recursion (10) is well-posed if and only if $I$ consists of exactly one representative of each of the sets $F_j$, $1 \le j \le K$, and if T has no cycle with simple states, input 0 and non-vanishing output sum.
We now show that the cycles of simple states with input 0 in the deterministic transducer are exactly the reductions of the cycles of simple states with input 0 in the non-deterministic transducer. As a cycle with simple states and input 0 in the non-deterministic transducer does not have consecutive recursion transitions (cf. Lemma 6.5), it is reduced to a cycle with simple states in the deterministic transducer. On the other hand, consider a cycle of the non-deterministic transducer with input 0 containing a non-simple state. If there is a state of level > κ, the state with largest level is final and is not removed. If all states have level ≤ κ, then there are no two consecutive recursion transitions, so no negative carry is completely removed from the cycle in the reduction to the deterministic transducer. Therefore, such a cycle is not reduced to a cycle with simple states and input 0 in the deterministic transducer. Therefore, the assertion on well-posedness is proved.
To prove correctness of the transducer, we use (66) with $(l, j) = (0, 0)$, the joint q-ary expansion of $n$ as input leading to some state $(l', j')_F$ with output $\delta_{m'-1} \ldots \delta_0$. By Lemma 6.2, we have $l' \ge 0$ because the last transition is a storing transition with non-zero input. Thus by (67), $a(n) = a(l') + \sum_{k=0}^{m'-1} \delta_k$. As the final output of $(l', j')_F$ is defined to be $a(l')$, we obtain $T(n) = a(l') + \sum_{k=0}^{m'-1} \delta_k = a(n)$, as required.