Quantifying Noninvertibility in Discrete Dynamical Systems

Given a finite set $X$ and a function $f:X\to X$, we define the degree of noninvertibility of $f$ to be $\displaystyle\text{deg}(f)=\frac{1}{|X|}\sum_{x\in X}|f^{-1}(f(x))|$. This is a natural measure of how far the function $f$ is from being bijective. We compute the degrees of noninvertibility of some specific discrete dynamical systems, including the Carolina solitaire map, iterates of the bubble sort map acting on permutations, bubble sort acting on multiset permutations, and a map that we call ``nibble sort.'' We also obtain estimates for the degrees of noninvertibility of West's stack-sorting map and the Bulgarian solitaire map. We then turn our attention to arbitrary functions and their iterates. In order to compare the degree of noninvertibility of an arbitrary function $f:X\to X$ with that of its iterate $f^k$, we prove that \[\max_{\substack{f:X\to X\\ |X|=n}}\frac{\text{deg}(f^k)}{\text{deg}(f)^\gamma}=\Theta(n^{1-1/2^{k-1}})\] for every real number $\gamma\geq 2-1/2^{k-1}$. We end with several conjectures and open problems.


Introduction
Functions between finite sets play a fundamental role in classical enumerative and algebraic combinatorics, as they are often used to transfer combinatorial information from one set of objects to another. The goal of the recently introduced field of dynamical algebraic combinatorics is to study the dynamics of functions $f:X\to X$, where $X$ is a finite set (usually of combinatorial interest). In this case, we say $f$ is a discrete dynamical system. In both of these situations, it is very common to consider functions that are injective or even bijective. However, the world is not always so simple; some of the most intriguing combinatorial functions are not 1-to-1. The purpose of this article is to introduce and explore a natural way of measuring how far a map is from being injective. We state the definition for arbitrary maps between finite sets, but we will focus our attention in most of the article on discrete dynamical systems. In this case, we are actually measuring how far the function is from being bijective.
Given finite sets $X$ and $Y$ and a function $f:X\to Y$, we define the degree of noninvertibility of $f$ to be \[\deg(f)=\frac{1}{|X|}\sum_{x\in X}|f^{-1}(f(x))|.\] Grouping the elements of $X$ according to their images, we can rewrite this as \[\deg(f)=\frac{1}{|X|}\sum_{y\in Y}|f^{-1}(y)|^2;\] this is the formulation that we will use in the remainder of the article. For brevity, we will often simply call $\deg(f)$ the "degree" of $f$. For a function $f:X\to Y$, we necessarily have $1\le\deg(f)\le|X|$. The lower bound is attained if and only if $f$ is injective, and the upper bound is attained if and only if $f$ is a constant function. More generally, the degree of a $k$-to-1 map is precisely $k$. Notice that $|X|\deg(f)$ is the number of pairs $(x,x')\in X\times X$ such that $f(x)=f(x')$. Equivalently, if $x$ is chosen randomly from the uniform distribution on $X$, then $\deg(f)$ is the expected number of elements $x'\in X$ such that $f(x')=f(x)$. If we define a probability distribution $\nu$ on $Y$ by $\nu(y)=|f^{-1}(y)|/|X|$, then $\log(|X|/\deg(f))$ is the Rényi entropy of $\nu$ of order 2. The quantity $\deg(f)-1$ was also termed the "coefficient of coalescence" of $f$ in [1].
In Section 2, we analyze the degrees of some specific families of discrete dynamical systems of combinatorial interest. The first types of systems are "sorting maps" that act on permutations of length $n$; these are the bubble sort map, the (West) stack-sorting map, and a map that we call "nibble sort" (which swaps the first pair of adjacent entries in a nonidentity permutation that are in decreasing order). We also consider analogues of bubble sort and nibble sort acting on words. The last two types of systems act on partitions and compositions of a fixed positive integer $n$; these are the Bulgarian solitaire map and its close relative, the Carolina solitaire map.
In Section 3, we turn to arbitrary functions $f:X\to X$ on an $n$-element set $X$ and compare the degree of $f$ with that of its iterate $f^k$. The main result is that the maximum of $\deg(f^k)/\deg(f)^\gamma$ over all such $f$ is $\Theta(n^{1-1/2^{k-1}})$ for every real number $\gamma\ge2-1/2^{k-1}$. Setting $k=\gamma=2$ shows that the ratio $\deg(f^2)/\deg(f)^2$ can be as big as $\Theta(\sqrt n)$ (but no bigger).
We collect several open problems and conjectures in Section 4.
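Before we proceed, let us mention the following more refined estimates for the degree, which will be useful in the remainder of the article.
Lemma 1.2. If $X$ and $Y$ are finite sets and $f:X\to Y$ is a function, then \[\frac{|X|}{|f(X)|}\le\deg(f)\le\max_{y\in Y}|f^{-1}(y)|.\]
Proof. The Cauchy-Schwarz inequality tells us that \[|X|^2=\Big(\sum_{y\in f(X)}|f^{-1}(y)|\Big)^2\le|f(X)|\sum_{y\in f(X)}|f^{-1}(y)|^2=|f(X)|\cdot|X|\deg(f),\] which proves the lower bound. For the upper bound, let $M=\max_{y\in Y}|f^{-1}(y)|$. We have \[\deg(f)=\frac{1}{|X|}\sum_{y\in Y}|f^{-1}(y)|^2\le\frac{M}{|X|}\sum_{y\in Y}|f^{-1}(y)|=M.\]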
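The definition of the degree and the two bounds of Lemma 1.2 are easy to experiment with. The following is a minimal sketch, not part of the original development; the helper names `degree` and `lemma_bounds` and the two sample maps are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def degree(f, X):
    """Return deg(f) = (1/|X|) * sum_y |f^{-1}(y)|^2 as an exact fraction."""
    X = list(X)
    fibers = Counter(f(x) for x in X)          # fibers[y] = |f^{-1}(y)|
    return Fraction(sum(c * c for c in fibers.values()), len(X))

def lemma_bounds(f, X):
    """Return (|X| / |f(X)|, largest fiber size), the two bounds of Lemma 1.2."""
    X = list(X)
    fibers = Counter(f(x) for x in X)
    return Fraction(len(X), len(fibers)), max(fibers.values())

X = range(12)
three_to_one = lambda x: x % 4                 # every nonempty fiber has size 3
lopsided = lambda x: min(x, 3)                 # fibers of sizes 1, 1, 1, 9
print(degree(three_to_one, X))                 # 3
print(degree(lopsided, X), lemma_bounds(lopsided, X))
```

Exact rational arithmetic is used so that the degree can be compared with closed formulas without rounding issues.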

Specific Discrete Dynamical Systems
In this section, we consider specific families of discrete dynamical systems that are indexed by positive integers. In order to define these maps, we recall the following standard definitions.
A permutation is an ordering of a finite set of positive integers. We view permutations as words. Let $S_n$ be the set of permutations of the set $[n]:=\{1,\ldots,n\}$. The normalization of a permutation $\pi=\pi_1\cdots\pi_n$ is the permutation in $S_n$ obtained by replacing the $i$th-smallest entry in $\pi$ with $i$ for all $i\in[n]$. A descent of a permutation $\pi=\pi_1\cdots\pi_n$ is an index $i\in[n-1]$ such that $\pi_i>\pi_{i+1}$. Let $\text{Des}(\pi)$ be the set of descents of $\pi$. Given a tuple of positive integers $a=(a_1,\ldots,a_r)$, we define $W_a$ to be the set of all words over the alphabet $[r]$ that contain exactly $a_i$ copies of the letter $i$ for all $i$. A composition of a positive integer $n$ is a tuple $c=(c_1,\ldots,c_\ell)$ of positive integers that sum to $n$. The entries $c_1,\ldots,c_\ell$ are called the parts of $c$. A partition of a positive integer $n$ is a composition of $n$ whose parts appear in nonincreasing order. Let $\text{Comp}(n)$ and $\text{Part}(n)$ denote, respectively, the set of compositions of $n$ and the set of partitions of $n$.

2.1. Bubble Sort and Its Iterates. Suppose $\pi$ is a permutation of length $n$ and $i\in[n-1]$. If $i\in\text{Des}(\pi)$, let $t_i(\pi)$ be the permutation obtained from $\pi$ by swapping the $i$th and $(i+1)$st entries of $\pi$. If $i\notin\text{Des}(\pi)$, let $t_i(\pi)=\pi$. The operators $t_1,\ldots,t_{n-1}$ play a crucial role in algebraic combinatorics because they generate the 0-Hecke algebra of the symmetric group $S_n$ [12].
Let $B(\pi)=t_{n-1}\circ t_{n-2}\circ\cdots\circ t_1(\pi)$. The function $B$ is called the bubble sort map. An alternative recursive description of $B$ is as follows. First, $B$ sends the empty permutation to itself. If $\pi$ is a permutation with largest entry $m$, then we can write $\pi=LmR$ for some permutations $L$ and $R$. Then $B(\pi)=B(L)Rm$. For example, $B(416352)=B(41)\,352\,6=143526$. By a slight abuse of notation, we will use the same letter $B$ to denote the restriction of $B$ to $S_n$, which we can view as a discrete dynamical system on $S_n$. We refer the interested reader to [4] and [16, pages 106-110] for additional information and interesting properties of the bubble sort map. For instance, Knuth reports that the number of permutations of length $n$ that are completely sorted after $k$ iterations of $B$ is $n!$ for $n\le k$ and $k^{n-k}k!$ for $n\ge k$; from our point of view, this is a formula for the cardinality of the $k$-fold preimage of the identity permutation.
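The two descriptions of the bubble sort map can be compared directly on small cases. The sketch below is illustrative only; the function names `t`, `bubble`, and `bubble_rec` are our own, and permutations are represented as tuples.

```python
from itertools import permutations

def t(i, p):
    """t_i: swap entries i and i+1 (1-indexed) if i is a descent; otherwise do nothing."""
    p = list(p)
    if p[i - 1] > p[i]:
        p[i - 1], p[i] = p[i], p[i - 1]
    return tuple(p)

def bubble(p):
    """B = t_{n-1} o ... o t_1, so t_1 is applied first."""
    for i in range(1, len(p)):
        p = t(i, p)
    return tuple(p)

def bubble_rec(p):
    """Recursive description: B(L m R) = B(L) R m, where m is the largest entry."""
    p = tuple(p)
    if not p:
        return ()
    j = p.index(max(p))
    return bubble_rec(p[:j]) + p[j + 1:] + (p[j],)

print(bubble((4, 1, 6, 3, 5, 2)))   # (1, 4, 3, 5, 2, 6)
# The two descriptions agree on all of S_6.
print(all(bubble(p) == bubble_rec(p) for p in permutations(range(1, 7))))
```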
For every fixed positive integer $k$, we will give an exact formula for the degree of the iterate $B^k:S_n\to S_n$. We first need the following preliminary definitions. Given a permutation $\pi=\pi_1\cdots\pi_n\in S_n$, let $e_j(\pi)$ be the number of entries of $\pi$ that appear to the left of $j$ and are bigger than $j$. The tuple $(e_1(\pi),\ldots,e_n(\pi))$ is called the inversion table of $\pi$. For example, the inversion tables of $416352$ and $143526$ are $(1,4,2,0,1,0)$ and $(0,3,1,0,0,0)$, respectively. Let $I_n$ denote the set of tuples $(e_1,\ldots,e_n)$ such that $0\le e_i\le n-i$ for all $i$. It is not difficult to show that the map $\pi\mapsto(e_1(\pi),\ldots,e_n(\pi))$ is a bijection from $S_n$ to $I_n$. A left-to-right maximum of $\pi$ is an entry $j$ such that $e_j(\pi)=0$. Let $\text{lmax}(\pi)$ denote the number of left-to-right maxima of $\pi$. The tail length of $\pi$, denoted $\text{tl}(\pi)$, is the largest integer $\ell\in\{0,\ldots,n\}$ such that $\pi_i=i$ for all $i\in\{n-\ell+1,\ldots,n\}$.
Lemma 2.1. Let $k$ be a positive integer, and let $\pi\in S_n$ with $n\ge k$. If $\text{tl}(\pi)<k$, then $|B^{-k}(\pi)|=0$. If $\text{tl}(\pi)\ge k$, then $|B^{-k}(\pi)|=k!\,(k+1)^{\text{lmax}(\pi)-k}$.
Proof. It is known (and straightforward to prove using the recursive description mentioned above) that bubble sort has the effect of decreasing each nonzero entry in the inversion table of a permutation by 1. In other words, $e_j(B(\sigma))=\max\{0,e_j(\sigma)-1\}$ for all $j$. This implies that $e_\ell(B^k(\sigma))=0$ for each $\ell\in\{n-k+1,\ldots,n\}$ and $e_\ell(B^k(\sigma))\le n-k-\ell$ for each $\ell\in\{1,\ldots,n-k\}$. These two conditions are equivalent to the statement that $\text{tl}(B^k(\sigma))\ge k$ (indeed, the second condition says that $(e_1(B^k(\sigma)),\ldots,e_{n-k}(B^k(\sigma)))$ is the inversion table of a permutation in $S_{n-k}$). Thus, permutations with tail lengths less than $k$ have 0 preimages under $B^k$. Now suppose $\text{tl}(\pi)\ge k$. Choosing $\sigma\in B^{-k}(\pi)$ is equivalent to choosing the inversion table of $\sigma$. To do this, we must first increase each of the nonzero entries in $(e_1(\pi),\ldots,e_{n-k}(\pi))$ by $k$. Next, we must increase each of the $\text{lmax}(\pi)-k$ entries in $(e_1(\pi),\ldots,e_{n-k}(\pi))$ that are equal to 0 by some integer in $\{0,\ldots,k\}$. There are $(k+1)^{\text{lmax}(\pi)-k}$ ways to do this. Finally, for each $\ell\in\{n-k+1,\ldots,n\}$, we must increase the entry $e_\ell(\pi)$ (which is 0) by some integer in $\{0,\ldots,n-\ell\}$. The total number of ways to do this for all $\ell\in\{n-k+1,\ldots,n\}$ is $k!$.
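The count obtained in this proof, namely $k!\,(k+1)^{\text{lmax}(\pi)-k}$ preimages under $B^k$ when $\text{tl}(\pi)\ge k$ and none otherwise, can be checked by brute force for small $n$. The following is a hedged sketch; the helper names are our own, and the check is only meaningful for $n\ge k$.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def bubble(p):
    """One left-to-right pass of adjacent swaps, i.e. t_{n-1} o ... o t_1."""
    p = list(p)
    for i in range(len(p) - 1):
        if p[i] > p[i + 1]:
            p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)

def lmax(p):
    """Number of left-to-right maxima of p."""
    best, count = 0, 0
    for x in p:
        if x > best:
            best, count = x, count + 1
    return count

def tail_length(p):
    """Largest l such that the last l entries of p are n-l+1, ..., n in order."""
    n, l = len(p), 0
    while l < n and p[n - 1 - l] == n - l:
        l += 1
    return l

def preimage_formula_holds(n, k):
    """Compare |B^{-k}(pi)| with k! (k+1)^(lmax - k) for every pi in S_n."""
    preimages = Counter()
    for p in permutations(range(1, n + 1)):
        q = p
        for _ in range(k):
            q = bubble(q)
        preimages[q] += 1
    for p in permutations(range(1, n + 1)):
        if tail_length(p) >= k:
            expected = factorial(k) * (k + 1) ** (lmax(p) - k)
        else:
            expected = 0
        if preimages[p] != expected:
            return False
    return True

print(preimage_formula_holds(6, 2))
```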
In the next theorem, we let deg n (B k ) denote the degree of the map B k : S n → S n .
Theorem 2.2. For every positive integer $k$ and every $n\ge k$, we have \[\deg_n(B^k)=\frac{(k!)^2}{n!}\prod_{i=0}^{n-k-1}\big((k+1)^2+i\big).\]
Proof. A well-known result due to Rényi [19] states that the number of elements of $S_n$ with $\ell$ left-to-right maxima is given by the unsigned Stirling number of the first kind $\left[{n\atop\ell}\right]$. These numbers have the generating function $\sum_{\ell=0}^n\left[{n\atop\ell}\right]x^\ell=x(x+1)(x+2)\cdots(x+n-1)$.
Given $\pi\in S_n$ with $\text{tl}(\pi)\ge k$, we can write $\pi=\tau(n-k+1)(n-k+2)\cdots n$ for some $\tau\in S_{n-k}$ with $\text{lmax}(\tau)=\text{lmax}(\pi)-k$. Invoking Lemma 2.1, we find that \[\deg_n(B^k)=\frac{1}{n!}\sum_{\pi\in S_n}|B^{-k}(\pi)|^2=\frac{(k!)^2}{n!}\sum_{\tau\in S_{n-k}}(k+1)^{2\,\text{lmax}(\tau)}=\frac{(k!)^2}{n!}\sum_{\ell=0}^{n-k}\left[{n-k\atop\ell}\right]\big((k+1)^2\big)^\ell=\frac{(k!)^2}{n!}\prod_{i=0}^{n-k-1}\big((k+1)^2+i\big).\]
When $k=1$, Theorem 2.2 tells us that the degree of $B:S_n\to S_n$ is $\frac{(n+1)(n+2)}{6}$. There is an alternative way of generalizing this result that makes use of the probabilistic interpretation of the degree mentioned in the introduction. Namely, the bubble sort map is associated with a random variable $U_n$ on $S_n$ defined by $U_n(\pi)=|B^{-1}(B(\pi))|$. The degree $\deg_n(B)$ is just the expected value of $U_n$. The following theorem computes all of the moments of this random variable.
Theorem 2.3. For $m\ge1$, the $m$th moment of $U_n$ (with respect to the uniform distribution on $S_n$) is given by \[\frac{1}{n!}\prod_{j=1}^{n-1}\big(2^{m+1}+j-1\big).\]
Proof. For $j\in[n-1]$ and a nonnegative integer $t$, let $V_j(t)=2$ if $t\in\{0,1\}$ and $V_j(t)=1$ otherwise. As mentioned in the proof of Lemma 2.1, the bubble sort map has the effect of decreasing by 1 each of the positive entries in the inversion table of a permutation. It follows that for $\pi\in S_n$, we have $|B^{-1}(B(\pi))|=\prod_{j=1}^{n-1}V_j(e_j(\pi))$. Choosing $\pi\in S_n$ uniformly at random is equivalent to choosing the entries $e_j=e_j(\pi)\in\{0,\ldots,n-j\}$ (for $j\in[n-1]$) independently and uniformly at random (note that $e_n(\pi)$ is always 0). Therefore, the expected value of $U_n(\pi)^m$ is the same as the expected value of $\prod_{j=1}^{n-1}V_j(e_j)^m$. For each $j\in[n-1]$, the expected value of $V_j(e_j)^m$ is $\frac{2\cdot2^m+(n-j-1)}{n-j+1}$, so the desired result follows from the fact that $e_1,\ldots,e_{n-1}$ are chosen independently.
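As a numerical sanity check, the moments of $U_n$ can be computed by brute force and compared with the product obtained from the independence argument, which gives $\frac{1}{n!}\prod_{j=1}^{n-1}(2^{m+1}+j-1)$ and, for $m=1$, the value $(n+1)(n+2)/6$. This is a sketch under that reading of the argument; the names `moment` and `closed_form` are assumptions of ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def bubble(p):
    """One left-to-right pass of adjacent swaps."""
    p = list(p)
    for i in range(len(p) - 1):
        if p[i] > p[i + 1]:
            p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)

def moment(n, m):
    """m-th moment of U_n(pi) = |B^{-1}(B(pi))| under the uniform distribution on S_n."""
    fibers = Counter(bubble(p) for p in permutations(range(1, n + 1)))
    # Each fiber of size c contributes c permutations, each with U_n = c.
    return Fraction(sum(c ** (m + 1) for c in fibers.values()), factorial(n))

def closed_form(n, m):
    """Product predicted by the independence of the inversion-table entries."""
    value = Fraction(1, factorial(n))
    for j in range(1, n):
        value *= 2 ** (m + 1) + j - 1
    return value

for n in range(1, 7):
    print(n, moment(n, 1), closed_form(n, 1))
```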

2.2. Bubble Sort for Words. Throughout this subsection, fix a tuple $a=(a_1,\ldots,a_r)$ of positive integers, where $r\ge2$. We are interested in the obvious analogue of the bubble sort map acting on $W_a$. Recall that this is the set of words with exactly $a_i$ copies of the letter $i$ for all $i$ (these are also called permutations of the multiset $\{1^{a_1},\ldots,r^{a_r}\}$). Given a word $w=w_1\cdots w_\ell$ over the alphabet of positive integers and $i\in[\ell-1]$ with $w_i>w_{i+1}$, let $t_i(w)$ be the word obtained by swapping the positions of $w_i$ and $w_{i+1}$ in $w$. If $w_i\le w_{i+1}$, let $t_i(w)=w$. As before, let $B(w)=t_{\ell-1}\circ\cdots\circ t_1(w)$.
We can consider this generalization of bubble sort as a discrete dynamical system on W a . Denote by deg a (B) the degree of this map.
Theorem 2.4. The degree of $B:W_a\to W_a$ is given by \[\deg_a(B)=\prod_{j=1}^{r-1}\left(\frac{2a_j}{a_{j+1}+a_{j+2}+\cdots+a_r+1}+1\right).\]
Proof. The proof is by induction on the length $r$ of the tuple $a=(a_1,\ldots,a_r)$. Let us first assume $r=2$. Every word in $W_{(a_1,a_2)}$ can be written in the form $1^{\gamma_0}\,2\,1^{\gamma_1}\,2\cdots2\,1^{\gamma_{a_2}}$, where $(\gamma_0,\gamma_1,\ldots,\gamma_{a_2})$ is a tuple of nonnegative integers that sum to $a_1$. In fact, this establishes a bijection between $W_{(a_1,a_2)}$ and the set of $(a_2+1)$-tuples of nonnegative integers that sum to $a_1$. Bubble sort transforms the word corresponding to the tuple $(\gamma_0,\gamma_1,\ldots,\gamma_{a_2})$ to the word corresponding to the tuple $(\gamma_0+\gamma_1,\gamma_2,\ldots,\gamma_{a_2},0)$. The number of preimages under $B$ of the word corresponding to $(\gamma_0+\gamma_1,\gamma_2,\ldots,\gamma_{a_2},0)$ is $\gamma_0+\gamma_1+1$ (since this is the number of ways to write $\gamma_0+\gamma_1$ as a sum of two nonnegative integers). Now let $\overline{\gamma_i}$ be the average value of $\gamma_i$ over $W_{(a_1,a_2)}$. By symmetry, $\overline{\gamma_i}$ is independent of $i$; since $\gamma_0+\cdots+\gamma_{a_2}=a_1$, we have $\overline{\gamma_i}=\frac{a_1}{a_2+1}$ for all $i$. Therefore, $\deg_{(a_1,a_2)}(B)$, which is the average value of $\gamma_0+\gamma_1+1$ over $W_{(a_1,a_2)}$, is $\frac{2a_1}{a_2+1}+1$.
We now assume $a=(a_1,\ldots,a_r)$, where $r\ge3$. Let $a'=(a_1,\ldots,a_{r-2},a_{r-1}+a_r)$ and $a''=(a_{r-1},a_r)$. There is a natural projection $\psi:W_a\to W_{a'}$ obtained by replacing each occurrence of the letter $r$ in a word with the letter $r-1$. We also have a map $\varphi:W_a\to W_{a''}$ that decreases each letter in a word by $r-2$ and then deletes all of the nonpositive letters. For example, with $a=(2,1,2,3)$, we have $\psi(14234413)=13233313$ and $\varphi(14234413)=21221$.
It is straightforward to check from these definitions that $\psi$ and $\varphi$ commute with the action of bubble sort. That is, $(\psi(B(w)),\varphi(B(w)))=(B(\psi(w)),B(\varphi(w)))$. Moreover, the map $W_a\to W_{a'}\times W_{a''}$ given by $w\mapsto(\psi(w),\varphi(w))$ is a bijection. This means that if $(\psi(w),\varphi(w))=(w',w'')$, then $|B^{-1}(w)|=|B^{-1}(w')|\cdot|B^{-1}(w'')|$. Consequently, $\deg_a(B)=\deg_{a'}(B)\deg_{a''}(B)$. We know from the $r=2$ case that $\deg_{a''}(B)=\frac{2a_{r-1}}{a_r+1}+1$, so the desired result follows by induction on $r$.
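The product formula for bubble sort on words, $\deg_a(B)=\prod_{j=1}^{r-1}\big(\frac{2a_j}{a_{j+1}+\cdots+a_r+1}+1\big)$, can be compared with a brute-force computation on small multisets. The sketch below is illustrative; the function names are our own, and the enumeration of $W_a$ via `itertools.permutations` is only practical for short words.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def bubble_word(w):
    """One left-to-right pass that swaps adjacent letters when w_i > w_{i+1}."""
    w = list(w)
    for i in range(len(w) - 1):
        if w[i] > w[i + 1]:
            w[i], w[i + 1] = w[i + 1], w[i]
    return tuple(w)

def words(a):
    """All words with exactly a[i] copies of the letter i+1."""
    letters = [i + 1 for i, copies in enumerate(a) for _ in range(copies)]
    return set(permutations(letters))

def degree_bubble_words(a):
    """Brute-force degree of B on W_a."""
    W = words(a)
    fibers = Counter(bubble_word(w) for w in W)
    return Fraction(sum(c * c for c in fibers.values()), len(W))

def product_formula(a):
    """Product over j of (2 a_j / (a_{j+1} + ... + a_r + 1) + 1)."""
    value = Fraction(1)
    for j in range(len(a) - 1):
        value *= Fraction(2 * a[j], sum(a[j + 1:]) + 1) + 1
    return value

for a in [(1, 2), (2, 1, 1), (2, 1, 2)]:
    print(a, degree_bubble_words(a), product_formula(a))
```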
2.3. The Stack-Sorting Map. The stack-sorting map $s$ was originally defined in West's Ph.D. dissertation [21] as a deterministic variant of a "stack-sorting algorithm" introduced in Knuth's book The Art of Computer Programming [15]. It has a recursive definition very similar to that of the bubble sort map. First, $s$ sends the empty permutation to itself. If $\pi$ is a permutation with largest entry $m$, then we can write $\pi=LmR$ for some permutations $L$ and $R$. Then $s(\pi)=s(L)\,s(R)\,m$. For each positive integer $n$, we can view $s$ as a discrete dynamical system on $S_n$. We refer the reader to [2,3,6,8] and the references therein for more information about this map.
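The recursive definition $s(LmR)=s(L)\,s(R)\,m$ translates directly into code, which makes it easy to tabulate $\deg_n(s)$ for small $n$. This is a minimal sketch with names of our own choosing; it uses the classical fact that the permutations sorted by one pass through a stack are counted by the Catalan numbers as a sanity check.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def stack_sort(p):
    """West's stack-sorting map: s(L m R) = s(L) s(R) m, with m the largest entry."""
    p = tuple(p)
    if not p:
        return ()
    j = p.index(max(p))
    return stack_sort(p[:j]) + stack_sort(p[j + 1:]) + (p[j],)

def degree_stack_sort(n):
    """Brute-force degree of s on S_n."""
    perms = list(permutations(range(1, n + 1)))
    fibers = Counter(stack_sort(p) for p in perms)
    return Fraction(sum(c * c for c in fibers.values()), len(perms))

print(stack_sort((2, 3, 1)))          # (2, 1, 3)
for n in range(1, 8):
    d = degree_stack_sort(n)
    print(n, d, float(d) ** (1 / n))  # the n-th root is the quantity whose limit is studied
```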
Let deg n (s) denote the degree of the map s : S n → S n . In [6][7][8], the first author found methods for computing the number of preimages of an arbitrary permutation under the stack-sorting map. Unfortunately, it seems quite difficult to use these methods in order to find an explicit formula for deg n (s). However, we will still be able to show that deg n (s) grows exponentially in n. This is in stark contrast to Theorem 2.2, which shows that for each fixed k, the degree of B k : S n → S n grows polynomially in n. Roughly speaking, this says that the stack-sorting map is much further from being invertible than any iterate of the bubble sort map.
Theorem 2.5. The limit lim n→∞ deg n (s) 1/n exists and satisfies Proof. Note that m!d m (s) is the number of pairs (π, π ) ∈ S m × S m such that s(π) = s(π ). Suppose π, π ∈ S m−1 and σ, σ ∈ S n−1 are such that s(π) = s(π ) and s(σ) = s(σ ). Let A be an (m − 1)element subset of {1, . . . , m + n − 2}. Let π and π be the permutations of A whose normalizations are π and π , respectively. Let σ and σ be the permutations of {1, . . . , m + n − 2} \ A whose normalizations are σ and σ , respectively. We have s( π) = s( π ) and s( σ) = s( σ ), so This shows that π(m+n−1) σ and π (m+n−1) σ are two elements of S m+n−1 with the same image under s. The map sending the tuple (π, π , σ, σ , A) to the pair ( π(m + n − 1) σ, π (m + n − 1) σ ) is injective, so Rearranging, this shows that We will make use of a generalization of Fekete's lemma due to de Bruijn and Erdős [5], which states that if a sequence of positive real numbers (a m ) m≥1 satisfies a m a n ≤ a m+n whenever 1/2 ≤ n/m ≤ 2, then lim n→∞ a 1/n n exists and equals sup n≥1 a 1/n n . Now let a n = d n−1 (s) n 2 for n ≥ 8 and a n = 0 for 1 ≤ n ≤ 7. It is not difficult to check that m 2 n 2 ≥ (m + n) 2 (m + n − 1) whenever m, n ≥ 8 and 1/2 ≤ n/m ≤ 2. Therefore, it follows from (1) that whenever m, n ≥ 8 and 1/2 ≤ n/m ≤ 2. The inequality a m a n ≤ a m+n also certainly holds whenever m or n is at most 7. According to the aforementioned generalization of Fekete's lemma, lim To prove the upper bound, we use the fact that |s −1 (π)| ≤ C n for all π ∈ S n , where C n = 1 n + 1

2.4. Nibble Sort. Recall the definition of the maps $t_i:S_n\to S_n$ from Section 2.1. In this section, we consider the nibble sort map $\text{nib}:S_n\to S_n$ defined by $\text{nib}(\pi)=t_{\min(\text{Des}(\pi))}(\pi)$ if $\pi\ne123\cdots n$ and $\text{nib}(123\cdots n)=123\cdots n$. In other words, if $\pi\in S_n\setminus\{123\cdots n\}$ has initial descent $i$, then $\text{nib}(\pi)$ is the permutation obtained from $\pi$ by swapping the entries $\pi_i$ and $\pi_{i+1}$. Let $\deg_n(\text{nib})$ be the degree of $\text{nib}:S_n\to S_n$.
In the previous two subsections, we found that deg n (B) grows quadratically in n while deg n (s) grows exponentially in n. Since the map nib does not change its input very much (it just nibbles a little bit), one might expect deg n (nib) to grow much slower than deg n (B). In fact, deg n (nib) approaches a constant as n → ∞.
Now fix $k\le n-2$. We are going to count permutations $\pi\in S_n$ such that $\min(\text{Des}(\pi))=k$ and $\pi_k<\pi_{k+2}$. This is equivalent to counting permutations whose first $k+2$ entries have normalization $123\cdots(j-1)(j+1)\cdots(k+1)j(k+2)$ for some $j\in[k]$. For each fixed $j$, the probability that the first $k+2$ entries of a permutation chosen uniformly at random from $S_n$ have this normalization is $\frac{1}{(k+2)!}$. Since there are $k$ choices for $j$, the probability that the first $k+2$ entries of a random permutation have one of these normalizations is $\frac{k}{(k+2)!}$. Therefore, the number of such permutations is $\frac{n!\,k}{(k+2)!}$.
A probabilistic argument similar to the one used in the previous paragraph shows that the number of permutations in $S_n$ with smallest descent $k$ is $\frac{n!\,k}{(k+1)!}$ (this is also known: see OEIS sequence A092582 [18]). Therefore, the number of permutations $\pi\in S_n$ satisfying $\min(\text{Des}(\pi))=k$ and $\pi_k>\pi_{k+2}$ is $\frac{n!\,k}{(k+1)!}-\frac{n!\,k}{(k+2)!}$. For each such $\pi$, $\text{nib}^{-1}(\pi)$ is the set of permutations obtained from $\pi$ by swapping the entries $\pi_i$ and $\pi_{i+1}$ for some $i\in[k-1]$; thus, $|\text{nib}^{-1}(\pi)|=k-1$.
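The counts in the last two paragraphs can be verified exhaustively for small $n$. The sketch below reflects our reading of the case split: a nonidentity permutation with smallest descent $k$ has $k$ preimages under nib when $k\le n-2$ and $\pi_k<\pi_{k+2}$, and $k-1$ preimages otherwise. The function names are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def first_descent(p):
    """Smallest 1-indexed i with p_i > p_{i+1}, or None for the identity."""
    for i in range(len(p) - 1):
        if p[i] > p[i + 1]:
            return i + 1
    return None

def nib(p):
    """Nibble sort: swap the two entries at the first descent, if there is one."""
    k = first_descent(p)
    if k is None:
        return tuple(p)
    q = list(p)
    q[k - 1], q[k] = q[k], q[k - 1]
    return tuple(q)

def check_counts(n):
    perms = list(permutations(range(1, n + 1)))
    preimages = Counter(nib(p) for p in perms)
    by_descent = Counter(first_descent(p) for p in perms)
    for k in range(1, n):
        # Number of permutations with smallest descent k should be n! k / (k+1)!.
        if by_descent[k] * factorial(k + 1) != factorial(n) * k:
            return False
    for p in perms:
        k = first_descent(p)
        if k is None:
            continue                      # the identity is treated separately
        gains_one = k <= n - 2 and p[k - 1] < p[k + 1]
        if preimages[p] != (k if gains_one else k - 1):
            return False
    return True

print(check_counts(6))
```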
2.5. Nibble Sort for Binary Words. There is a natural analogue of the nibble sort map (which we also denote by nib) that acts on binary strings. Namely, if w ∈ {0, 1} n , then nib(w) is obtained from w by replacing the first occurrence of the factor 10 in w with 01. If no such factor exists, then nib(w) = w. For example, nib(001110) = nib(010101) = 001101 and nib(00011) = 00011. In what follows, we let 0 α denote the word 00 · · · 0 consisting of α copies of the letter 0. The word 1 α is defined similarly.
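Nibble sort on binary words is a one-line string operation, so the examples above and small cases of the preimage structure are easy to explore. The following sketch is only an illustration; the name `nib_word` and the representation of words as Python strings are our own choices.

```python
from collections import Counter
from itertools import product

def nib_word(w):
    """Replace the first occurrence of the factor '10' in w with '01' (if any)."""
    i = w.find("10")
    return w if i < 0 else w[:i] + "01" + w[i + 2:]

print(nib_word("001110"), nib_word("010101"), nib_word("00011"))

# Tabulate preimage counts over all binary words of length 8.
n = 8
fibers = Counter(nib_word("".join(bits)) for bits in product("01", repeat=n))
print(max(fibers.values()))
```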
Next, assume $w\notin\text{nib}^{-1}(w)$. Each preimage of $w$ is obtained by changing an occurrence of the factor 01 in $w$ to 10. Thus, $w$ must have at least two occurrences of 01, and must therefore have the form $w=x01y01z$ for some words $x,y,z$ such that $\text{nib}(x10y01z)=\text{nib}(x01y10z)=w$. Because $\text{nib}(x01y10z)=w$, the word $x01y$ does not contain an occurrence of the factor 10; hence, $x=0^{\alpha-1}$ and $y=1^{\beta-1}$ for some $\alpha,\beta\ge1$. This means that $w=0^{\alpha-1}011^{\beta-1}01z=0^\alpha1^\beta01z$. Note that there is no way to obtain an element of $\text{nib}^{-1}(w)$ from $w$ by changing an occurrence of 01 in $z$ to 10. It follows that $\text{nib}^{-1}(w)=\{0^{\alpha-1}101^{\beta-1}01z,\ 0^\alpha1^\beta10z\}$, and the claim is satisfied once again.
We have seen that every element of {0, 1} n has at most 2 preimages under nib, so we can write We obtain a map A 2 → Z by sending each w ∈ A 2 to the unique element of nib −1 (w) ∩ Z . This map is clearly injective; we claim that it is also surjective. Proving this claim amounts to showing that nib(u) ∈ A 2 for every u ∈ Z . If u is of the form 0 γ 1 δ for some γ, δ ≥ 1, then nib(u) = nib(0 γ−1 101 δ−1 ) = u, so nib(u) ∈ A 2 . If u is not of this form, then it must be of the form 0 γ 1 δ 0x for some γ ≥ 1, some δ ≥ 2, and some word x. In this case, we have nib(u) = 0 γ 1 δ−1 01x = nib(0 γ−1 101 δ−2 01x), so nib(u) ∈ A 2 in this case as well. This proves the surjectivity, so we now know that |A 2 | = |Z |.
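Bulgarian solitaire is the map $B:\text{Part}(n)\to\text{Part}(n)$ that sends a partition $\lambda=(\lambda_1,\ldots,\lambda_\ell)$ to the partition obtained by deleting all of the 0's from the tuple $(\ell,\lambda_1-1,\ldots,\lambda_\ell-1)$ and rearranging the remaining entries into nonincreasing order. Carolina solitaire, a variant of Bulgarian solitaire introduced in [11] and studied further in [14,20], is a map that sends compositions of $n$ to compositions of $n$. Given a composition $c=(c_1,\ldots,c_\ell)$ of $n$, we define $C(c)$ to be the composition obtained by deleting all of the 0's from the tuple $(\ell,c_1-1,\ldots,c_\ell-1)$. For example, $C(3,1,3,7,1,8)=(6,2,2,6,7)$. Let $\deg_n(B)$ and $\deg_n(C)$ denote the degrees of $B:\text{Part}(n)\to\text{Part}(n)$ and $C:\text{Comp}(n)\to\text{Comp}(n)$, respectively. It appears that computing $\deg_n(B)$ exactly, or even asymptotically, is quite difficult. By contrast, we will compute $\deg_n(C)$ exactly.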
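Both solitaire maps are short to implement. The sketch below assumes the standard definition of Bulgarian solitaire (remove one from each part, add a new part equal to the former number of parts, and re-sort); the function names and the partition generator are our own. It also tabulates preimage counts of the Bulgarian solitaire map, which can be compared with the number of distinct parts of $\lambda$ that are at least $\ell-1$, where $\ell$ is the number of parts.

```python
from collections import Counter

def carolina(c):
    """Carolina solitaire on a composition c = (c_1, ..., c_l)."""
    return tuple(x for x in (len(c),) + tuple(part - 1 for part in c) if x > 0)

def bulgarian(lam):
    """Bulgarian solitaire: the same move, with parts re-sorted in nonincreasing order."""
    return tuple(sorted(carolina(lam), reverse=True))

def partitions(n, largest=None):
    """Generate the partitions of n as nonincreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def distinct_large_parts(lam):
    """Number of distinct parts of lam that are at least len(lam) - 1."""
    return len({part for part in lam if part >= len(lam) - 1})

print(carolina((3, 1, 3, 7, 1, 8)))   # (6, 2, 2, 6, 7)

n = 10
preimages = Counter(bulgarian(mu) for mu in partitions(n))
print(all(preimages[lam] == distinct_large_parts(lam) for lam in partitions(n)))
```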
Theorem 2.8. We have $\liminf_{n\to\infty}\deg_n(B)\ge2$. For every positive integer $n$, we have \[\deg_n(B)\le\left\lfloor\frac{3+\sqrt{9+24n}}{6}\right\rfloor.\]
Proof. Consider a partition $\lambda=(\lambda_1,\ldots,\lambda_\ell)$ of $n$. It is straightforward to check that $|B^{-1}(\lambda)|$ is the number of distinct elements of the list $\lambda_1,\ldots,\lambda_\ell$ that are at least $\ell-1$. The rank of a partition $\lambda=(\lambda_1,\ldots,\lambda_\ell)$ is defined to be $\lambda_1-\ell$. Thus, the image $B(\text{Part}(n))$ consists of the partitions of $n$ with rank at least $-1$ (these were enumerated in [14]). Considering the involution of $\text{Part}(n)$ that sends a partition to its conjugate, we see that for each integer $r$, the number of partitions of $n$ with rank $r$ is equal to the number of partitions of $n$ with rank $-r$. In addition, the main result in [10] shows that the number of partitions of $n$ with rank $-1$ or $0$ is $o(|\text{Part}(n)|)$. It follows that $|B(\text{Part}(n))|\sim\frac12|\text{Part}(n)|$ as $n\to\infty$. The first statement in the theorem now follows from Lemma 1.2.
To prove the desired upper bound, let us fix a positive integer $u$. Let $N$ be the smallest positive integer such that there exists a partition $\lambda=(\lambda_1,\ldots,\lambda_\ell)$ of $N$ with $|B^{-1}(\lambda)|\ge u$. We now invoke the description of $|B^{-1}(\lambda)|$ mentioned in the previous paragraph. If any of the parts of $\lambda$ are at most $\ell-2$, then we can delete these parts to obtain a new partition $\widetilde\lambda$ of an integer $\widetilde N$ with $|B^{-1}(\widetilde\lambda)|\ge u$ and $\widetilde N<N$. This contradicts the minimality of $N$, so we must have $\lambda_\ell\ge\ell-1$. We may now assume that $\ell=u$ and that all of the parts of $\lambda$ are distinct. Indeed, if this were not the case, then we could again delete a part of $\lambda$ to obtain a new partition of an integer smaller than $N$ with at least $u$ preimages under $B$. It follows that $\lambda_i\ge\lambda_\ell+\ell-i\ge2\ell-i-1$ for all $i$.
Consequently, \[N=\sum_{i=1}^{u}\lambda_i\ge\sum_{i=1}^{u}(2u-i-1)=\frac{3u(u-1)}{2}.\] This shows that for every positive integer $n$, $\max_{\lambda\in\text{Part}(n)}|B^{-1}(\lambda)|\le w_n$, where $w_n$ is the largest integer such that $\frac{3w_n(w_n-1)}{2}\le n$. It is straightforward to verify that $w_n=\left\lfloor\frac{3+\sqrt{9+24n}}{6}\right\rfloor$, so the upper bound on $\deg_n(B)$ follows from Lemma 1.2.
At the moment, we do not see a way to improve the estimates for $\deg_n(B)$ in the previous theorem. However, we computed $|B^{-1}(B(\lambda))|$ for 100 random partitions of 1000 and also for 100 random partitions of 100000. In the first case, the data had a mean of 2.95 and a standard deviation of 0.22. In the second case, the mean and standard deviation were 2.85 and 0.17. This data hints that the asymptotics of these degrees might be remarkably simple.
We now turn our attention to Carolina solitaire. Let $(\eta_n)_{n\ge0}$ be the sequence of integers A217661 in [18], which is defined by a generating function equation.
Theorem 2.10. Preserving the above notation, we have deg n (C ) = η n 2 n−1 for all n ≥ 1. Thus, Thus, It is known (see the comments in the OEIS entry A217661 [18]) that so this proves the first statement of the theorem. The asymptotic formula for deg n (C ) follows from the asymptotic formula for η n , which appears in the OEIS entry A217661.

Iterates of a Function
Now that we have examined the degrees of several specific discrete dynamical systems, we will shift our focus to a problem with a more extremal flavor. Let us fix an integer k ≥ 2. In this section, we compare the degree of an arbitrary function f : X → X with the degree of its iterate f k : X → X. Here, X is a finite set of size n. We first observe that for every real γ ≥ 0. Indeed, the upper bound follows from the fact that deg(f k ) ≤ n and deg(f ) ≥ 1. To prove the lower bound, notice that if x, x ∈ X satisfy f (x) = f (x ), then f k (x) = f k (x ). This proves that deg(f k ) ≥ deg(f ), so Furthermore, the lower bound is attained by a constant function. The main result we will prove below is that if γ ≥ 2 − 1/2 k−1 is fixed, then the upper bound can be replaced by O(n 1−1/2 k−1 ). In addition, we will construct an example showing that this upper bound is tight. Hence, for every integer k ≥ 2 and real number γ ≥ 2 − 1/2 k−1 , we have determined how large the ratio deg(f k ) deg(f ) γ can be, up to a constant factor. To ease notation in what follows, let Theorem 3.1. Let k ≥ 2 be an integer, and let γ ≥ 0 be a real number. As n → ∞, we have Remark 3.2. One special case of Theorem 3.1 tells us how badly the degree of noninvertibility can fail to satisfy the submultiplicativity inequality deg(f • g) ≤ deg(f ) deg(g) in the specific case in which f = g. Indeed, if we set k = γ = 2, then the theorem tells us that The first part of Theorem 3.1 will follows from the next proposition. Proof. Let T (m 0 ) be the rooted tree that is isomorphic to a path with m 0 edges (so each nonleaf vertex has exactly one child). Let T (m 0 , m 1 ) be the rooted tree whose root has m 0 subtrees, each of which is isomorphic to T (m 1 ). In general, let T (m 0 , . . . , m p ) be the rooted tree in which the root has m 0 subtrees, each of which is isomorphic to T (m 1 , . . . , m p ). Thus, T (m 0 , . . . , m p ) has 1 + m 0 + m 0 m 1 + · · · + m 0 m 1 · · · m p vertices. 
Consider a sufficiently large integer b, and let by sending the root of T b to itself and sending every non-root vertex in T b to its parent (see Figure 1). Let and note that n b = b 2 (1 + o(1)) (here and in what follows, the asymptotic notation is taken as b → ∞). Our first goal will be to estimate the degrees of F b and F k b . We will then prove that for every sufficiently large n, we can find an n-element set X n , a function f n : X n → X n , and a suitable function F b such that deg(F b ) approximates deg(f n ) and deg(F k b ) approximates deg(f k n ).
We define the depth of a vertex v in T b to be the smallest nonnegative integer t such that F t b (v) is the root of T b . The root of T b is the only vertex of depth 0; it has b 0 preimages under F b and 1 + k−1  Combining this information, we find that (1)).
Now let n be a sufficiently large integer, and let b be the unique integer such that n b ≤ n < n b+1 . Let X n = X (b) ∪ Y n , where Y n is a set of size n − n b that is disjoint from X (b) . Define f n : X n → X n by f n (x) = F b (x) for x ∈ X (b) and f n (y) = y for y ∈ Y n . It is straightforward to check that and n b+1 = b 2 (1 + o(1)), we have n b n = 1 + o(1) and n b+1 n = 1 + o(1). Consequently, A similar argument shows that deg(f k n ) = n 1−1/2 k−1 (1 + o (1)).
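The trees $T(m_0,\ldots,m_p)$ and the associated "send each vertex to its parent" maps are easy to build explicitly, which allows one to compute the degrees of such a map and its iterates for small parameters. The sketch below is illustrative only: the encoding of a tree as a parent list (with the root as its own parent) and the names `build_tree`, `iterate`, and `degree` are assumptions of ours, and the parameter choice is arbitrary rather than the one used in the proof.

```python
from collections import Counter
from fractions import Fraction

def build_tree(ms):
    """Parent list of T(m_0, ..., m_p); vertex 0 is the root and is its own parent."""
    parent = [0]

    def attach(root, ms):
        if len(ms) == 1:
            v = root
            for _ in range(ms[0]):          # a path with ms[0] edges below root
                parent.append(v)
                v = len(parent) - 1
        else:
            for _ in range(ms[0]):          # ms[0] subtrees, each T(ms[1], ...)
                parent.append(root)
                attach(len(parent) - 1, ms[1:])

    attach(0, list(ms))
    return parent

def iterate(f, k):
    """Return the k-fold iterate of f, where f is a list encoding a self-map of range(len(f))."""
    g = list(range(len(f)))
    for _ in range(k):
        g = [f[x] for x in g]
    return g

def degree(f):
    """Degree of noninvertibility of a self-map given as a list."""
    fibers = Counter(f)
    return Fraction(sum(c * c for c in fibers.values()), len(f))

F = build_tree((3, 2))
print(len(F))                               # 10 = 1 + 3 + 3*2
for k in range(1, 4):
    print(k, degree(iterate(F, k)))
```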
The proof of the second statement in Theorem 3.1 requires the following lemma.
Lemma 3.4. If X is a finite set, f : X → X is a function, and x ∈ X, then Proof. For convenience, let us write r = |f −k (x)| and α = 2 − 1/2 −1 for ≥ 1. By the Cauchy-Schwarz inequality, we have Using calculus, we find that the minimum of u as u 0 , . . . , u k range over positive real numbers with u 0 = 1 and u k = r occurs when The unique solution to this system of equations is given by as one may check by direct substitution. With these values of u 0 , . . . , u k , one can verify that u 2 +1 u = 2 +2 (r/2 k ) 2/α k for all ∈ {0, . . . , k − 1}. Therefore, Proof of Theorem 3.1. As mentioned above, the first statement of the theorem follows from Proposition 3.3. To prove the second statement, suppose X is an n-element set and f : X → X is a function. For every integer ≥ 0, we have where the last inequality follows from Lemma 3.4. Let r It follows from (4) that If γ ≥ 2 − 1/2 k−1 , then we deduce that

Future Directions
In this section, we list some possible directions for extending the investigation of degrees of noninvertibility. Of course, a natural place to start would be to consider other specific families of combinatorially interesting discrete dynamical systems. Even restricting our attention to the specific families considered in Section 2, there are several problems that remain open.
To state our first problem, we require the maps $t_i:S_n\to S_n$ from Section 2.1. The monoid $H_0(S_n)=\langle t_1,\ldots,t_{n-1}\rangle$ generated by the operators $t_1,\ldots,t_{n-1}$ is known as the 0-Hecke monoid of $S_n$ [12]. Each element $T\in H_0(S_n)$ is a function from $S_n$ to $S_n$ given by $T=t_{i_r}\circ\cdots\circ t_{i_1}$ for some $i_1,\ldots,i_r\in[n-1]$ (we allow the list $i_1,\ldots,i_r$ to contain repeats). We say $T$ is eventually constant if there exists a positive integer $k$ such that $T^k$ is the constant function that sends every permutation in $S_n$ to the identity permutation $123\cdots n$. One can show that $T$ is eventually constant if and only if every element of $[n-1]$ appears at least once in the list $i_1,\ldots,i_r$. Notice that the bubble sort map $B:S_n\to S_n$ is an eventually constant element of $H_0(S_n)$. It would be interesting to study the degrees of other eventually constant elements of the 0-Hecke monoid of $S_n$.
For example, if r odd n (respectively, r even n ) denotes the largest odd (respectively, even) element of [n − 1], then we define T odd = t r odd n • · · · • t 5 • t 3 • t 1 and T even = t r even n • · · · • t 6 • t 4 • t 2 . We then let T alt = T even • T odd and T tla = T odd • T even .
Note that the map $T_{\text{alt}}$ is known as the odd-even sort [16,17]. The following conjecture states that among all eventually constant elements of $H_0(S_n)$, bubble sort is the closest to being invertible, while $T_{\text{tla}}$ is the farthest. As above, we let $\deg_n(B)$, $\deg_n(T_{\text{alt}})$, and $\deg_n(T_{\text{tla}})$ denote the degrees of $B:S_n\to S_n$, $T_{\text{alt}}:S_n\to S_n$, and $T_{\text{tla}}:S_n\to S_n$, respectively. We know by Theorem 2.2 that $\deg_n(B)=\frac{(n+1)(n+2)}{6}$. However, we do not know $\deg_n(T_{\text{tla}})$. What is fascinating is that while $\deg_n(B)$ grows quadratically in $n$, $\deg_n(T_{\text{alt}})$ and $\deg_n(T_{\text{tla}})$ grow exponentially. Indeed, one can show that $|T_{\text{alt}}(S_n)|$ and $|T_{\text{tla}}(S_n)|$ are both equal to the number of up-down permutations of length $n$. It then follows from Lemma 1.2 and the known asymptotic formula for the number of up-down permutations of length $n$ (see sequence A000111 in [18]) that $\deg_n(T_{\text{alt}})$ and $\deg_n(T_{\text{tla}})$ are both at least $\frac{n!}{|T_{\text{tla}}(S_n)|}\sim\frac{\pi}{4}\left(\frac{\pi}{2}\right)^n$.
Problem 4.2. Find improved asymptotic estimates (or even exact formulas!) for $\deg_n(T_{\text{alt}})$ and $\deg_n(T_{\text{tla}})$.
Let us remark that deg n (T alt ) = deg n (T tla ) when n is odd (this equality can fail when n is even). In fact, for n odd, T alt and T tla are dynamically equivalent. To see this, define the reverse complement of a permutation π = π 1 · · · π n to be the permutation rc(π) = (n + 1 − π n )(n + 1 − π n−1 ) · · · (n + 1 − π 1 ). One can show that T alt (rc(π)) = rc(T tla (π)) whenever π ∈ S n and n is odd.
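The conjugacy between $T_{\text{alt}}$ and $T_{\text{tla}}$ for odd $n$ can be checked by machine on small cases, along with the sizes of the images. The sketch below uses names of our own (`T_odd`, `T_even`, `T_alt`, `T_tla`, `rc`) and represents permutations as tuples; it is an illustration rather than a proof.

```python
from itertools import permutations

def t(i, p):
    """t_i: swap entries i and i+1 (1-indexed) if i is a descent."""
    p = list(p)
    if p[i - 1] > p[i]:
        p[i - 1], p[i] = p[i], p[i - 1]
    return tuple(p)

def T_odd(p):
    for i in range(1, len(p), 2):       # i = 1, 3, 5, ... within [n-1]
        p = t(i, p)
    return tuple(p)

def T_even(p):
    for i in range(2, len(p), 2):       # i = 2, 4, 6, ... within [n-1]
        p = t(i, p)
    return tuple(p)

def T_alt(p):
    return T_even(T_odd(p))             # T_alt = T_even o T_odd

def T_tla(p):
    return T_odd(T_even(p))             # T_tla = T_odd o T_even

def rc(p):
    """Reverse complement of p."""
    n = len(p)
    return tuple(n + 1 - x for x in reversed(p))

for n in (3, 5, 7):
    S_n = list(permutations(range(1, n + 1)))
    print(n, all(T_alt(rc(p)) == rc(T_tla(p)) for p in S_n))

S_4 = list(permutations(range(1, 5)))
print(len({T_alt(p) for p in S_4}), len({T_tla(p) for p in S_4}))
```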
We now turn our attention to the stack-sorting map $s$. Using the "decomposition lemma" described in [6], we have computed $|s^{-1}(s(\pi))|^{1/n}$ for several random permutations in $S_n$ in order to gauge the size of $\lim_{n\to\infty}\deg_n(s)^{1/n}$. We first computed this quantity for 1000 random permutations with $n=100$; the mean and standard deviation were 1.69 and 0.09. We then tried 100 random permutations with $n=300$; the mean and standard deviation were 1.70 and 0.06. This suggests that $\lim_{n\to\infty}\deg_n(s)^{1/n}$ should lie in the interval $(1.6,1.8)$.
Let us also recall the following conjecture from Section 2.6. It would be nice to improve the constants appearing in this theorem. Of particular interest is when γ = 2 − 1/2 k−1 since this is the minimum γ for which the second statement of the theorem applies. Thus, we have the following more specific problem. Even answering Question 4.4 for k = 2 would be quite interesting. So far, we know from Theorem 3.1 that, if this limit exists, its value is between 3 −3/2 ≈ 0.19245 and 2 5/2 3 −3/2 ≈ 1.08866.
One final question, motivated by the lack of submultiplicativity mentioned in Remark 3.2, is as follows.

Acknowledgements
The first author was supported by a Fannie and John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. The second author was supported by a Simons Foundation Collaboration Grant. Both authors thank Vic Reiner for suggesting that they examine the Bulgarian solitaire map.