Asymptotics of 3-stack-sortable permutations

We derive a simple functional equation with two catalytic variables characterising the generating function of 3-stack-sortable permutations. Using this functional equation, we extend the 174-term series to 1000 terms. From this series, we conjecture that the generating function behaves as $$W(t) \sim C_0(1-\mu_3 t)^\alpha \cdot \log^\beta(1-\mu_3 t), $$ so that $$[t^n]W(t)=w_n \sim \frac{c_0\mu_3^n}{ n^{(\alpha+1)}\cdot \log^\lambda{n}} ,$$ where $\mu_3 = 9.69963634535(30),$ $\alpha = 2.0 \pm 0.25.$ If $\alpha = 2$ exactly, then $\lambda = -\beta+1$, and we estimate $\beta \approx -3.$ If $\alpha$ is not an integer, then $\lambda=-\beta$, but we cannot give a useful estimate of $\beta$. The growth constant estimate (just) contradicts a conjecture of the first author that $$9.702<\mu_3 \le 9.704.$$ We also prove a new rigorous lower bound of $\mu_3\geq 9.4854$, allowing us to disprove a conjecture of B\'ona. We then further extend the series using differential-approximants to obtain approximate coefficients $O(t^{2000}),$ expected to be accurate to $20$ significant digits, and use the approximate coefficients to provide additional evidence supporting the results obtained from the exact coefficients.


Introduction
A permutation is said to be stack-sortable if, when it is passed through a stack, the result is the increasing permutation. We describe the stack-sorting map below. Knuth [20] showed that a permutation is stack-sortable if and only if it avoids the pattern 231; such permutations are counted by the Catalan numbers C n = 2n n /(n + 1) ∼ 4 n / √ n 3 π.
A permutation is called k-stack-sortable if iteratively applying the stack-sorting map to it k times results in the increasing permutation. Let W k (n) be the set of k-stack-sortable permutations in S n . West [24] conjectured, and Zeilberger [25] subsequently proved, that |W 2 (n)| = 2 (n + 1)(2n + 1) There is no known formula for |W k (n)| when k ≥ 3 is fixed. We are primarily interested in the numbers |W 3 (n)|, which we will denote by w n .
Recently, the first author [7] developed a polynomial-time algorithm to generate the numbers w n , thus providing a dramatic extension of the pre-existing 13-term sequence; the sequence is given in the OEIS as A134664 [22]. We start by reinterpreting these arguments to derive a functional equation that characterises this generating function. Using this functional equation, we then write an efficient algorithm with which we compute the numbers w n for n ≤ 1000. This type of algorithm could apply to a wide variety of combinatorial functional equations, yet to our knowledge it is new to enumerative combinatorics.
In the next section, we describe the derivation of the functional equation. We then describe the algorithm in Section 3. The remaining sections are devoted to a careful analysis of the 1000-term series that we use to conjecture the asymptotic behaviour. We first use standard methods of series analysis, looking at the behaviour of the ratios of successive coefficients and then analysing the series by the method of differential-approximants [17]. We then use the differential-approximants to (approximately) extend the series from 1000 terms to 2000 terms, using a method of series extension developed by the third author [18], where the newly-obtained coefficients are expected to be accurate to at least 20 significant digits. This is more than sufficient for ratio-based analysis methods, so we then re-analyse the extended series.

Background and notation
Throughout this article, a permutation is an ordering of a finite set of positive integers. We let S n denote the set of permutations of the set [n] := {1, . . . , n}. Let id n = 123 · · · n be the identity permutation in S n . The standardisation of a permutation π = π 1 · · · π n is the permutation in S n obtained by replacing the ith-smallest entry in π by i for all i. For example, the standardisation of 5971 is 2431. We say two permutations have the same relative order if their standardisations are equal. We say a permutation π contains a permutation τ as a pattern if there is a (not necessarily consecutive) subsequence of π that has the same relative order as τ ; otherwise, we say π avoids τ . A descent of π is an index i ∈ [n − 1] such that π i > π i+1 . A peak of π is an index i ∈ {2, . . . , n − 1} such that π i−1 < π i > π i+1 .
The stack-sorting map is a function s that sends permutations to permutations; there is a simple recursive definition of this map. First, s sends the empty permutation to itself. Now suppose π is a nonempty permutation with largest entry m, and write π = LmR. Then s(π) = s(L)s(R)m. A simple consequence of this definition, which we will use tacitly in what follows, is that |s −1 (π)| = |s −1 (π )| whenever π and π have the same relative order.
We say a permutation π is k-stack-sortable if s k (π) is increasing, where s k denotes the kfold iterate of s. Let W k (n) denote the set of k-stack-sortable permutations in S n . Let W k = n≥0 W k (n).
The following theorem is essentially due to Knuth; it is the theorem that initiated the study of permutation patterns. The preceding theorem tells us that if X is any finite set of n positive integers and π is the permutation obtained by listing the elements of X in increasing order, then |s −1 (π)| = C n .

Theorem 1.2 ([24]).
A permutation is 2-stack-sortable if and only if it avoids the pattern 2341 and also avoids any 3241 pattern that is not part of a 35241 pattern.
As an example of West's characterisation, consider the permutation π = 416352. Note that π avoids 2341. On the other hand, π does not avoid 3241 because it contains the subsequence 4352. However, this subsequence is part of the occurrence 46352 of the pattern 35241, so it does not prevent 416352 from being 2-stack-sortable. There are no other occurrences of 3241 in π, so π is 2-stack-sortable.
In [23],Úlfarsson gave an analogous characterisation of 3-stack-sortable permutations. However, his characterisation is too unwieldy to be useful for enumerative purposes. In order to obtain a functional equation for the generating function of 3-stack-sortable permutations, we will instead make use of the Decomposition Lemma from [7].

The Functional Equation
In this section we derive a functional equation characterising the generating function of 3stack-sortable permutations, as summarised by the following theorem. Recall that we let w n = |W 3 (n)| denote the number of 3-stack-sortable permutations in S n .

The generating function
w n t n counting 3-stack-sortable permutations is given by W (t) = Q(t, 0, 0).
We have been unable to solve this functional equation exactly. However, we will use it in subsequent sections to compute 1000 terms of the series W (t), which we will then analyse empirically.
In order to derive the functional equation, we will represent a permutation π = π 1 · · · π n via its plot, which is the diagram showing the points (i, π i ) for all i ∈ [n]. A hook of π is a sideways L shape that connects two points (i, π i ) and (j, π j ) in the plot of π such that i < j and π i < π j . The point (i, π i ) is the southwest endpoint of the hook, and (j, π j ) is the northeast endpoint of the hook. Let SW i (π) be the set of hooks of π with southwest endpoint (i, π i ). For example, if π = 426315789, then the hook shown in Figure 1 is in SW 3 (π) because its southwest endpoint is (3,6); its northeast endpoint is (8,8). The tail length of a permutation π = π 1 · · · π n ∈ S n , denoted tl(π), is the largest integer ∈ {0, . . . , n}'such that π i = i for all i ∈ {n− +1, . . . , n}. For example, tl(23145) = 2, tl(23154) = 0, and tl(12345) = 5. The tail of π is the sequence of points (n − tl(π) + 1, n − tl(π) + 1), . . . , (n, n) in the plot of π. For example, the tail of 426315789 is (7, 7), (8,8), (9,9). We say a descent d of π is tail-bound if every hook in SW d (π) has its northeast endpoint in the tail of π. The descents of 426315789 are 1, 3, and 4, but the only tail-bound descent is 3. For π ∈ S n \ {id n }, let c(π) denote the index such that π c(π) = n − tl(π); note that c(π) is a tail-bound descent of π.
We now describe a way to decompose permutations, which will lead to a functional equation for the generating function of 3-stack-sortable permutations. Let H be a hook of π with southwest endpoint (i, π i ) and northeast endpoint (j, π j ). Let π H U = π 1 · · · π i π j+1 · · · π n and π H S = π i+1 · · · π j−1 . The permutations π H U and π H S are called the H-unsheltered subpermutation of π and the H-sheltered subpermutation of π respectively. For instance, if π = 426315789 and H is the hook shown in Figure 1, then π H U = 4269 and π H S = 3157. We will deal exclusively with situations in which i is a tail-bound descent of π, in which case the plot of π H S lies entirely beneath the hook H (it is "sheltered" by H). Lemma 2.2 (Decomposition Lemma [7]). If d is a tail-bound descent of a permutation π, then Theorem 1.2 gives us structural information about 2-stack-sortable permutations. Since W 3 = s −1 (W 2 ), we can combine this structural information with the Decomposition Lemma in order to gain enumerative information about 3-stack-sortable permutations. In doing so, we must keep track of the tail length statistic, as well as one additional statistic that we now describe.
For a ∈ {0, . . . , n}, we say the open interval (a, a + 1) is a legal space for π if there do not exist indices i 1 < i 2 < i 3 such that π i3 ≤ a < π i1 < π i2 . Let leg(π) be the number of legal spaces of π. For example, if π ∈ S n , then leg(π) = n + 1 if and only if π avoids 231. The legal spaces of 145326 are (0, 1), (1,2), (4,5), (5,6), (6,7), so leg(145326) = 5. Imagine taking the plot of a permutation π and adding a new point to the left of all other points. One can think of the legal spaces of π as the vertical positions where the new point can be inserted so as to not form a new 2341 pattern. This is relevant for us because, as we know from Theorem 1.2, every 2-stack-sortable permutation avoids 2341.
We will prove a new functional equation for the generating function allowing us to characterise the series J(t, 1, 1) = n≥1 w n t n (where w n = |W 3 (n)|). Transforming this functional equation to that of a related series will yield Theorem 2.1.
One contribution to the generating function J(t, u, v) comes from the preimages of the identity permutations: where is the generating function of the Catalan numbers. Indeed, this follows immediately from Theorem 1.1. Now suppose we want to construct a nonempty permutation σ ∈ W 3 such that s(σ) = π is not an identity permutation. The Decomposition Lemma tells us that the number of ways to do this is equal to the number of ways to choose π ∈ W 2 , H ∈ SW c(π) (π), µ ∈ s −1 (π H U ), and λ ∈ s −1 (π H S ). In order to choose π, we first choose the standardisations ζ U and ζ S of π H U and π H S respectively. We then combine them along with the additional hook H to form π. When combining them, we need to choose how many points from each of π H U and π H S end up in the tail of π. We also need to choose the relative heights of the points of π H U compared to the points of π H S . Because the resulting permutation π is supposed to be 2-stack-sortable, one can use Theorem 1.2 to check that ζ U and ζ S must be in W 2 . Furthermore, we cannot have more than one point in π H U outside of the tail of π that lies above a point in π H S . In other words, at most one point to the left of (c(π), π c(π) ) can lie above a point to the right of (c(π), π c(π) ). If such a point exists, it must be placed in a position corresponding to one of the legal spaces of π H S . Using Theorem 1.2, one can verify that the permutation π produced in this fashion is necessarily 2-stack-sortable. Example 2.3. Suppose we choose the standardisations of π H U and π H S to be ζ U = 24315678 and ζ S = 315246 respectively. Furthermore, suppose we choose to put 3 of the points of π H U and 1 of the points of π H S into the tail of π. These choices almost determine the 2-stack-sortable permutation π. Indeed, π must be of the shape shown in Figure 2, where the open dashed circles represent the possible places where we can put the largest point to the left of (c(π), π c(π) ). These open dashed circles correspond to the legal spaces of π H S lying below the point (c(π), π c(π) ). Theorem 2.4. We have Proof. As mentioned above, the contribution to J(t, u, v) coming from preimages of identity permutations is C(tuv) − 1. Let us now go through the procedure described above in order to find a preimage of a nonidentity permutation in W 2 . We first consider the case in which there are no points to the left of (c(π), π c(π) ) (i.e., c(π) = 1). Equivalently, the standardisation ζ U of π H U is an identity permutation of some length m ≥ 1, and m − 1 of the points of π H U are in the tail of π. We can choose any permutation in W 2 to be the standardisation ζ S of π H S . If ζ S is not an identity permutation, then the number of points of π H S that we place in the tail of π can be any element of {0, . . . , tl(ζ S )}. If ζ S is an identity permutation of some length m , then the number of points of π H S that we place in the tail of π can be any element of {0, . . . , m − 1} (choosing all m points to go in the tail of π would result in π being an identity permutation).
There are then C m ways to choose µ ∈ s −1 (π H U ) and |s −1 (ζ S )| ways to choose λ ∈ s −1 (π H S ). The total contribution to J(t, u, v) coming from this case is where the initial factor tuv accounts for the northeast endpoint of H.
We now consider the case in which there is at least one point in the plot of π to the left of (c(π), π c(π) ) (i.e., c(π) ≥ 2). We choose ζ S along with the number of points of π H S that go into the tail of π. There are then leg(π H S ) − choices for the position of the highest point to the left of (c(π), π c(π) ). We also need to choose the standardisation ζ U of π H U and the number j of points of π H U appearing in the tail of π. Finally, there are |s −1 (ζ U )| ways to choose µ ∈ s −1 (π H U ) and |s −1 (ζ S )| ways to choose λ ∈ s −1 (π H S ). The total contribution to the generating function from this case is The term − C(tuv) − 1 v appears so that we do not count the contribution coming from (2) again.
The term −(C(tuv) − 1) occurs because if ζ S = id m , then we cannot choose all of the points of π H S to appear in the tail of π. The last displayed expression simplifies to Finally, J(t, u, v) is the sum of the expressions in (1), (2), and (4). This sum simplifies to the expression stated in the theorem.
Remark 2.5. Theorem 2.4 provides a functional equation corresponding to a recurrence relation for the numbers w n = |W 3 (n)| that was obtained in [7]. In fact, that article gave a more general recurrence that counts 3-stack-sortable permutations according to their descent and peak statistics. These recurrences were vastly generalised in [9, Theorem 9.8], where they were phrased in terms of postorder readings and special sets of colored binary plane trees called troupes. We wish to remark that the proof of Theorem 2.4 generalises immediately to give functional equations corresponding to these more general recurrences. For the sake of brevity, we omit the details of this discussion. However, there are a couple of special cases that are worth mentioning.
A permutation of length n is called alternating if its descents are precisely the even elements of [n − 1]. Let ALT denote the set of alternating permutations of odd length. If we replace each occurrence of the generating function C(tuv) then the resulting generating function J(t, u, v) is such that Let EDP denote the set of permutations π in which every descent is a peak. If we replace each occurrence of the generating function C(tuv) then the resulting generating function J(t, u, v) is such that In the remainder of this section, we transform the series J(t, u, v) into a new series Q(t, x, a) that satisfies the simpler functional equation of Theorem 2.1.
By the definition of J, we clearly have , and for each monomial u k v t n appearing, we have ≤ n, k. As a consequence, the series Observe that the series This last transformation is convenient as the terms C(w) that appear in the functional equation +t(1 + x)aQ(x, a), and the generating function for 3-stack-sortable permutations is To see that (5) characterises the generating function Q(t, x, a), we write ]. The equation (5) immediately implies that Q 0 (x, a) = 0, from which it follows that Q 1 (x, a) = (x + 1) 2 (a + 1) 2 . We can then rewrite (5) as the following recurrence for n ≥ 2: Clearly this uniquely defines the series Q n (x, a), completing the proof of Theorem 2.1. Moreover, a simple induction shows that each series Q n (x, a) is in fact a polynomial of degree n + 1 in each variable.
Remark 2.6. We note the functional equation (5) appears to be just outside the boundary of functional equations that can be systematically solved. In particular, this equation has two catalytic variables a and x and takes the form of a polynomial equation in t, a, x, Q(x, a), Q(x, 0) and Q(0, a). If either Q(x, 0) or Q(0, a) did not appear there would only be one catalytic variable, in which case the series would be algebraic [5]. Moreover, increasingly systematic methods are being developed for functional equations with two catalytic variables where the polynomial is linear in Q(x, a) (see for example [1, 2, 6, 10]), whereas (5) is quadratic in Q(x, a).

Computing coefficients of the generating function
We now describe numerous algorithms to compute the numbers w n = Q n (0, 0) for n ≤ N , where N is some positive integer. Each algorithm is based on the recurrence (6).
The most direct technique would be to compute each polynomial Q n (x, a) directly from (6). However, computing a product of Q j (x, a) − Q j (x, 0) and Q n−j−1 (x, a) − Q n−j−1 (0, a) would take Θ(n 4 ) individual integer multiplications. Since there are Θ(n 2 ) such products to compute in total and the size of the integers is Θ(n), the algorithm would then have complexity Θ(N 8 ) (using naive multiplication). We note that this could be reduced to around Θ(N 5 log(N ) 3 ) using more sophisticated polynomial and integer multiplication algorithms (see [19]) and using a modular algorithm (see [16, chapter 5]), which we will describe shortly.
In order to avoid the bulk of these multiplications, we compute the values Q n (x, a) for fixed values of x and a, rather than computing the individual coefficients. The nature of the equation means that Q n (x, a) cannot be determined directly from previous terms for a = 0 or x = 0, but it can be determined for any a, x = 0. Then, knowing that Q n (x, a) is a polynomial of degree at most n + 1 in a allows Q n (x, 0) to be determined from Q n (x, 1), Q n (x, 2), . . . , Q n (x, n + 2). To be precise, these are related by the equation Similarly, we can determine Q n (0, a) from Q n (1, a), Q n (2, a), . . . , Q n (n + 2, a) using the equation Thus, the algorithm runs as follows: Algorithm 1: Computing w n for n ≤ N . input : an i n t e g e r N output : a l i s t output c o n t a i n i n g t h e numbers w n for n ≤ N for a from 0 to N+2 for x from 0 to N+2 for a from 1 to N+2 compute Q n (x, a) u s i n g (6) . for x from 1 to N+2 compute Q n (x, 0) u s i n g (7) . for a from 0 to N+2 compute Q n (0, a) u s i n g (8) . Append Q n (0, 0) to output .
As equation (6) contains a sum in which j runs from 1 to n − 2, this equation takes linear time to compute, so the number of operations in Algorithm 1 is of order Θ(n 4 ). However, a nontrivial proportion of these operations are multiplication operations, which need to be taken into account in order to determine the time complexity of this algorithm. An efficient way to apply these multiplications is to convert this to a modular algorithm, which we describe in the next subsection.

Modular algorithms
In order to avoid multiplying very large numbers, which can be computationally unwieldy, we implement a modular algorithm. That is, we choose a sequence of primes p 1 , p 2 , . . . , p k whose product we know to exceed w n for n ≤ N . Then, for each j = 1, . . . , k, we run the entirety of Algorithm 1 with all operations being modulo p j . For each prime, we then return the output values w n modulo that prime. Finally, the actual values w n can be recovered using the Chinese Remainder Theorem.
In theory, in the limit N → ∞, the optimal time complexity (up to some constant) is achieved by setting p j to be the jth-smallest prime. This way, the number of primes required will be Θ(N/ log(N )), while the size of the primes is typically Θ(log(N )), meaning that the multiplication operations typically take log(N ) 2 time (using naive multiplication). Since there are Θ(n 4 ) such operations for each prime, the theoretical time complexity is Θ(N 5 log(N )). Using a more efficient multiplication algorithm can reduce this theoretical time complexity to Θ(N 5 log(log(N ))).

Implementation of algorithm and results
In this section we describe how we implemented this algorithm to produce 1000 terms of the series W (t). The program is given on the second author's website, as are the terms produced [12].
As discussed in the previous section, the purpose of using a modular algorithm is to avoid timeintensive multiplications involving very large numbers. In practice, multiplications of (unsigned) 64-bit integers are carried out at the hardware level, so it is practical to insist that the primes be small enough that all multiplications can be applied in this way. That is, each prime p j should be less than 2 32 . While adhering to this restriction, we want to use the minimum possible number of primes, as the computation must be carried out for each one. This is achieved by setting each prime p j to be the jth-largest prime below 2 32 .
Recall that we require the product P := p 1 p 2 · · · p k to exceed w N . Since we do not have good upper bounds on the numbers w n for large n, it is inefficient to choose the number of primes k such that we can prove P > w N in advance. Instead we choose k for which we simply believe that P > N w N as then we can apply the following proposition after the algorithm has run to prove that its output really is the desired sequence: Proposition 3.1. If the output sequence w 1 , w 2 , . . . , w N of the algorithm satisfies N w n < P for each n, then w n = w n for each n.
Proof. By definition, we know that w n is the remainder when w n is divided by P , so it suffices to show that each w n < P .
We start with the fact that w n ≤ nw n−1 . This follows from the fact that removing the entry 1 from a 3-stack-sortable permutation in S n and then reducing each remaining entry by 1 yields a 3-stack-sortable permutation in S n−1 . It then follows from a simple induction on n that w n < P and, therefore, w n = w n .
We ran this algorithm with size N = 1000 and with k = 105 primes. The computation took 46 hours of computing, using 7.8GB of memory on a Dell Power Edge R820, 2.70GHz with 8 cores. The result of this computation was the sequence of numbers w n for 1 ≤ n ≤ 1000, where w n is the remainder when w n is divided by the product P := p 1 p 2 · · · p 105 ≈ 2.9 · 10 1011 .

Initial ratio analysis
In this and subsequent sections, we apply a number of numerical techniques to analyse the series so as to estimate the asymptotic behaviour of the coefficients. This, of course, is dominated by the growth constant µ 3 , but we are also interested in the sub-dominant terms, which are more difficult to estimate.
Let us first look at the ratios. If we have a simple power-law singularity, so that the generating function behaves as then with [x n ]F (x) = c n , it follows that the ratio of successive coefficients with g = θ − 1. A plot of the ratios of the 1000-term series against 1/n is shown in Figure 3. The plot appears to be linear and is going to a limit indistinguishable from µ 3 = 9.70 at this level of precision. The gradient gives an estimate of the exponent g, but not a very precise one.
Rather, if we know the value of µ 3 , it follows from (10) that we can estimate the exponent g from estimators Taking µ 3 = 9.70, we find for the estimators g n the values shown in Figure 4. These are plotted against 1/n, so we are interested in the point of intersection of the curve with the y-axis. It can be seen that the gradient of this plot changes sign at around n = 120, and there is a lot of curvature, so it is hard to estimate the limit as n → ∞ with any precision. Nevertheless, one might guess a value around −3.5 or greater, which would imply that θ ≈ −2.5 in (9). But all one can really say with any confidence is that g > −3.6. Note too that if we only had some 100 terms at our disposal we might erroneously conjecture that g ≈ −4.
The curvature in Figure 4 would suggest that the o(1) correction term does not decay as 1/n, which would be the case for a simple power-law singularity, but rather decays more slowly. In the next section we give our reasons for believing that there are logarithmic factors in the generating function; we will give a more refined ratio analysis taking this into account in Section 6.1.

Differential-approximant analysis
Next, we performed a differential-approximant analysis. As described in [17], the method of differential-approximants fits the known coefficients of a power series to a number (typically 100 or more) of holonomic differential equations, and uses the critical parameters (the radius of convergence and exponent at that point) of those differential equations as estimators of the corresponding quantities of the underlying series expansion.
More precisely, one uses the known series coefficients to find polynomials Q k (t) and P (t) such that the power series solution F (t) of the holonomic differential equation agrees with the known coefficients of the function F (t) being approximated. We refer to the order M of this ODE as the order of the approximant.
Constructing such differential-approximants (DAs) is straightforward computationally, and it only involves solving a system of linear equations. Many such DAs are constructed by varying the degrees of the polynomials Q k (t) and P (t) while still using most, or all, of the known series coefficients. The singularities are given by the zeros z i , i = 1, . . . , N M , of Q M (t), where N M is the degree of Q M (t). We take as the dominant singularity that which is both closest to the origin and common to all (or almost all) the DAs. Critical exponents θ i (9) follow from the indicial equation of the DA. For the simplest (and most common) situation where there is a single root of Q M (t) at z i , Slightly more complicated expressions are known for the cases of double, triple, etc. roots. Further details of the method can be found in [17].
We used the 1000-term series and produced differential-approximants of 2nd to 8th-order. To get an idea of the rate of convergence, we analysed the 125-term series, then the 250-term series, then the 500-term series, and finally the full 1000-term series. Results are quoted with a level of precision representing the precision with which the estimates agreed among themselves. This is not to be taken as an estimate that is accurate to the number of quoted digits. Rather, one must consider the trends as the order of the underlying differential equation is increased, and also as the number of coefficients used is increased. Results were consistent across orders, but were slightly more consistent within orders as the order increased. That is to say, the standard deviation of estimates of the radius of convergence decreased slightly as the order of the approximants increased.
We show in Table 1 the estimates of the radius of convergence x c , and in Table 2 the corresponding exponent, which is −θ in the notation of equation (9).
Using the 1000-term series and 4th order to 8th-order approximants, we estimated the radius of convergence as x c = 0.103096648616 (3). The exponent estimates are not stable and are decreasing with the number of terms. As we will see, this is characteristic of a generating function with a confluent logarithmic term, and we suggest that these exponent estimates will continue to decrease with increasing numbers of terms (approximately as 1/ log(n)), eventually reaching a limit around 2.0.
The growth constant is therefore estimated as µ 3 = 1/x c = 9.69963634535(30), which is satisfyingly close to the ratio estimate above of 9.70 but, of course, far more precise. The exponent, taken at face value, is also moderately consistent with the ratio analysis value of less than −θ ≈ 2.5 above.
However, there are other apparent singularities very close to x c . With a 300-term series, these are at x = 0.103096649223 · · · , at x = 0.1030966541±4.77×10 −9 i, and at x = 0.1032969769 · · · . With the 1000-term series, there are many more such close singularities. Having multiple singularities very close together is an indicator of a confluent singularity (i.e., not a pure power-law), and in particular of confluent logarithms.
A discussion of the behaviour of differential-approximants applied to the analysis of such a confluent singularity can be found in [14], where the analysis of the generating function of 4valent planar Eulerian orientations is given. In that case, the generating function had a dominant singularity E(x) = −x(1−µx) log(1−µx) , and the differential-approximants gave very precise estimates of the growth constant µ but suggested that the exponent was around 1.30. Additionally, there were other, less precisely located singularities whose location differed by 1 part in 10 −7 from the true singularity. That is the hallmark of a confluent logarithmic singularity. (Loosely speaking, it seems to reflect the approximant attempting to prescribe a branch cut from x c to ∞.) As another example, we constructed the test function Using 3rd-order differential-approximants and 150 terms of the series, we found the dominant singularity to be at x c ≈ 0.999999528 · · · with an exponent of 2.46217. There are other apparent singularities at x = 1.00000395 · · · and at x = 1.0221 · · · . Note that in both these cases the exponent is wrongly estimated by the differential-approximants, in each case being some 20-25% greater than the true value.
We suggest that the same phenomenon is happening with the generating function for 3-stacksortable permutations and that the correct exponent value is more likely to be around 2.0, with a confluent logarithmic term. Of course, we are not claiming to be able to distinguish between an exponent value of, say, 1.9 with a confluent logarithm, or 2.1 and a confluent logarithm, or power of a logarithm, but in our experience confluent logarithms are usually associated with integer or (rarely) half-integer exponents. We tentatively suggest that the exponent is exactly 2.0, and provide further evidence for this below.
The conclusion from this analysis is the conjecture that the generating function behaves as with α ≈ 2. As we show below, if α = 2 exactly, our best estimate of β is −3. Consequently, where µ 3 = 9.69963634535(30), α ≈ 2, and λ = −β if α is not a positive integer, and λ = −β + 1 if α is a positive integer. In the next section, we provide additional evidence for this conjectured form.

Series extension
The conjectured presence of logarithms greatly complicates the analysis. Whenever logarithms are involved, one really needs very long series to extract believable asymptotics from ratio-based analysis. In an attempt to get further terms, we use the method of series extension to obtain many more approximate coefficients. In this way, we have extended the series by 1000 approximate terms, using differential-approximants to estimate the additional coefficients. The details of this method are given in [18]. These 1000 approximate terms are given on the second author's website [12].
The basic idea is to use families of differential approximants to predict higher order terms. The rationale here is that every differential approximant, being a holonomic differential equation with a power series solution, generates, in principle, all coefficients. Of course, unless one has stumbled upon the exact solution, which sometimes happens, all coefficients beyond the order used to obtain the differential approximant will be approximate.
In extending the series, we construct many (typically 100 or more) differential approximants obtained by varying the degrees of the polynomials multiplying the various derivatives (the Q k (t) and P (t) in equation (12)) and taking the mean of these coefficients (discarding outliers). We then use the standard deviation to estimate the accuracy of the coefficients. As the order of the coefficients increases, the calculated error also increases, and we stop when the error (taken to be 1 standard deviation), exceeds some desired value.
As we propose to use these coefficients in a ratio type analysis, even 5 or 6 digit precision is often sufficient, as the error in the ratio is visually impercebtible. In this case we obtained 1000 additional, approximate terms that we believe to be accurate, at worst, to 20 significant digits. Initial reaction to this method is often one of disbelief. As a proof of concept, we first demonstrate the method on the known series for two stacks in parallel (tsip) [13], known to 501 terms. Mirroring our initial knowledge of the 3-stack-sortable series, which was first extended by the first author to 174 terms [7], we take the first 174 coefficients of the tsip series and predict the next 327 coefficients. We use 6th-order differential approximants. The correct value of the coefficient of t 501 is 1.947305150994482942849937863820882187009 . . .×10 451 . The predicted value is 1.9473051509944829428499378638083404 × 10 451 , and the predicted accuracy is 29 digits, which is seen to be the case. Lower order terms are predicted to greater accuracy. Similarly, originally given the 174 term series for 3-stack-sortable permutations, we generated a further 530 approximate terms. When we extended the series, initially to 300 terms, we confirmed that our estimate of the coefficient of t 300 was correct to 29 significant digits, as expected.

Analysis of extended series
We have repeated the ratio analysis with this extended series, now with 2000 terms, and everything remains consistent with the results of the analysis of the exact series. But we can now extract more precise information and start looking for the effect of confluent logarithmic terms.
A plot of the ratios using the estimated terms is shown in Figure 5. This should be visually indistinguishable from the corresponding plot with the exact (but unknown) coefficients, as the ratios are expected to be accurate to more than 10 significant digits. The plot is still linear and is still going to a limit indistinguishable from µ 3 = 9.70 at this level of precision. However, we can do considerably better than this with the extended series. First, we show in Figure 6 a plot of the linear intercepts of the ratios where r n is just the ratio of the coefficents. Constructing the linear intercepts eliminates the dominant O(1/n) term in the ratio plot (see equation (10)). From this figure, it is clear that one needs some 300 terms before the fact that the plot gradient changes sign becomes apparent. This plot reaches its maximum at n = 202 and appears to be going to a limit below 9.70, and indeed rather to 9.6996, consistent with the (much more precise) differential-approximant analysis. The extra terms of the extended series makes this behaviour manifest. With only 174 terms, one might well conjecture that the limit was slightly greater than 9.70, as the first author did in [7]. We repeated the simple analysis of exponent estimators g n with the extended series. The results are shown in Figure 7, which should be compared to the Figure 4, produced using only the exact series coefficients. From the extended series, one concludes that g > −3.5, but estimating the intercept with the y-axis accurately is not really possible.
Given the evidence from the differential-approximant analysis that confluent logarithms are likely present, we can say more about the asymptotics, and we can also better understand the poor convergence of the g n estimators.
From Flajolet and Sedgewick [15, page 385], we see that if then where When α is a positive integer, the evaluation of the constants must be interpreted as a limiting case as the Γ function diverges, so that certain constants vanish. In particular, provided that α is a positive integer (which we suggested based on the differential-approximant analysis) and β is not a positive integer, one has The ratio of successive coefficients is, in the general case (by which we mean α not a positive integer), but in the case that α is a positive integer and β is not a positive integer, one has One can estimate α from the sequence This is the same exponent estimator plotted in Figure 7, but this refined analysis shows that g n should be plotted not against 1/n but rather against 1/ log n, as we do in Figure 10. Extrapolation to a value around −3.0 seems much more convincing from this plot.
Furthermore, the gradient should give a measure of β or β − 1 if, as it seems, α is a positive integer. The gradient is approximately −4.05, suggesting that β ≈ −3.
We can try and refine the estimate of β by setting α to exactly 2, and so estimates of β − 1 are given by where the numerical estimates of µ and α are used. In this way, we obtain the results shown in Figure 11, which shows the difficulties of estimating the power of confluent logarithmic terms. The plot is clearly displaying a minimum, but it is difficult to extrapolate to n → ∞. Certainly β − 1 = −4 is a possibility, but so is almost any other value between −4 and 0.
Another way of estimating both µ 3 and β is to look more closely at the linear intercepts n defined in (15). We can give more terms in the asymptotic expansion of n by making use of (19). Taking these terms into account, we can rewrite (15) as n = n · r n − (n − 1) · r n−1 ∼ µ 3 1 + assuming α is a positive integer; otherwise, (1 − β) should be replaced by −β. This shows that n should be plotted against 1/(n log 2 n) and not 1/n as was done in Figure 6. We show in Figure 8 the linear intercepts plotted in this way. It can be seen that the extrapolated limit is around 9.6996, as before. However, we can also use this data to estimate β by forming from equation (15) the estimators We show this data in Figure 9, plotted against 1/ log n, as appropriate. It is clear that this is difficult to extrapolate, as it appears to be possibly forming a minimum. One would need data minimally to 1/ log n = 0.1 and ideally to 1/ log n = 0.05 to confidently extrapolate this plot. This corresponds to 22,000 and 485 million terms respectively, which is of course out of the question. This highlights the difficulty of analysing for powers of logarithms. As with our earlier analysis, β − 1 = −4 is a possibility, but so is almost any other value.  Our tentative conclusion from this analysis is that the generating function for 3-stack-sortable permutations behaves as where α ≈ 2, and β ≈ −3 if α = 2. If α = 2, it is too difficult to estimate β, as it is highly sensitive to the estimate of α.  To make these conclusions more convincing, we will re-estimate the asymptotics in two distinct ways: first from the coefficients directly, and second from the ratios as given in (19).
1 If α is a positive integer, Γ(−α) diverges. This is taken care of by replacing the pre-multiplier by some constant C, as in (23).
We will use our estimate of µ 3 and then fit to d n ≡ log f n − n log µ 3 = t 1 + t 2 log n + t 3 log(log n) + t 4 log n + t 5 log 2 n by taking successive quintuples of coefficients {d k−2 , d k−1 , d k , d k+1 , d k+2 }, letting k range over all values up to 2000, reflecting the length of the extended series at our disposal. The principal parameters of interest are t 2 and t 3 , and plots of these quantities estimated in this way are given in Figures 12 and 13 respectively.  From Figure 12, one sees t 2 convincingly approach our previously-estimated value of −3 exactly, but the plot in Figure 13 does not appear to be approaching −4, which is the previously estimated value. Investigating this, we tried fitting to fewer or more terms in the asymptotics. That is to say, here we have fitted to five unknown parameters. We tried fitting to just three parameters, then four, then six. We found that the estimates of t 2 were robust, all coming in around −3, but the estimates of t 3 were not at all robust. We conclude that this is not a good method for estimating the exponent β with the number of terms at our disposal.
A second method is to perform a similar fitting procedure with the sequence of ratios. Let r n = c n /c n−1 be the ratio of successive coefficients in the generating function. Then So as above, we assumed the estimated value of the growth constant µ 3 , and fitted successive quartets of coefficients {d k−2 , d k−1 , d k , d k+1 }, letting k range over all values up to 2000, reflecting the 2000-term (approximate) ratio series at our disposal. The principal parameters of interest are t 1 = −α − 1 and t 2 = β − 1, and plots of these quantities estimated in this way are given in Figures 14 and 15 respectively.
We repeated this analysis, fixing the value of α at 2 exactly, in the hope that it would give a more precise estimate of β. More precisely, in equation (24), we set α = 2. Then fitting as described, we obtain the results shown in Figure 11. This is not inconsistent with our previous best estimate β = −3.
We conclude this section with our best estimate of α being α = 2. However, looking at a wide variety of fits as described above, it would be more prudent to give our estimate as α = 2.0±0.25. If α = 2, we hesitantly offer β = −3 as our best estimate. If α = 2, we give no estimate of β.

Bounds
From the 174 exact coefficients he obtained, the first author gave the rigorous lower bound µ 3 ≥ w 1/174 174 = 8.659702 . . . [7]. From our 1000-term series, we have the improved result The observation used to prove this inequality was that the sum λ ⊕ π of two 3-stack-sortable permutations λ and π is also 3-stack-sortable. Using this fact allows us to deduce a stronger bound. The following method was used in a similar context in [11,Section 4], as well as a variety of different contexts where objects are decomposable (see, for example, [21, p. 89-92]). The idea is that we know that W (t) can be written as where W (t) is the generating function for primitive 3-stack-sortable permutations. The 1000 coefficients of W (t) yield 1000 coefficients of W (t). Moreover, by the equation relating W (t) and W (t), we must have W (t) < 1 for 0 < t < 1/µ 3 . We find that the first 1000 terms of the series W (t) add to 1 for t = t c = 0.105424 . . ., so µ 3 ≥ 1/t c = 9.4854 . . .
We note that using our approximate terms up to size 2000, we expect that this bound would improve to µ 3 ≥ 9.5828 . . .
given 2000 exact terms. Let us also remark that the best known rigorous upper bound for µ 3 , obtained in [8], is 12.53296.
In [3,4], Bóna conjectured that |W k (n)| ≤ (k+1)n n for all n, k ≥ 1. In [7], it was suggested that this conjecture is likely false, and it was shown that it contradicts a separate conjecture of Bóna's stating that the sequence (w n ) n≥1 is log-convex (meaning that the ratios w n+1 /w n are increasing). The first author was not able to fully disprove Bóna's first conjecture because he did not have a sufficient lower bound for µ 3 . The rigorous lower bound µ 3 ≥ 9.4854 . . . that we have just obtained is significant because it allows us to disprove Bóna's first conjecture. Indeed, for k = 3, the inequality w n ≤ 4n n would imply that µ 3 ≤ 256/27 ≈ 9.481, which we now know is not the case. On the other hand, our new data shows that the first 1000 terms in the sequence (w n ) n≥1 are log-convex, lending even more evidence toward Bóna's second conjecture.
The estimate of the growth constant (just) contradicts the conjecture of the first author that 9.702 < µ 3 ≤ 9.704.
We extended the series, using differential-approximants, to obtain an additional 1000 approximate coefficients and used the approximate coefficients and ratio-analysis methods to confirm the results obtained from the differential-approximant analysis of the exact coefficients. We are confident of our estimate of the growth constant. We are slightly less confident of the estimate of the value of the exponent α and much less confident of our estimate of the value of β.
We have also attempted to identify µ 3 experimentally as an algebraic number, and as a product of fractional powers of small primes, but without success.