Distribution of variables in lambda-terms with restrictions on De Bruijn indices and De Bruijn levels

We investigate the number of variables in two special subclasses of lambda-terms that are restricted by a bound on the number of abstractions between a variable and its binding lambda, the so-called De Bruijn index, or by a bound on the nesting levels of abstractions, \textit{i.e.}, the number of De Bruijn levels, respectively. These restrictions are on the one hand very natural from a practical point of view, and on the other hand they simplify the counting problem compared to that of unrestricted lambda-terms in such a way that the common methods of analytic combinatorics are applicable. We will show that the total number of variables is asymptotically normally distributed for both subclasses of lambda-terms with mean and variance asymptotically equal to $Cn$ and $\tilde{C}n$, respectively, where the constants $C$ and $\tilde{C}$ depend on the bound that has been imposed. So far, we have derived closed formulas for the constants only for the class of lambda-terms with bounded De Bruijn index. However, for the other class of lambda-terms that we consider, namely lambda-terms with a bounded number of De Bruijn levels, we investigate the number of variables, as well as abstractions and applications, in the different De Bruijn levels and thereby exhibit a so-called "unary profile" that attains a very interesting shape.


Motivation
Lambda-terms are objects stemming from lambda-calculus and can be seen as combinatorial objects with a simple description. Nevertheless, the enumeration of lambda-terms is not well understood. Combinatorially, they can be seen as words (sequences of symbols) or graphs, and thus the combinatorially most natural way to define an enumeration problem is to ask for the number of terms with a given number of symbols or vertices, respectively. This problem appears to be very intriguing, as standard techniques fail. Considering subclasses by imposing certain restrictions can turn the enumeration problem into an accessible one, but for one particular model the enumeration formulas exhibit a very peculiar behaviour. Our motivation for the present investigation is to shed light on this oddity and to give a combinatorial explanation for this phenomenon.

Previous work and the considered model
Lambda-calculus is a set of rules to manipulate lambda-terms and it is an important tool in theoretical computer science. To our knowledge, the first appearance of enumeration problems in the sense of enumerative combinatorics which are linked to lambda-calculus is found in [25], where certain models of lambda-calculus are analyzed which have representations as formal power series. More recently, we observe rising interest in the quantitative properties of large random lambda-terms. The first work in this direction seems to be [28]. Later David et al. [15] investigated the proportion of normalising terms, which was also the topic of [5] in a different context. Other papers dealing with certain structural properties of lambda-terms are for instance [13,22,30].
Since studying quantitative aspects of lambda-terms using combinatorial methods relies heavily on their enumeration, many papers are devoted to this task, which itself very much depends on the particular class of terms and the definition of the term size. The enumeration may be done by constructing bijections to certain classes of maps, see e.g. [6,34,35], or by using the methodology of analytic combinatorics [20], see e.g. [3,7,8,9,11,24,27].
Another approach to gaining structural insight is random generation. Solving the enumeration problem is the basis of an efficient algorithm for this purpose, namely Boltzmann sampling [17,18]. The method extends to a multivariate setting that allows fine tuning according to specified structural properties of the sampled objects, as was demonstrated in [2,12]. The generation of lambda-terms was treated in [4,6,24,29,31,32].
As mentioned above, lambda-terms can be seen as graphs, and so the natural combinatorial question is to ask for the number of such graphs with $n$ vertices. It turned out, however, that this model leads to a generating function that is a purely formal power series, which can be represented as an infinite nesting of square roots of polynomial terms, see [7]. The enumeration problem becomes amenable to analytic techniques when imposing restrictions on the lambda-terms. One such model is the class of all lambda-terms with a bounded number of De Bruijn levels. In [8] the authors discovered a very interesting phenomenon concerning the generating function associated with this model, namely that the asymptotic behaviour of the coefficients of the generating function changes with the imposed bound. More precisely, the type of the dominant singularity changes from $\frac12$ to $\frac14$ for certain values of the bound. This alteration of the type of the dominant singularity has a direct impact on the subexponential factor of the coefficients' asymptotics, namely it shifts from $n^{-3/2}$ to $n^{-5/4}$ for $n$ tending to infinity. This paper studies the structure of random large lambda-terms belonging to this class and thereby delivers an explanation of the above-mentioned phenomenon, since it arises from the location of the variables within the lambda-term.
The lambda-calculus was invented by Church and Kleene in the 1930s as a tool for the investigation of decision problems. Today it still plays an important role in computability theory and for automatic proof systems. Furthermore, it represents the basis for some programming languages, such as LISP or Haskell. In fact, the generation of random lambda-terms served for optimising the Glasgow Haskell Compiler [29] and for finding bugs in a C compiler [33]. As mentioned at the beginning, a rising interest in the number and structural properties of lambda-terms can be observed recently. This is triggered on the one hand by the fact that random lambda-terms have practical applications and the understanding of structural properties enables their tuning when generating random terms, see [2]; on the other hand, they turned out to be a source of interesting, albeit in part very intricate, combinatorial enumeration problems. Finally, we mention that there is a direct relationship between these random structures acting as computer programs and mathematical proofs (see [14]), but this relationship essentially concerns only typed lambda-terms and not general ones.
For a thorough introduction to lambda-calculus we refer to [1]. This paper does not require any preliminary knowledge of lambda-calculus in order to follow the proofs. Instead, we will study the basic objects of lambda-calculus, namely lambda-terms, by considering them as combinatorial objects, or more precisely as a special class of directed acyclic graphs (DAGs).

The considered model
We are now in a position to formally define the model of lambda-terms which we consider in this paper. The way we view the structures and the methodology of proving our results rely heavily on analytic combinatorics. Thus, a profound knowledge of some methods in this field is necessary to follow all our arguments in detail. The prerequisites can be found in Flajolet and Sedgewick's seminal book [20].
Definition 1 (lambda-terms, [23, Definition 3]). Let $V$ be a countable set of variables. The set $\Lambda$ of lambda-terms is defined by the following grammar:
1. every variable in $V$ is a lambda-term,
2. if $T$ and $S$ are lambda-terms, then $T\,S$ is a lambda-term (application),
3. if $T$ is a lambda-term and $x$ is a variable, then $\lambda x.T$ is a lambda-term (abstraction).
The name application arises since lambda-terms of the form $T\,S$ can be regarded as functions $T(S)$, where the function $T$ is applied to $S$, which in turn can be a function itself. An abstraction can be considered as a quantifier that binds the respective variable in the sub-lambda-term within its scope. Neither application nor repeated abstraction is commutative, i.e., in general the lambda-terms $T\,S$ and $S\,T$, as well as $\lambda x.\lambda y.M$ and $\lambda y.\lambda x.M$, are different (the exceptions being $T = S$ and, respectively, the case that neither $x$ nor $y$ occurs in $M$). Each $\lambda$ binds exactly one variable, which may occur several times in a term or even not at all. Since we will focus on the special subclass of closed lambda-terms, each variable is bound.
We will consider lambda-terms modulo α-equivalence, which means that we identify two lambda-terms if they only differ by the names of their bound variables. For example, $\lambda x.(\lambda y.(xy)) \equiv \lambda y.(\lambda z.(yz))$. In 1972, De Bruijn [16] introduced a representation for lambda-terms that completely avoids the use of variable names by replacing each variable by a natural number that indicates the number of abstractions between the variable and its binding lambda (the binding lambda is counted as well); for instance, $\lambda x.(\lambda y.(xy)) = \lambda(\lambda 2 1)$. There is also a combinatorial interpretation of lambda-terms that considers them as DAGs and thereby naturally identifies two α-equivalent terms. Combinatorially, lambda-terms can be seen as rooted unary-binary trees containing additional directed edges. Note that in general the resulting structures are not trees in the sense of graph theory, but due to their close relation to trees (see Definition 3) some authors call them lambda-trees or enriched trees. We will call them lambda-DAGs in order to emphasise that these structures are in fact DAGs, if we consider the undirected edges of the underlying tree to be directed away from its root.
Definition 3 (lambda-DAG, [23, Definition 5]). With every lambda-term $T$, the corresponding lambda-DAG $G(T)$ can be constructed in the following way:
1. If $x$ is a variable, then $G(x)$ is a single node labelled with $x$. Note that $x$ is unbound.
2. $G(P\,Q)$ is a lambda-DAG with a binary node as root, having the two lambda-DAGs $G(P)$ (to the left) and $G(Q)$ (to the right) as subgraphs.
3. The DAG $G(\lambda x.P)$ is obtained from $G(P)$ in four steps:
(a) Add a unary node as new root.
(b) Connect the new root by an undirected edge with the root of $G(P)$.
(c) Connect all leaves of $G(P)$ labelled with $x$ by directed edges with the new root, the new root being the start vertex of these edges.
(d) Remove all labels $x$ from $G(P)$. Note that now $x$ is bound.
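The four construction steps of Definition 3 can be sketched in Python. This is a minimal illustration under our own assumptions, not code from the paper: terms are encoded as nested tuples `('var', x)`, `('app', P, Q)`, `('lam', x, P)`, the function name `lambda_dag` is hypothetical, and node identifiers are consecutive integers assigned in preorder.

```python
from itertools import count

def lambda_dag(term):
    """Build the lambda-DAG G(T) of a closed lambda-term T (cf. Definition 3).

    Terms are nested tuples: ('var', x), ('app', P, Q) or ('lam', x, P).
    Returns (nodes, tree_edges, binding_edges, root): nodes maps node ids to
    'var', 'app' or 'lam'; tree_edges are the skeleton edges (parent, child);
    binding_edges are the directed edges (binding lambda, leaf).
    """
    ids = count()
    nodes, tree_edges, binding_edges = {}, [], []

    def build(t):
        # returns the root of G(t) and, for every still-unbound name,
        # the list of leaves that carry this label
        v = next(ids)
        nodes[v] = t[0]
        if t[0] == 'var':                       # step 1: a single labelled node
            return v, {t[1]: [v]}
        if t[0] == 'app':                       # step 2: binary root, G(P) left, G(Q) right
            left, free_l = build(t[1])
            right, free_r = build(t[2])
            tree_edges.extend([(v, left), (v, right)])
            names = free_l.keys() | free_r.keys()
            return v, {x: free_l.get(x, []) + free_r.get(x, []) for x in names}
        if t[0] == 'lam':                       # step 3 (a): v is the new unary root
            body, free = build(t[2])
            tree_edges.append((v, body))        # step 3 (b): edge to the root of G(P)
            for leaf in free.pop(t[1], []):     # step 3 (c): edges to all leaves labelled x
                binding_edges.append((v, leaf))
            return v, free                      # step 3 (d): label x is dropped, x is bound
        raise ValueError(f"unknown constructor {t[0]!r}")

    root, free = build(term)
    if free:
        raise ValueError(f"term is not closed; free variables: {sorted(free)}")
    return nodes, tree_edges, binding_edges, root
```

With this encoding the size of a term in the sense used in this paper is simply `len(nodes)`; for $\lambda x.(\lambda y.(xy))$ the sketch yields five nodes, two of which are unary, and two binding edges.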
the electronic journal of combinatorics 26(4) (2019), #P4.47
Obviously, applications correspond to binary nodes and abstractions correspond to unary nodes of the underlying Motzkin tree that is obtained by removing all directed edges. Of course, in the lambda-DAG some of the vertices that were formerly unary nodes might have gained outgoing edges, so they are no longer unary nodes in the lambda-DAG. However, when we speak of unary nodes in the following, we mean the unary nodes of the underlying unary-binary tree that forms the skeleton of the lambda-DAG.
Figure 1: The lambda-DAGs corresponding to the respective terms written below.
Since the skeleton of a lambda-DAG is a tree, we sometimes call the variables leaves (i.e., the nodes with out-degree zero), and the path (consisting of undirected edges) connecting the root with a leaf is called a branch. There are different approaches to defining the size of a lambda-term (see [15,8,27]), but within this paper the size is defined as the number of nodes in the corresponding lambda-DAG. This is combinatorially the most natural definition, and it is even equivalent to Barendregt's [1] definition.
In the lambda-DAG, the De Bruijn indices and levels are easily visible: the De Bruijn index of a variable $v$ is the number of unary nodes on the path from $v$ to its binding lambda in the skeleton of the lambda-DAG, where the binding unary node itself is counted as well. The De Bruijn level of $v$ is the number of unary nodes on the path from $v$ to the root. See Figure 2.
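The De Bruijn index and level of each variable can be read off while walking down the skeleton and keeping the list of enclosing lambdas. The following sketch uses a hypothetical tuple encoding `('var', x)`, `('app', P, Q)`, `('lam', x, P)` and a function name `de_bruijn` of our own; it translates a named closed term into De Bruijn notation and records the pair (index, level) for every variable occurrence. Applied to $\lambda x.(\lambda y.(xy))$ it reproduces the representation $\lambda(\lambda 2 1)$ given above, written with explicit parentheses around applications.

```python
def de_bruijn(term):
    """Translate a closed lambda-term into De Bruijn notation.

    Terms are nested tuples ('var', x), ('app', P, Q), ('lam', x, P).
    Returns the De Bruijn string and, for each variable occurrence from left to
    right, a triple (name, De Bruijn index, De Bruijn level).
    """
    occurrences = []

    def go(t, binders):
        # binders: names bound by the unary nodes on the path from the root,
        # outermost first
        if t[0] == 'var':
            name = t[1]
            if name not in binders:
                raise ValueError(f"free variable {name!r}")
            # position of the innermost lambda binding `name`
            pos = max(i for i, b in enumerate(binders) if b == name)
            index = len(binders) - pos   # unary nodes up to and including the binder
            level = len(binders)         # unary nodes between the leaf and the root
            occurrences.append((name, index, level))
            return str(index)
        if t[0] == 'app':
            return f"({go(t[1], binders)} {go(t[2], binders)})"
        if t[0] == 'lam':
            return "λ" + go(t[2], binders + [t[1]])
        raise ValueError(f"unknown constructor {t[0]!r}")

    return go(term, []), occurrences
```

For example, `de_bruijn(('lam', 'x', ('lam', 'y', ('app', ('var', 'x'), ('var', 'y')))))` gives the string `"λλ(2 1)"`; both variables lie on De Bruijn level 2, with indices 2 and 1.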
At first sight lambda-terms appear to be very simple structures, in the sense that their construction can easily be described, but so far nobody has succeeded in deriving their asymptotic number. However, the asymptotic equivalent of the logarithm of this number can be determined up to the second-order term (see [9]). The difficulty of counting unrestricted lambda-terms arises from the fact that their number increases superexponentially with increasing size. The reason for this is the large number of degrees of freedom in choosing the bindings of leaves by lambdas, i.e., by unary nodes. Thus, if we translate the counting problem into generating functions, then the resulting generating function has a radius of convergence equal to zero, which makes the common methods of analytic combinatorics inapplicable. Consequently, some simpler subclasses of lambda-terms which reduce these multiple binding possibilities have lately been studied, e.g., lambda-terms with a prescribed number of unary nodes ([8]), or lambda-terms in which every lambda binds a prescribed ([9,6,23]) or a bounded ([10,6,23]) number of leaves. In this paper we investigate structural properties of two classes of lambda-terms that have been introduced in [7] and [8]. The first is the class of lambda-terms with a bounded number of abstractions between each leaf and its binding lambda, which corresponds to a bounded De Bruijn index. The second is the class of lambda-terms with a bounded number of nesting levels of abstractions, i.e., lambda-terms with a bounded number of De Bruijn levels. From a practical point of view these restrictions appear to be very natural, since the number of abstractions in lambda-terms which are used for computer programming is in general assumed to be very low compared to their size (cf. [33]).
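The contrast between unrestricted and restricted terms can be observed numerically. The following sketch counts closed lambda-terms by size (number of nodes of the lambda-DAG) with a simple recurrence of our own: a term of size $n$ whose leaves may refer to $m$ enclosing lambdas is either a leaf ($n = 1$, $m$ choices), an abstraction over a term of size $n-1$ with one more available binder, or an application of two terms whose sizes add up to $n-1$. Passing a bound `k` caps the number of available binders at $k$, which is exactly the restriction to De Bruijn indices at most $k$. The function name and interface are hypothetical.

```python
from functools import lru_cache

def count_closed_terms(n_max, k=None):
    """Number of closed lambda-terms of size 1..n_max (size = number of nodes).

    If k is given, only terms whose De Bruijn indices are all at most k are counted.
    """
    @lru_cache(maxsize=None)
    def t(n, m):
        # terms of size n whose leaves may be bound by one of m enclosing lambdas
        if n == 1:
            return m                                   # a leaf: choose its binder
        up = m + 1 if k is None else min(m + 1, k)     # binders below a new lambda
        total = t(n - 1, up)                           # abstraction
        for i in range(1, n - 1):                      # application: sizes i, n-1-i
            total += t(i, m) * t(n - 1 - i, m)
        return total

    return [t(n, 0) for n in range(1, n_max + 1)]
```

Comparing, for instance, `count_closed_terms(30)` with `count_closed_terms(30, k=2)` should show the ratios of consecutive counts growing steadily in the unrestricted case, while under the bound they settle towards a constant (the reciprocal of the radius of convergence).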
Particular interest lies in the number and distribution of the variables within these special subclasses of lambda-terms.We will show within this paper that the total number of leaves (i.e., variables) in lambda-DAGs with bounded De Bruijn indices as well as in lambda-terms with bounded number of De Bruijn levels is asymptotically normally distributed.For the latter class of lambda-terms we will also investigate the number of leaves in the different De Bruijn levels, which shows a very interesting behaviour.We will see that in the lower De Bruijn levels, i.e. near the root of the lambda-DAG, there are very few leaves, while almost all of the leaves are located in the upper De Bruijn levels and these two domains will turn out to be asymptotically strictly separated.The same behaviour can be shown for unary and binary nodes, which allows us to set up a very interesting "unary profile" of this class of lambda-terms.
For lambda-terms that are locally restricted by a bound on the De Bruijn indices, the number of De Bruijn levels is not bounded and will tend to infinity with increasing size. The expected number of De Bruijn levels is unknown, which implies that the correct scaling cannot be determined. Thus, we have not yet been able to establish results concerning the leaves (or other types of nodes) on the different De Bruijn levels for this class of lambda-terms. Nevertheless, further studies on this subject seem to be very interesting.

Plan of the paper
We will present the main results that have been derived in this paper, including all the definitions that are necessary for their understanding, in Section 2, while the subsequent sections are concerned with their proofs. In Section 3 we will show that the total number of variables in lambda-terms with bounded De Bruijn index is asymptotically normally distributed with mean and variance asymptotically $Cn$ and $\tilde{C}n$, respectively, where the constants $C$ and $\tilde{C}$ depend on the bound that has been imposed. Section 4 shows the same result for lambda-terms where the number of De Bruijn levels is bounded. Finally, in the last section, Section 5, we show how the variables are distributed in lambda-terms with a bounded number of De Bruijn levels. We will see that there are very few leaves on the lower De Bruijn levels, i.e., close to the root, while on the upper De Bruijn levels farther away from the root there are many leaves. Furthermore, these two domains are strictly separated and we know exactly which is the first level containing a large number of leaves, since this level can be determined from the imposed bound on the number of De Bruijn levels. This interesting behaviour also holds for the number of binary and unary nodes. By investigating all these numbers among the different De Bruijn levels we are able to set up a so-called unary profile that shows that these special lambda-terms have a very specific shape. A random closed lambda-term with a bounded number of De Bruijn levels starts with a string of unary nodes, where the length of this string depends on the imposed bound. Then it slowly gets filled with nodes until it reaches the aforementioned separating level, where it suddenly starts to contain a lot of nodes.

Main results
In this section we will introduce the basic definitions and summarize the main results that will be presented in this paper.
First, we will investigate the total number of variables in lambda-terms with bounded De Bruijn index, i.e., with a bounded number of abstractions between each leaf and its binding lambda. Our first main result concerns the asymptotic distribution of the number of variables within this class of closed lambda-terms.
Theorem 4. Let $X_n$ be the total number of variables in a random closed lambda-term of size $n$ where the De Bruijn index of each variable is at most $k$. Then $X_n$ is asymptotically normally distributed with
$$\mathbb{E}X_n \sim \frac{\sqrt{k}}{1+2\sqrt{k}}\,n \quad\text{and}\quad \mathbb{V}X_n \sim \frac{\sqrt{k}}{2\left(1+2\sqrt{k}\right)^2}\,n, \qquad n \to \infty.$$
Remark 5. Note that the constants in the mean and the variance tend to $\frac12$ and $0$, respectively, as $k \to \infty$. Since these values are known for the number of leaves in binary trees, this gives a hint that almost all leaves of a large random unrestricted lambda-term are located within an almost purely binary structure.
Next we turn to lambda-terms with a bounded number of De Bruijn levels, i.e. with a bounded number of unary nodes (or abstractions, respectively) in the separate branches of the corresponding lambda-DAG.Figure 3 shows the partition of the vertex set which is induced by the De Bruijn levels.Remark 6. De Bruijn [16] introduced the concepts De Bruijn index and De Bruijn level exclusively for variables and called them reference depth and level, respectively.The notion De Bruijn level, however, can evidently be extended to all vertices of the lambda-DAG.We will use this extension later in the paper.
(Cf. also Figure 3.) Lambda-terms with a bounded number of De Bruijn levels have been studied in [8], where a very unusual behaviour has been discovered. The asymptotic behaviour of the number of lambda-terms belonging to this subclass differs depending on whether or not the imposed bound is an element of a certain sequence $(N_i)_{i\ge0}$, which will be given in Definition 7. Though the behaviour of the counting sequence differs in these two cases, the result of Theorem 9 concerning lambda-terms with a bounded number of De Bruijn levels is the same after all; however, the method of proof is different in the two cases. For our subsequent results the distinction of cases will have an impact on the asymptotic behaviour of the counting sequences of the investigated structures. Thus, we will have to distinguish between these two cases.
Definition 7 (auxiliary sequences $(u_i)_{i\ge0}$ and $(N_i)_{i\ge0}$, [8, Def. 6]). Let $(u_i)_{i\ge0}$ be the integer sequence defined by
In the last section we investigate the distribution of the different types of nodes in lambda-DAGs with a bounded number of De Bruijn levels among the separate levels throughout the DAG.
Remark 8. Note that the De Bruijn level in which a node is located just counts the number of unary nodes in the branch connecting the root and the respective node.
Theorem 9. Let $(N_i)_{i\ge0}$ be the sequence defined in Definition 7 and choose integers $k$ and $j$ such that $N_{j-1} < k \le N_j$. Moreover, let $\rho_k(u)$ be the root of smallest modulus of the function $z \mapsto R_{j+1,k}(z,u)$, where $R_{j+1,k}(z,u)$ is the $(j+1)$-th radicand of the nested-radical representation of the generating function (see Section 4), and let us define $B(u) = \frac{\rho_k(1)}{\rho_k(u)}$. If $B''(1)+B'(1)-B'(1)^2 \neq 0$, then the total number of leaves in random closed lambda-DAGs with at most $k$ De Bruijn levels is asymptotically normally distributed with asymptotic mean $\mu n$ and asymptotic variance $\sigma^2 n$, where $\mu = B'(1)$ and $\sigma^2 = B''(1)+B'(1)-B'(1)^2$.
Remark 10. The requirement $B''(1)+B'(1)-B'(1)^2 \neq 0$ results from the fact that otherwise the variance would be $o(n)$ (cf. the variability condition in Hwang's Quasi-Powers Theorem [26]). However, this inequality seems to be very difficult to verify, since $B(u)$ is defined via $\rho_k(u)$, and we know nothing about the function $\rho_k(u)$ except for some crude bounds and its analyticity. But numerical data support the conjecture that $B''(1)+B'(1)-B'(1)^2 \neq 0$ always holds (cf. Table 1).
The following theorem includes the results that we will present in Section 5.1, where we show that the number of leaves near the root of the lambda-DAG, i.e., in the lower De Bruijn levels, is very low, while there are many leaves in the upper levels. Furthermore, these two domains are strictly separated and the "separating level", i.e., the first level with many leaves, depends on the bound on the number of De Bruijn levels. We will show a very interesting behaviour, namely that with growing bound the number of leaves within the De Bruijn level that is directly below the critical separating level increases, until the bound reaches a certain number, which makes this adjacent leaf-filled level the new separating level. Thus, we can observe a "double jump" in the asymptotic behaviour of the number of leaves within the separate levels (cf. Figure 4).
Theorem 11. Let ${}^{k-l}\tilde H_k(z,u)$ denote the bivariate generating function of the class of closed lambda-terms with at most $k$ De Bruijn levels, where $z$ marks the size and $u$ marks the number of leaves in the $(k-l)$-th De Bruijn level. Additionally, we denote its dominant singularity (i.e., the singularity of smallest modulus) by $\tilde\rho_k(u)$, and let $B(u) = \frac{\tilde\rho_k(1)}{\tilde\rho_k(u)}$. Then the following assertions hold:
, as the size $n \to \infty$, while it is $\Theta(n)$ for the last $j+1$ levels.
In particular, if $B''(1)+B'(1)-B'(1)^2 \neq 0$, the number of leaves in each of the last $j+1$ De Bruijn levels is asymptotically normally distributed with mean and variance proportional to the size $n$ of the lambda-term.
• If $k = N_j$, then the average number of leaves in the first $k-j$ De Bruijn levels is $O(1)$, as $n \to \infty$, while the average number of leaves in the $(k-j)$-th level is $\Theta(\sqrt{n})$. The last $j$ De Bruijn levels have asymptotically $\Theta(n)$ leaves.
In particular, if $B''(1)+B'(1)-B'(1)^2 \neq 0$, the number of leaves in each of the last $j$ De Bruijn levels is asymptotically normally distributed with mean and variance proportional to the size $n$ of the lambda-term.
If $k = N_j$, then both the average number of unary nodes and the average number of binary nodes in the first $k-j$ De Bruijn levels are $O(1)$, as $n \to \infty$, while the average number of nodes of the respective type in the $(k-j)$-th De Bruijn level is $\Theta(\sqrt{n})$. The last $j$ De Bruijn levels each contain asymptotically $\Theta(n)$ unary nodes, as well as $\Theta(n)$ binary nodes.

Total number of leaves in random lambda-terms with bounded De Bruijn indices
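In this section we investigate the asymptotic number of all leaves in closed lambda-terms with bounded De Bruijn indices. In order to obtain quantitative results concerning this restricted class of lambda-terms we will use the well-known symbolic method in conjunction with analytic combinatorics (see [20]), in particular singularity analysis of generating functions [19]. To this end we introduce further combinatorial classes, as has been done in [8]: $\mathcal Z$ denotes the class of atoms, $\mathcal A$ the class of application nodes (i.e., binary nodes), $\mathcal U$ the class of abstraction nodes (i.e., unary nodes), and $\tilde{\mathcal P}_{(i,k)}$ the class of unary-binary trees such that every leaf $e$ can be labelled in $\min\{\ell(e)+i,k\}$ ways, where $\ell(v)$ denotes the De Bruijn level of $v$. The objects in $\tilde{\mathcal P}_{(i,k)}$ may be seen as lambda-DAGs where the binding of each variable $x$ may come from a binding unary node at most $k$ De Bruijn levels above $x$, even if this means up to $i$ De Bruijn levels above the root. Therefore, the class we are interested in is $\tilde{\mathcal P}_{(0,k)}$. The classes $\tilde{\mathcal P}_{(i,k)}$ can be specified by
$$\tilde{\mathcal P}_{(i,k)} = i\,\mathcal Z + \mathcal A \times \tilde{\mathcal P}_{(i,k)}^2 + \mathcal U \times \tilde{\mathcal P}_{(i+1,k)} \quad (0 \le i < k), \qquad \tilde{\mathcal P}_{(k,k)} = k\,\mathcal Z + \mathcal A \times \tilde{\mathcal P}_{(k,k)}^2 + \mathcal U \times \tilde{\mathcal P}_{(k,k)}.$$
Translating into generating functions with $z$ marking the size and $u$ marking the number of leaves, we get
$$\tilde P_{i,k}(z,u) = iuz + z\tilde P_{i,k}(z,u)^2 + z\tilde P_{i+1,k}(z,u) \quad (0 \le i < k), \qquad \tilde P_{k,k}(z,u) = kuz + z\tilde P_{k,k}(z,u)^2 + z\tilde P_{k,k}(z,u),$$
which yields
$$\tilde P_{k,k}(z,u) = \frac{1-z-\sqrt{(1-z)^2-4kuz^2}}{2z}, \qquad \tilde P_{i,k}(z,u) = \frac{1-\sqrt{1-4iuz^2-4z^2\tilde P_{i+1,k}(z,u)}}{2z} \quad (0 \le i < k).$$
This can be written in the form
$$\tilde P_{k-j,k}(z,u) = \frac{1-\sqrt{\tilde R_{j+1,k}(z,u)}}{2z} \qquad (1 \le j \le k),$$
where
$$\tilde R_{1,k}(z,u) = (1-z)^2-4kuz^2, \qquad \tilde R_{2,k}(z,u) = 1-2z+2z^2-4(k-1)uz^2+2z\sqrt{\tilde R_{1,k}(z,u)},$$
$$\tilde R_{j+1,k}(z,u) = 1-2z-4(k-j)uz^2+2z\sqrt{\tilde R_{j,k}(z,u)} \qquad (2 \le j \le k).$$
Since the class $\tilde{\mathcal P}_{(0,k)}$ is isomorphic to the class $\mathcal G_k$ of closed lambda-terms where all De Bruijn indices are not larger than $k$, we get for the corresponding bivariate generating function
$$G_k(z,u) = \tilde P_{0,k}(z,u) = \frac{1-\sqrt{\tilde R_{k+1,k}(z,u)}}{2z}.$$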
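The specification of the classes $\tilde{\mathcal P}_{(i,k)}$ can be checked mechanically for small sizes. The sketch below iterates the functional equations $\tilde P_{i,k} = iuz + z\tilde P_{i,k}^2 + z\tilde P_{\min(i+1,k),k}$, $0 \le i \le k$, on truncated bivariate power series, where the coefficient of $z^n$ is stored as a dictionary mapping a number of leaves to a count; this data layout and the function name are our own choices. Each pass fixes one more power of $z$, so $n_{\max}$ passes suffice, and $\tilde P_{0,k}$ then gives the coefficients of $G_k(z,u)$.

```python
def bounded_index_gf(k, n_max):
    """Coefficients of G_k(z, u) up to z^n_max (closed terms, De Bruijn index <= k).

    Iterates P_i = i*u*z + z*P_i^2 + z*P_{min(i+1, k)} for 0 <= i <= k.
    A series is a list indexed by the size n; entry n is a dict {leaves: count}.
    """
    def empty():
        return [{} for _ in range(n_max + 1)]

    def shifted_add(target, series):
        # target += z * series  (the factor z accounts for the new root node)
        for n in range(n_max):
            for leaves, c in series[n].items():
                target[n + 1][leaves] = target[n + 1].get(leaves, 0) + c

    def square(series):
        result = empty()
        for a in range(1, n_max + 1):
            for b in range(1, n_max + 1 - a):
                for la, ca in series[a].items():
                    for lb, cb in series[b].items():
                        result[a + b][la + lb] = result[a + b].get(la + lb, 0) + ca * cb
        return result

    P = [empty() for _ in range(k + 1)]
    # after m passes, all coefficients of z^1, ..., z^m are exact
    for _ in range(n_max):
        new_P = []
        for i in range(k + 1):
            s = empty()
            if i > 0 and n_max >= 1:
                s[1][1] = i                        # a leaf with i possible labels
            shifted_add(s, square(P[i]))           # binary node
            shifted_add(s, P[min(i + 1, k)])       # unary node
            new_P.append(s)
        P = new_P
    return P[0]
```

For $k = 1$ and size 5 one obtains four terms with two leaves and a single term (λλλλ1) with one leaf, which agrees with a hand enumeration.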
We will now perform a singularity analysis of these generating functions.Recall that we call a singularity dominant, if it is a singularity of smallest modulus.Notice further that the behaviour of a generating function near its dominant singularities determines the asymptotic behaviour of its coefficients.
From [8] we know that the dominant singularity of $G_k(z,1)$ comes from the innermost radicand only and consequently is of type $\frac12$. By continuity arguments this implies that in a sufficiently small neighbourhood of $u=1$ the dominant singularity $\tilde\rho_k(u)$ of $G_k(z,u)$ also comes only from the innermost radicand, i.e., $\tilde R_{1,k}(z,u)$, and is of type $\frac12$. Calculating the smallest positive root of $\tilde R_{1,k}(z,u)$ we get $\tilde\rho_k(u) = \frac{1}{1+2\sqrt{ku}}$. Now we will determine the expansions of the radicands in a neighbourhood of the dominant singularity $\tilde\rho_k(u)$.
Let us briefly sketch our approach: we know that the singularity $\tilde\rho_k(u)$ is determined by the equation $\tilde R_{1,k}(\tilde\rho_k(u),u) = 0$, provided that $u$ is sufficiently close to 1, and that it is likewise the dominant singularity of $G_k(z,u)$ (regarded as a function of $z$, as are all functions in the present context). Consequently, it is the dominant singularity of all $\tilde R_{j,k}(z,u)$ for $j = 2,\dots,k+1$. If we determine the local behaviour of $\tilde R_{1,k}(z,u)$ near $z = \tilde\rho_k(u)$, then we will be able to determine Puiseux expansions of all $\tilde R_{j,k}(z,u)$ for $j = 2,\dots,k+1$ at $z = \tilde\rho_k(u)$. This will be done in Proposition 13. In particular, this gives us the Puiseux expansion of $G_k(z,u)$, from which we can derive the asymptotic behaviour of its coefficients by transfer theorems [20,19]. This will be the task of Theorem 14 below.
It will then turn out that the shape of ρk (u) near u = 1 determines the characteristic function of the random variable "number of leaves", because ρk (u) depends on u in a nicely regular way.This characteristic function has then the shape of a so-called quasi-power involving the function ρk (u).Hwang's Quasi-Powers Theorem [26] will then do the job and yield a central limit theorem.
Proposition 13. Let $\tilde\rho_k(u)$ be the root of the innermost radicand $\tilde R_{1,k}(z,u)$, i.e., $\tilde\rho_k(u) = \frac{1}{1+2\sqrt{ku}}$, where $u$ is in a sufficiently small neighbourhood of 1, i.e., $|u-1| < \delta$ for $\delta > 0$ sufficiently small. Then the equations
and
Proof. Using the Taylor expansion of $\tilde R_{1,k}(z,u)$ around $\tilde\rho_k(u)$ we obtain
By definition, the first summand $\tilde R_{1,k}(\tilde\rho_k(u),u)$ is equal to zero. Setting $z = \tilde\rho_k(u) - \varepsilon$ and using (1) we obtain the first claim of Proposition 13.
Observe that
Expanding, using again $\tilde\rho_k(u) = \frac{1}{1+2\sqrt{ku}}$, and simplifying yields
Setting $c_{j+1}(u) := 4ju - 1 + 2\sqrt{c_j(u)}$ and $d_{j+1}(u) :=$ for $2 \le j \le k$, we obtain
Finally, we show that the $c_l(u)$'s are greater than zero in a neighbourhood of $u = 1$. By induction it can easily be seen that they are always positive for $u = 1$, since $c_1(1) = 1 < 5 = c_2(1)$, and assuming $c_{i-1}(1) < c_i(1)$ we get
$$c_{i+1}(1) = 4i - 1 + 2\sqrt{c_i(1)} > 4(i-1) - 1 + 2\sqrt{c_{i-1}(1)} = c_i(1).$$
Using continuity arguments we can see that the functions $c_l(u)$ have to be positive in a sufficiently small neighbourhood of $u = 1$ as well, which completes the proof of (4).
and (4), we get for
Hence,
The singularity $\tilde\rho_k(u) = \frac{1}{1+2\sqrt{ku}}$ is of type $\frac12$, and if we plug it into (5) and apply the standard transfer theorems (see [20,19]), we obtain the desired result.
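The identities behind the functions $c_j(u)$ can be checked numerically. The sketch assumes the nested radicals $\tilde R_{1,k} = (1-z)^2 - 4kuz^2$, $\tilde R_{2,k} = 1 - 2z + 2z^2 - 4(k-1)uz^2 + 2z\sqrt{\tilde R_{1,k}}$ and $\tilde R_{j+1,k} = 1 - 2z - 4(k-j)uz^2 + 2z\sqrt{\tilde R_{j,k}}$ for $j \ge 2$ (our reading of $G_k(z,u)$, obtained by solving the specification of $\tilde{\mathcal P}_{(i,k)}$). It verifies that $\tilde R_{j,k}(\tilde\rho_k(u),u) = \tilde\rho_k(u)^2\,c_j(u)$ for $j \ge 2$, with $c_1 = 1$ and $c_{j+1}(u) = 4ju - 1 + 2\sqrt{c_j(u)}$, and that all $c_j(u)$ are positive near $u = 1$. The algebra rests on the identity $1 - 2\tilde\rho_k = (4ku-1)\tilde\rho_k^2$, which follows from $\tilde R_{1,k}(\tilde\rho_k,u) = 0$. All function names are hypothetical.

```python
import math

def rho_tilde(k, u=1.0):
    """Smallest positive root of the innermost radicand (1 - z)^2 - 4kuz^2."""
    return 1.0 / (1.0 + 2.0 * math.sqrt(k * u))

def radicands_tilde(z, u, k):
    """[R_1, ..., R_{k+1}] for bounded De Bruijn index (assumed nested-radical form)."""
    r = [(1 - z) ** 2 - 4 * k * u * z * z]
    # R_2 is special because P_{k,k} carries the extra term z * P_{k,k}
    r.append(1 - 2 * z + 2 * z * z - 4 * (k - 1) * u * z * z
             + 2 * z * math.sqrt(max(r[0], 0.0)))
    for j in range(2, k + 1):
        r.append(1 - 2 * z - 4 * (k - j) * u * z * z + 2 * z * math.sqrt(max(r[-1], 0.0)))
    return r

def c_values(k, u=1.0):
    """[c_1(u), ..., c_{k+1}(u)] with c_1 = 1 and c_{j+1} = 4ju - 1 + 2 sqrt(c_j)."""
    c = [1.0]
    for j in range(1, k + 1):
        c.append(4 * j * u - 1 + 2 * math.sqrt(c[-1]))
    return c

for k in range(1, 5):
    z = rho_tilde(k)
    ratios = [x / z ** 2 for x in radicands_tilde(z, 1.0, k)[1:]]
    print(k, ratios, c_values(k)[1:])
```

The comparison uses a loose tolerance in practice, because taking the square root of $\tilde R_{1,k}(\tilde\rho_k)$, which vanishes only up to rounding error, amplifies that error to roughly $10^{-8}$.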
From [8, Theorem 1] we know the following result:
with $c_l(u)$ defined as in Proposition 13. Now we want to apply the well-known Quasi-Powers Theorem.
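Theorem 15 (Quasi-Powers Theorem, [26]). Let $X_n$ be a sequence of random variables with the property that
$$\mathbb{E}\,u^{X_n} = A(u)\,B(u)^{\lambda_n}\left(1 + O\!\left(\frac{1}{\varphi_n}\right)\right)$$
holds uniformly in a complex neighbourhood of $u = 1$, where $\lambda_n \to \infty$ and $\varphi_n \to \infty$, and $A(u)$ and $B(u)$ are analytic functions in a neighbourhood of $u = 1$ with $A(1) = B(1) = 1$. If, moreover, $B''(1)+B'(1)-B'(1)^2 \neq 0$, then $X_n$ is asymptotically normally distributed with $\mathbb{E}X_n \sim B'(1)\lambda_n$ and $\mathbb{V}X_n \sim \left(B''(1)+B'(1)-B'(1)^2\right)\lambda_n$.
Using Theorem 14 and (6), we get for $n \to \infty$
where $c_1(u) = 1$ and $c_j(u) = 4ju - 4u - 1 + 2\sqrt{c_{j-1}(u)}$. Thus, all assumptions of the Quasi-Powers Theorem are fulfilled, and we get that the number of leaves in closed lambda-terms with De Bruijn indices at most $k$ is asymptotically normally distributed with
$$\mathbb{E}X_n \sim \frac{\sqrt{k}}{1+2\sqrt{k}}\,n \quad\text{and}\quad \mathbb{V}X_n \sim \frac{\sqrt{k}}{2\left(1+2\sqrt{k}\right)^2}\,n,$$
and therefore Theorem 4 is shown.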
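With $\tilde\rho_k(u) = 1/(1+2\sqrt{ku})$ the function $B(u) = \tilde\rho_k(1)/\tilde\rho_k(u) = (1+2\sqrt{ku})/(1+2\sqrt{k})$ is explicit, and a short computation gives $B'(1) = \frac{\sqrt{k}}{1+2\sqrt{k}}$ and $B''(1)+B'(1)-B'(1)^2 = \frac{\sqrt{k}}{2(1+2\sqrt{k})^2}$. The sketch below is our own verification code (function names hypothetical), not part of the original proof: it cross-checks these constants against central finite differences of $B$, and against exact mean values $\mathbb{E}X_n$ obtained by dynamic programming. Assuming that $\mathbb{E}X_n$ has an asymptotic expansion $B'(1)n + c + O(1/n)$, the differences $\mathbb{E}X_{n+1} - \mathbb{E}X_n$ should approach $B'(1)$.

```python
import math

def theorem4_constants(k):
    """Closed forms of B'(1) and B''(1) + B'(1) - B'(1)^2 for B(u) = rho(1)/rho(u)."""
    s = math.sqrt(k)
    return s / (1 + 2 * s), s / (2 * (1 + 2 * s) ** 2)

def quasi_power_constants(k, h=1e-4):
    """The same two constants from central finite differences of B(u)."""
    rho = lambda u: 1.0 / (1.0 + 2.0 * math.sqrt(k * u))
    B = lambda u: rho(1.0) / rho(u)
    b1 = (B(1 + h) - B(1 - h)) / (2 * h)
    b2 = (B(1 + h) - 2 * B(1.0) + B(1 - h)) / h ** 2
    return b1, b2 + b1 - b1 ** 2

def exact_mean_leaves(k, n_max):
    """E X_n for n = 1..n_max by dynamic programming (None if there is no term)."""
    # cnt[n][m] / lvs[n][m]: number of terms / total number of leaves over all terms
    # of size n whose topmost leaves can be labelled in m = min(level, k) ways
    cnt = [[0] * (k + 1) for _ in range(n_max + 1)]
    lvs = [[0] * (k + 1) for _ in range(n_max + 1)]
    for m in range(k + 1):
        cnt[1][m] = lvs[1][m] = m                     # a single leaf with m labels
    for n in range(2, n_max + 1):
        for m in range(k + 1):
            up = min(m + 1, k)
            c, l = cnt[n - 1][up], lvs[n - 1][up]     # abstraction
            for i in range(1, n - 1):                 # application
                j = n - 1 - i
                c += cnt[i][m] * cnt[j][m]
                l += lvs[i][m] * cnt[j][m] + cnt[i][m] * lvs[j][m]
            cnt[n][m], lvs[n][m] = c, l
    return [lvs[n][0] / cnt[n][0] if cnt[n][0] else None for n in range(1, n_max + 1)]
```

For example, `exact_mean_leaves(1, 200)` lists the exact means for $k = 1$; the last differences can be compared with `theorem4_constants(1)[0]`, i.e., $1/3$.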
Total number of leaves in random lambda-terms with bounded number of De Bruijn levels
This section is devoted to the enumeration of leaves in closed lambda-terms with a bounded number of De Bruijn levels. As in [8], let us denote by $\mathcal P_{(i,k)}$ the class of unary-binary trees that contain at most $k-i$ De Bruijn levels and where each leaf $e$ can be coloured with one out of $i+\ell(e)$ colours, where $\ell(e)$ denotes the De Bruijn level in which the respective leaf is located. These classes can be specified by
$$\mathcal P_{(k,k)} = k\,\mathcal Z + \mathcal A \times \mathcal P_{(k,k)}^2, \qquad \mathcal P_{(i,k)} = i\,\mathcal Z + \mathcal A \times \mathcal P_{(i,k)}^2 + \mathcal U \times \mathcal P_{(i+1,k)} \quad (0 \le i < k).$$
By translating into generating functions we get
$$P_{k,k}(z,u) = kuz + zP_{k,k}(z,u)^2, \qquad P_{i,k}(z,u) = iuz + zP_{i,k}(z,u)^2 + zP_{i+1,k}(z,u) \quad (0 \le i < k).$$
Solving yields
$$P_{k,k}(z,u) = \frac{1-\sqrt{1-4kuz^2}}{2z}, \qquad P_{i,k}(z,u) = \frac{1-\sqrt{1-4iuz^2-4z^2P_{i+1,k}(z,u)}}{2z} \quad (0 \le i < k).$$
This can be written as
$$P_{k-j,k}(z,u) = \frac{1-\sqrt{R_{j+1,k}(z,u)}}{2z} \qquad (0 \le j \le k),$$
where
$$R_{1,k}(z,u) = 1-4kuz^2$$
and
$$R_{j+1,k}(z,u) = 1-2z-4(k-j)uz^2+2z\sqrt{R_{j,k}(z,u)} \qquad (1 \le j \le k). \tag{8}$$
For the bivariate generating function of closed lambda-terms with at most $k$ De Bruijn levels we get
$$H_k(z,u) = P_{0,k}(z,u) = \frac{1-\sqrt{R_{k+1,k}(z,u)}}{2z}.$$
Thus, the generating function again consists of $k+1$ nested radicals, but as stated in Section 2, the counting sequence of this class of lambda-terms shows a very unusual behaviour. The type of the dominant singularity of the generating function changes when the imposed bound equals $N_j$. Thus, the subexponential term in the asymptotics of the counting sequence changes. The following result has been shown in [8]:
Theorem 16 ([8, Theorem 3]). Let $(u_i)_{i\ge0}$ and $(N_i)_{i\ge0}$ be the integer sequences defined in Definition 7 and let $H_k(z,1)$ be the generating function of the class of closed lambda-terms with at most $k$ De Bruijn levels. Then the following asymptotic relations hold:
(i) If there exists $j \ge 0$ such that $N_j < k < N_{j+1}$, then there exists a constant $h_k$ such that
(ii) If there exists $j$ such that $k = N_j$, then
Thus, in order to investigate structural properties of this class of lambda-terms, we distinguish cases according to whether or not the bound $k$ is an element of the sequence $(N_i)_{i\ge0}$.
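For small sizes, the classes $\mathcal P_{(i,k)}$ can be enumerated directly. The sketch below is our own recurrence following the description above: below $i$ enclosing lambdas a leaf can be coloured in $i$ ways, a unary node is only allowed while $i < k$, and binary nodes are unrestricted. It returns, for each size, the distribution of the number of leaves among closed lambda-terms with at most $k$ De Bruijn levels, i.e., the coefficients of $H_k(z,u)$; the function name is hypothetical.

```python
from functools import lru_cache

def count_bounded_levels(n_max, k):
    """Closed lambda-terms of size 1..n_max with at most k De Bruijn levels
    (at most k unary nodes on every root-to-leaf path).
    Returns one dict {number of leaves: number of terms} per size."""
    @lru_cache(maxsize=None)
    def p(n, i):
        # unary-binary trees of size n below i enclosing lambdas, with at most
        # k - i further levels; a leaf at the top can be coloured in i ways
        if n == 1:
            return {1: i} if i > 0 else {}
        result = {}
        if i < k:                                      # unary node
            for leaves, c in p(n - 1, i + 1).items():
                result[leaves] = result.get(leaves, 0) + c
        for a in range(1, n - 1):                      # binary node
            for la, ca in p(a, i).items():
                for lb, cb in p(n - 1 - a, i).items():
                    result[la + lb] = result.get(la + lb, 0) + ca * cb
        return result

    return [p(n, 0) for n in range(1, n_max + 1)]
```

For instance, with $k = 1$ every branch contains at most one abstraction, so there is no closed term of size 3, and the only two terms of size 6 are λ(1 (1 1)) and λ((1 1) 1). The returned dictionaries are shared with the cache and should not be modified.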

The case $N_j < k < N_{j+1}$
From [8] we know that in this case the dominant singularity of the generating function $H_k(z,1)$ comes from the $(j+1)$-th radicand $R_{j+1,k}$ and is of type $\frac12$. As in the previous section we can again use continuity arguments to guarantee that, sufficiently close to $u = 1$, the dominant singularity $\rho_k(u)$ of $H_k(z,u)$ comes from the $(j+1)$-th radicand $R_{j+1,k}(z,u)$ and is of type $\frac12$. Now we will determine the expansions of the radicands in a neighbourhood of the dominant singularity.
Proposition 17. Let $\rho_k(u)$ be the dominant singularity of $H_k(z,u)$, where $u$ is in a sufficiently small neighbourhood of 1, i.e., $|u-1| < \delta$ for $\delta > 0$ sufficiently small. Then the expansions
hold for $\varepsilon \to 0$ such that $\varepsilon \in \mathbb{C} \setminus \mathbb{R}^-$, uniformly in $u$.
Proof.(i) The first equation (for i < j + 1) follows immediately by Taylor expansion around ρ k (u) and setting z = ρ k (u) − .
(ii) The equation for $i = j+1$ follows analogously to the first case, since $R_{j+1,k}(z,u)$ vanishes for $z = \rho_k(u)$.
(iii) The next step is to expand $R_{i,k}(z,u)$ around $\rho_k(u)$ for $i > j+1$. From the second claim of Proposition 17 and from the recurrence relation (8)
We set $a_{j+2}(u)$
We have just checked that it holds for $i = j+2$. Now we perform the induction step $i \to i+1$.

Using the recursion (8) for $R_{i,k}$ and plugging in the expansion
for $i \ge j+2$, we obtain
Expanding $b_i(u)$, using its recursive relation and $b_{j+2}(u) = 2\rho_k(u)\sqrt{\gamma_{j+1}(u)}$, we get for $i > j+1$
We know that for sufficiently large $i$ the sequence $u_i$, defined in Definition 7, is given by $u_i = \lfloor \chi^{2^i} \rfloor$ with $\chi \approx 1.36660956\ldots$ (see [8, Lemma 18]). Therefore we have $N_j \sim u_j^2 \sim \left(\chi^{2^j}\right)^2$ and $N_j < k < N_{j+1} = O(N_j^2)$, which gives $j = \Theta(\log\log k)$. This implies that $j+1 < k+1$, i.e., that the dominant singularity $\rho_k(u)$ cannot come from the outermost radical.
Remark 18. Obviously, the same is true in the case $k=N_j$. Thus, the dominant singularity never comes from the outermost radical.

Using Proposition 17 and the representation of $H_k(z,u)$ in terms of the radicands, we obtain the singular expansion of $H_k(z,u)$ at $z=\rho_k(u)$, with a coefficient $h_k(u)$ that is expressed through the $a_i(u)$ and $b_i(u)$. Taking a look at the recursive definitions of $a_i(u)$ and $b_i(u)$ (see Proposition 17), it can easily be seen that these functions are non-zero in a sufficiently small neighbourhood of $u=1$. We know that $a_{j+2}(1)$ is positive (see [8]). By induction we can show that the sequence $a_i:=a_i(1)$ is monotonically increasing: assuming $a_{i-1}<a_i$, the recurrence for the $a_i$ yields $a_i<a_{i+1}$. Moreover, if $b_{j+2}:=b_{j+2}(1)$ is non-zero, then all the $b_i$'s, which are defined recursively from $b_{j+2}$, are non-zero as well. In order to prove that $b_{j+2}=2\rho_k(1)\sqrt{-\frac{\partial}{\partial z}R_{j+1,k}(\rho_k(1),1)}$ is non-zero, we again proceed by induction. We have $\frac{\partial}{\partial z}R_{1,k}(\rho_k(1),1)<0$, and assuming $\frac{\partial}{\partial z}R_{i,k}(\rho_k(1),1)<0$, the recursion (8) yields $\frac{\partial}{\partial z}R_{i+1,k}(\rho_k(1),1)<0$. This proves that all $b_i$'s are non-zero. Thus, we get $h_k(u)\neq 0$.
Using (9) and Theorem 16 we get, for $n\to\infty$, a quasi-power expression for the probability generating function of the number of leaves. Assuming that $\sigma^2=B''(1)+B'(1)-B'(1)^2\neq 0$ with $B(u)=\frac{\rho_k(1)}{\rho_k(u)}$, we can apply the Quasi-Powers Theorem. As stated in Section 2, proving this assumption appears to be quite difficult, since very little is known about the function $\rho_k(u)$. However, it seems very likely that this condition is fulfilled for arbitrary $k\in(N_j,N_{j+1})$, so that the Quasi-Powers Theorem can be applied, and the number of leaves in lambda-terms with bounded number of De Bruijn levels is asymptotically normally distributed with asymptotic mean $\mu n$ and variance $\sigma^2 n$, where $\mu=B'(1)$ and $\sigma^2=B''(1)+B'(1)-B'(1)^2$.
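Both constants depend only on $B(u)=\rho(1)/\rho(u)$ near $u=1$. Since $\rho_k(u)$ is only known implicitly here, the sketch below evaluates $\mu=B'(1)$ and $\sigma^2=B''(1)+B'(1)-B'(1)^2$ by central finite differences for an arbitrary callable `rho` (a hypothetical helper, step size `h` chosen ad hoc) and tries it on two toy families that are not taken from this paper. For binary trees with leaves marked, $\rho(u)=1/(2\sqrt u)$, hence $B(u)=\sqrt u$, $\mu=\frac12$ and $\sigma^2=0$: a binary tree of size $n$ always has $(n+1)/2$ leaves, which shows why the assumption $\sigma^2\neq0$ is essential. For unary-binary trees with leaves marked, $M=uz+zM+zM^2$, the discriminant $(1-z)^2-4uz^2$ vanishes at $\rho(u)=1/(1+2\sqrt u)$, which gives $\mu=\frac13$ and $\sigma^2=\frac1{18}$.

```python
import math

def quasi_powers_constants(rho, h=1e-4):
    """Return (mu, sigma2) with mu = B'(1) and sigma2 = B''(1) + B'(1) - B'(1)**2,
    where B(u) = rho(1) / rho(u), using central finite differences at u = 1."""
    def B(u):
        return rho(1.0) / rho(u)
    d1 = (B(1 + h) - B(1 - h)) / (2 * h)                # approximates B'(1)
    d2 = (B(1 + h) - 2 * B(1.0) + B(1 - h)) / h ** 2    # approximates B''(1)
    return d1, d2 + d1 - d1 ** 2

# Binary trees, u marking leaves: B(u) = sqrt(u), degenerate case (sigma2 numerically 0).
print(quasi_powers_constants(lambda u: 1 / (2 * math.sqrt(u))))
# Unary-binary trees, u marking leaves: about (1/3, 1/18).
print(quasi_powers_constants(lambda u: 1 / (1 + 2 * math.sqrt(u))))
```

Finite differences are only a numerical illustration; for $\rho_k(u)$ one would have to solve the radicand equation for each $u$, and a vanishing $\sigma^2$ could then not be distinguished reliably from a very small one.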

The case $k = N_j$
We know from [8] that in the case $k=N_j$ both radicands $R_{j,k}(z,1)$ and $R_{j+1,k}(z,1)$ vanish simultaneously and the dominant singularity is therefore of type $\frac14$. This is not true for the radicands $R_{j,k}(z,u)$ and $R_{j+1,k}(z,u)$ when $u$ is in a neighbourhood of 1. Thus, we have a discontinuity at $\rho_k(1)$, which is why we do not get any uniform expansions of the radicands in a neighbourhood of $\rho_k(1)$.
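The link between the type of the dominant singularity and the subexponential factor is a standard consequence of singularity analysis (transfer theorems): $[z^n](1-z)^{\alpha}\sim n^{-\alpha-1}/\Gamma(-\alpha)$, so a singularity of type $\frac12$ produces a factor $n^{-3/2}$ and one of type $\frac14$ a factor $n^{-5/4}$. The sketch below checks this numerically on the model functions $(1-z)^{1/2}$ and $(1-z)^{1/4}$, not on $H_k$ itself; the helper names are hypothetical.

```python
import math

def coefficients_of_power(alpha, n_max):
    """Taylor coefficients c_0..c_{n_max} of (1 - z)**alpha, using the generalised
    binomial theorem in the form c_n = c_{n-1} * (n - 1 - alpha) / n."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (n - 1 - alpha) / n)
    return c

def transfer_estimate(alpha, n):
    """Leading term n**(-alpha - 1) / Gamma(-alpha) predicted by singularity analysis
    (alpha must not be a non-negative integer)."""
    return n ** (-alpha - 1) / math.gamma(-alpha)

n = 100_000
for alpha in (0.5, 0.25):   # type 1/2 versus type 1/4 singularity at z = 1
    c = coefficients_of_power(alpha, 2 * n)
    local_exponent = math.log(c[2 * n] / c[n]) / math.log(2)   # close to -alpha - 1
    print(alpha, c[n] / transfer_estimate(alpha, n), local_exponent)
```

The ratio printed in the second column approaches 1, with a relative correction of order $\alpha(\alpha+1)/(2n)$, and the local exponent approaches $-\frac32$ and $-\frac54$, respectively.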
In order to overcome this problem we proceed as follows (see Figure 5 for a sketch of the idea of the proof). First, we show that the dominant singularity of the generating function $H_k(z,1+\varepsilon)$ comes solely from the radicand $R_{j,k}(z,1+\varepsilon)$ (cf. Lemma 19). Then we investigate the expansions of the radicands thoroughly for $u=1+\frac{s}{\sqrt n}$ in a neighbourhood of $z=\rho_k(1)$ with radius $\frac tn$, where $s$ and $t$ are both bounded complex numbers (cf. Lemma 20). Choosing the considered neighbourhoods of $z=\rho_k(1)$ and $u=1$ to be dependent on each other constitutes the main idea of the method. By means of Cauchy's coefficient formula and a suitable integration contour, we are then able to obtain an asymptotic expression for the $n$-th coefficient of the generating function (cf. Proposition 21). Finally, we show that the characteristic function of the random variable counting the total number of variables in a random lambda-term with at most $k$ De Bruijn levels tends to the characteristic function of the normal distribution as the size tends to infinity (cf. Lemma 22). For convenience, we subsequently use the abbreviation $\rho_k:=\rho_k(1)$.

Proof. Setting $u=1+\varepsilon$, expanding $\rho_k(u)$ around 1 and plugging this expansion into the recursive definition of the radicands yields their expansions in $\varepsilon$. Using this result and again the recursive definition of the radicands, we see that $|R_{j+1,k}(\rho_k(u),u)|\gg|R_{j,k}(\rho_k(u),u)|$ in a neighbourhood of $u=1$, which implies that the dominant singularity has to come from the $j$-th radicand, i.e., $R_{j,k}(\rho_k(u),u)=0$ for $u$ sufficiently close to 1.
where $f$ is an analytic function around 0, the $\hat C_i$ are constants and the $p_i$ are analytic functions in the variables $s$ and $t$.
Proof. We start by setting $u=1+\frac{s}{\sqrt n}$ and $z=\rho_k(u)\left(1+\frac tn\right)$ with bounded $s,t\in\mathbb C$ (cf. Figure 5) in the recursive definition of the radicands. In the resulting two expansions, the radicand in the square root in the last bracket is of course also evaluated at $\left(\rho_k(u)\left(1+\frac tn\right),1+\frac{s}{\sqrt n}\right)$, but to ensure a simpler reading we omit this notation from now on, i.e., we subsequently write $R_{i,k}$ instead of $R_{i,k}\left(\rho_k(u)\left(1+\frac tn\right),1+\frac{s}{\sqrt n}\right)$. Proceeding through the recursion of the radicands yields terms of the form $\hat C_i\,p_i(s,t)$ with $p_{j+2}(s,t)=2\rho_k\sqrt{2\rho_k\sqrt{p_j(t)}+q(s)}$, where $q$ is a polynomial that is linear in $s$. Thus, $p_{k+1}(s,t)=D\cdot p_{j+2}(s,t)$ with a constant $D$.

Inserting this into (13) and splitting the integral into three parts, we observe that the first integral is zero and the third integral contributes $O\left(\frac{1}{\sqrt n}\right)$. Thus, the main part of the asymptotics results from the second integral, which can be expressed by means of some constants $A(s)$ and $B(s)$. In this computation, $K$ denotes a suitable positive constant, and $\mathcal H$ denotes the classical Hankel curve, i.e., the noose-shaped curve that winds around 0 and starts and ends at $+\infty$ (cf. Figure 6).
Finally, using this result we obtain an asymptotic expression for the $n$-th coefficient with a constant $C(s)$ that depends on $s$.
Now we show that the characteristic function of our standardised sequence of random variables tends to the characteristic function of the normal distribution.
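Lemma 22. Let $X_n$ be the total number of variables in a random lambda-term of size $n$ with at most $k$ De Bruijn levels, and let $Z_n$ denote its standardised version. Then the characteristic function of $Z_n$ tends to that of the standard normal distribution.

Proof. We consider the standardised random variables $Z_n$, which are centred by means of a constant $\mu$, and compute their characteristic function.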
From Proposition 21 we know the asymptotic behaviour of the relevant coefficients, where the constant $C(s)\sim 1$ for $n\to\infty$. Since the expected value of the standardised random variable is zero, this determines the constant $\mu$, which completes the proof. Thus, the total number of leaves in lambda-terms with a bounded number of De Bruijn levels is asymptotically normally distributed in the case $k=N_j$ as well.

Leaves
The aim of this section is to investigate the distribution of the number of leaves in the different De Bruijn levels of closed lambda-terms with a bounded number of De Bruijn levels. To this end, observe that each De Bruijn level in such a lambda-term corresponds to one or more binary trees that contain different types of leaves, where the maximal number of types corresponds to the respective level (cf. Figure 3), i.e., in the $i$-th De Bruijn level there may be $i$ different types of leaves. Let $\mathcal C$ be the class of binary trees, i.e., a binary tree is either a single leaf or a binary node with two binary trees attached. Translating this specification into bivariate generating functions $C(z,u)$, with $z$ marking the size (i.e., the total number of nodes) and $u$ marking the number of leaves, yields $C(z,u)=uz+zC(z,u)^2$, i.e., $C(z,u)=\frac{1-\sqrt{1-4uz^2}}{2z}$. Let ${}_{k-l}\tilde H_k(z,u)$ be the generating function of closed lambda-terms with at most $k$ De Bruijn levels, where $z$ marks the size and $u$ marks the number of leaves on the $(k-l)$-th De Bruijn level ($0\le l\le k$). The corresponding structures can be seen as $k+1$ nested binary trees, matching the $k+1$ De Bruijn levels: each level is a binary tree where some of its leaves are replaced by a unary node with a binary tree (belonging to the next level) attached to it. Thus, in the associated generating function $C(z,u)$, the variable marking the leaves is replaced by $m+C(z,\ldots)$, where $m$ is the number of possible De Bruijn indices (i.e., types of leaves) and $C(z,\ldots)$ is the generating function of the binary trees in the next level. Altogether, we have

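$${}_{k-l}\tilde H_k(z,u)=C\Bigl(z,\,C\bigl(z,\,1+C(z,\,2+\cdots C(z,\,(k-l)\,u+C(z,\ldots C(z,k)\ldots))\cdots)\bigr)\Bigr).$$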
This can be written in terms of radicands $\tilde R_{i,k}$.

Remark 23. Note that the radicands $\tilde R_{i,k}$ that are introduced above are very similar to the radicands $R_{i,k}$ that were used in the previous section. The only difference is that now $u$ occurs only in the $(l+1)$-th radicand, while in the previous case $u$ occurred in all radicands. Thus, from now on we need further case distinctions, depending on the position (w.r.t. $l$) of the radicand(s) from which the dominant singularity comes. This subsection consists of two parts: in the first part we derive the mean values for the number of leaves in the different De Bruijn levels, and the second part deals with the distributions of the number of leaves in these levels.

Mean values
Now we want to determine the mean for the number of leaves in the different De Bruijn levels, i.e.,
$$\mathbb E X_n=\frac{[z^n]\,\frac{\partial}{\partial u}\,{}_{k-l}\tilde H_k(z,u)\big|_{u=1}}{[z^n]\,H_k(z,1)},$$
where $X_n$ denotes the number of leaves in the $(k-l)$-th De Bruijn level of a random closed lambda-term of size $n$ with at most $k$ De Bruijn levels.
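For small sizes, $\mathbb E X_n$ can be computed exactly by keeping, for every class of subterms and every De Bruijn level, the total number of leaves in that level. The sketch below assumes that every node has size 1 and that a variable sitting below $i$ lambdas lies in De Bruijn level $i$ (as in Definition 2), so level 0 of a closed term never contains leaves. The function `mean_leaves_per_level` is a hypothetical helper returning exact fractions for the levels $0,\ldots,k$.

```python
from fractions import Fraction

def mean_leaves_per_level(k, n):
    """Exact mean number of variables in each De Bruijn level 0..k for a uniformly
    random closed lambda-term of size n with at most k De Bruijn levels
    (sketch assumption: every node has size 1)."""
    t = [[0] * (n + 1) for _ in range(k + 1)]   # t[i][m]: class-i terms of size m
    # w[i][lvl][m]: total number of leaves in level lvl over all class-i terms of size m
    w = [[[0] * (n + 1) for _ in range(k + 1)] for _ in range(k + 1)]
    for m in range(1, n + 1):
        for i in range(k + 1):
            count = i if m == 1 else 0                   # variables: i possible indices
            for a in range(1, m - 1):                    # applications
                count += t[i][a] * t[i][m - 1 - a]
            if i < k and m >= 2:                         # abstraction into level i + 1
                count += t[i + 1][m - 1]
            t[i][m] = count
            for lvl in range(k + 1):
                total = i if (m == 1 and lvl == i) else 0
                for a in range(1, m - 1):
                    b = m - 1 - a
                    total += w[i][lvl][a] * t[i][b] + t[i][a] * w[i][lvl][b]
                if i < k and m >= 2:
                    total += w[i + 1][lvl][m - 1]
                w[i][lvl][m] = total
    if t[0][n] == 0:
        raise ValueError("there is no closed lambda-term of this size")
    return [Fraction(w[0][lvl][n], t[0][n]) for lvl in range(k + 1)]
```

Since the cost grows like $O(k^2n^2)$, only moderate sizes are feasible, so such exact values can illustrate, but not confirm, the asymptotic statements of Proposition 25.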
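In order to do so, we start with the following observation: since the radicands $\tilde R_{i,k}$ differ from the radicands $R_{i,k}$ only in the placement of $u$ (cf. Remark 23), we get $\tilde R_{i,k}(z,1)=R_{i,k}(z,1)$ for all $i$. Again, we distinguish between cases, starting with $k$ not being an element of the sequence $(N_j)_{j\in\mathbb N}$.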
Now we have to distinguish whether the De Bruijn level that we are focusing on lies below the $(k-j)$-th level or not, i.e., whether $l>j$ or $l\le j$.
First case: $l>j$. First, recall that $l>j$ implies that $u$ is inserted in a radicand that is located outside the $(j+1)$-th one. From (14) we obtain an expansion for $\varepsilon\to 0$ with $\varepsilon\in\mathbb C\setminus\mathbb R^-$. Denoting the sum occurring in this expansion by $\delta_l$, we can determine the coefficient of $z^n$ asymptotically for $n\to\infty$, and by using the asymptotics of the $n$-th coefficient of ${}_{k-l}\tilde H_k(z,1)=H_k(z,1)$ (see Theorem 16) we finally obtain the mean, asymptotically as $n\to\infty$. Thus, we have shown that there are only few leaves in the De Bruijn levels below the $(k-j)$-th level. More precisely, the asymptotic mean of the number of leaves is $O(1)$ for all these lower levels.

Remark 24. The constant $h_k$ can be expressed by means of the sequences $\tilde a_i$ and $\tilde b_i$ defined in Eqs. (15)–(17), which yields a representation of the constant $C_{k,l}$ in terms of these sequences. This representation can be used in order to investigate the asymptotic number of leaves in the lower De Bruijn levels more thoroughly.
Second case: $l\le j$. Similarly to the first case, we obtain the corresponding expansion and, after introducing the abbreviation $\varphi_{j+1,l}$, the mean asymptotically as $n\to\infty$. Hence, we have proved that the asymptotic mean for the number of leaves in the $(k-j)$-th De Bruijn level and in the levels above it is $\Theta(n)$. So, altogether, almost all of the leaves are located in the upper $j+1$ De Bruijn levels.
As in the previous case, we define $\delta_l$ by means of the $\tilde a_i$ with indices up to $k+1$. Extracting the $n$-th coefficient and using the asymptotics of $[z^n]H_k(z,1)$ from Theorem 16, we obtain the mean. Thus, as in the previous case ($k\in(N_j,N_{j+1})$), the asymptotic mean for the number of leaves in the De Bruijn levels below the $(k-j)$-th level is $O(1)$.
Second case: $l=j$. Now $u$ is inserted in the $(j+1)$-th radicand, and we obtain an asymptotic expression for the mean that involves the constant $\tilde D_{k,l}$. In order to get some information on the magnitude of this factor, we would have to investigate $\tilde\gamma_j=-\frac{\partial}{\partial z}\tilde R_{j,k}(\rho_k,1)$, which seems to get rather involved. However, Equation (20) shows that there are already considerably more leaves in the $(k-j)$-th De Bruijn level, namely $\Theta(\sqrt n)$.
Third case: $l<j$. In the third case we obtain, for $n\to\infty$, an asymptotic expression for the mean involving $\psi_j$, which is defined as in (21). Thus, we have proved that on average there are $\Theta(n)$ leaves in each of the upper $j$ De Bruijn levels.
The constant $\hat D_{k,l}$ occurring there can be rewritten in a more explicit form.
The following proposition sums up all the results that we obtained within this section.
Proposition 25. Let $X_n$ denote the number of leaves in the $(k-l)$-th De Bruijn level in a random lambda-term of size $n$ with at most $k$ De Bruijn levels.
If $k\in(N_j,N_{j+1})$, then we get for the asymptotic mean as $n\to\infty$
• in the case $l>j$: $\mathbb E X_n\sim C_{k,l}$,
• and in the case $l\le j$: $\mathbb E X_n\sim\tilde C_{k,l}\,n$,
with constants $C_{k,l}$ and $\tilde C_{k,l}$ depending on $l$ and $k$.
If $k=N_j$, then the asymptotic mean for $n\to\infty$ reads as
• in the case $l>j$: $\mathbb E X_n\sim D_{k,l}$,
• in the case $l=j$: $\mathbb E X_n\sim\tilde D_{k,l}\sqrt n$,
• and in the case $l<j$: $\mathbb E X_n\sim\hat D_{k,l}\,n$,
with constants $D_{k,l}$, $\tilde D_{k,l}$ and $\hat D_{k,l}$ depending on $l$ and $k$.
All the constants occurring in Proposition 25 have been calculated explicitly and can be obtained for every fixed $k$. In particular, we investigated $D_{k,l}$ in order to show that for large $k$ the number of leaves in the De Bruijn levels that are closer to the root is smaller (cf. Figure 8). In fact, these constants tend to zero rapidly as $k$ tends to infinity.
Proof. The proposition follows directly by investigating the constant $D_{k,l}$. The asymptotics of the sequence $\lambda_i$ (defined by $\lambda_0=0$ and $\lambda_{i+1}=i+1+\sqrt{\lambda_i}$ for $i\ge 0$) can be obtained by bootstrapping (see [20]). We obtain $\lambda_i\sim i$ as $i\to\infty$.
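The bootstrapping for $\lambda_i$ is easy to follow numerically. Since $\lambda_i\ge i$, the recursion gives $\lambda_i=i+O(\sqrt i)$; inserting this once more into $\sqrt{\lambda_{i-1}}$ gives $\lambda_i=i+\sqrt i+O(1)$. This second step is added here only for illustration and is not taken from [20]. The function name `lambda_sequence` is hypothetical.

```python
import math

def lambda_sequence(i_max):
    """lambda_0 = 0 and lambda_{i+1} = i + 1 + sqrt(lambda_i), for i = 0..i_max-1."""
    lam = [0.0]
    for i in range(i_max):
        lam.append(i + 1 + math.sqrt(lam[i]))
    return lam

lam = lambda_sequence(10_000)
for i in (10, 100, 1_000, 10_000):
    # the ratio tends to 1 (lambda_i ~ i); the remainder lambda_i - i - sqrt(i) stays bounded
    print(i, lam[i] / i, lam[i] - i - math.sqrt(i))
```

The slow convergence of the ratio, with relative error of order $1/\sqrt i$, is a reminder that $\lambda_i\sim i$ is only a first-order statement.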
Remark 27.Note that the expression for D k,l (cf.Equ. ( 19)) can be obtained by plugging ãj+l = 4ρ 2 k λ l−1 into the equation for C k,l (cf.Equ. ( 18)).However, this relation is solely valid for the case k = N j and thus, Proposition 26 holds just for the constants D k,l .Nonetheless, we expect that by means of some suitable estimates for the ã i s one can obtain a similar behaviour for the constants C k,l .Since computations get rather involved, we omitted any further investigations of these constants within this paper.Anyway, we can conclude that in both cases, whether k is an element of (N i ) i>0 or not, a random closed lambda-term with at most k De Bruijn levels has very few leaves in its lowest levels if k is large.

Distributions
Now that we have derived the mean values for the number of leaves in the different De Bruijn levels, we are interested in their distribution. Therefore, we again distinguish between the cases of $k$ being an element of the sequence $(N_i)_{i\ge 0}$ or not.
The case $N_j<k<N_{j+1}$

We know that the generating function ${}_{k-l}\tilde H_k(z,u)$ consists of $k+1$ nested radicals, where $u$ is inserted in the $(l+1)$-th radicand counted from the innermost one. Additionally, we know that for $N_j<k<N_{j+1}$ the dominant singularity $\tilde\rho_k(u)$ comes from the $(j+1)$-th radicand. Therefore, for $l>j$ the function $\tilde\rho_k(u)$ is independent of $u$, which is the reason why we do not get a quasi-power in that case. Rewriting the generating function accordingly, we obtain its derivatives with respect to $u$. As in the previous section, we distinguish between different cases.
where $a_i:=\tilde a_i=a_i(1)$ and $b_i:=\tilde b_i=b_i(1)$ are defined in the previous sections and result from the expansions of the radicands. This gives the expected value of the number of unary nodes in the $(k-l)$-th De Bruijn level in this case. Furthermore, the constant occurring there can be simplified by means of the sequence $\lambda_i$ defined by $\lambda_0=0$ and $\lambda_{i+1}=i+1+\sqrt{\lambda_i}$ for $i\ge 0$. Since the second summand of this constant is almost zero for $l$ close to $k$ and large $k$, the number of unary nodes in these levels (close to the root) is close to one for large $k$.

Second case: l
Hence, analogously to the number of leaves, we have proved that the number of unary nodes in the upper $j+1$ De Bruijn levels is $\Theta(n)$.

The case:
This case works analogously to the previous one.Thus, we just give the results for the expected values.
First case: $l>j+1$. In this case, the expected value is identical to the mean for the case $k=N_j$ and $l>j+1$. So, with $\alpha_l$ defined as in (23), we obtain the same asymptotic expression as $n\to\infty$.

Second case: $l=j+1$. In the second case, the constant differs slightly, but the result stays qualitatively unaltered. Thus, the expected number of unary nodes in the last $j+1$ De Bruijn levels is asymptotically $\Theta(n)$.

Binary nodes
In this section we want to calculate the mean values of the number of binary nodes in the different De Bruijn levels. We denote by $C(z,v,u)$ the generating function of the class of binary trees where $z$ marks the total number of nodes, $v$ marks the number of binary nodes, and $u$ marks the number of leaves. Thus, we have $C(z,v,u)=uz+vz\,C(z,v,u)^2$, i.e., $C(z,v,u)=\frac{1-\sqrt{1-4uvz^2}}{2vz}$. Using this generating function, we can write the bivariate generating function of the class of closed lambda-terms, with $z$ marking the size and $v$ marking the number of binary nodes on the $(k-l)$-th De Bruijn level, in the same nested form as before. Analogously to the previous sections we have to distinguish between different cases. For the case $k=N_j$ and $l>j+1$ we obtain an explicit asymptotic expression for the mean as $n\to\infty$.
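The trivariate specification $C(z,v,u)=uz+vz\,C(z,v,u)^2$ can be checked by brute-force counting before it is nested into the lambda-term construction. The sketch below tabulates binary trees by size, number of binary nodes and number of leaves; every tree of size $2b+1$ should have $b$ binary nodes and $b+1$ leaves, and the counts should be the Catalan numbers. The function name `binary_tree_counts` is hypothetical.

```python
def binary_tree_counts(n_max):
    """counts[n][(b, l)] = number of binary trees of size n (total number of nodes)
    with b binary nodes and l leaves, following C(z, v, u) = u z + v z C(z, v, u)^2."""
    counts = [{} for _ in range(n_max + 1)]
    if n_max >= 1:
        counts[1][(0, 1)] = 1                   # a single leaf: the term u z
    for n in range(2, n_max + 1):
        for a in range(1, n - 1):               # binary root (v z) with subtrees of sizes a, n-1-a
            for (b1, l1), x in counts[a].items():
                for (b2, l2), y in counts[n - 1 - a].items():
                    key = (b1 + b2 + 1, l1 + l2)
                    counts[n][key] = counts[n].get(key, 0) + x * y
    return counts
```

Trees of even size do not exist in this size model, which the empty dictionaries for even `n` reflect.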
We performed a thorough investigation of the constant and showed that it is almost zero if $l$ is close to $k+1$ and $k$ is large, i.e., if we consider a very low De Bruijn level that is close to the root.
Due to Equation (26), the calculations get rather involved. Since the methods are the same as in the previous section, we omit further calculations. However, the results resemble the ones that we obtained in Section 5.1 for the number of leaves. The only difference appears in the constants, but qualitatively these constants behave in the same way as well.

Conclusion
Our investigation was triggered by the striking observation that the asymptotic number of lambda-DAGs with bounded number of De Bruijn levels, say bounded by $k$, and $n$ vertices is of the form $\rho^n n^{-3/2}$ except if the bound $k$ belongs to some peculiar doubly exponentially growing sequence. There was no apparent reason why bounding the number of De Bruijn levels by 8 is substantially different from setting the bound to 7 or 9.
The results in this paper show that the vertices corresponding to the variables in the associated lambda-terms gather at the bottom of the lambda-DAG, meaning the De Bruijn levels of highest order within the lambda-DAG. More precisely, in each of the last $j+1$ levels, where $j=\Theta(\log\log k)$, we find $\Theta(n)$ variables. The other levels contain only a bounded number of variables. As the bound grows, the higher levels become fuller and fuller, and whenever $k$ reaches a value that makes $j$ jump to the next integer, a further De Bruijn level becomes populated with variables. At this stage, there are only $\Theta(\sqrt n)$ variables in it, but for the next value of $k$ this level gets densely populated with variables, just as the other levels of high order. This shows that there is a structural difference within the classes of lambda-terms with at most $k$ De Bruijn levels, depending on whether the bound belongs to $(N_i)_{i\ge 0}$ or not. The distribution of the variables, in particular the fact that a further level has to contain a larger but still fairly small number of variables, apparently has some slight effects on the degrees of freedom to choose the bindings, which modifies the subexponential term in the asymptotics.

Definition 2 (De Bruijn index, De Bruijn level). The natural numbers that represent the variables in the De Bruijn representation of a lambda-term are called De Bruijn indices. The number of nested lambdas starting from the outermost one specifies the De Bruijn level in which a variable (or De Bruijn index, respectively) is located. For example, in the lambda-term $\lambda x.x(\lambda y.(xy))=\lambda 1(\lambda 2 1)$ the first occurrence of the variable $x$ (i.e., the leftmost 1 in the De Bruijn representation) is in the first De Bruijn level, while the other variables are in the second De Bruijn level.
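Definition 2 can be made concrete with a small conversion routine. The sketch below uses a hypothetical tuple encoding of named lambda-terms, replaces each variable by the number of lambdas between it and its binder (counting the binder itself), and records the De Bruijn level, i.e., the number of enclosing lambdas. The names `to_de_bruijn` and `show` are illustrative only.

```python
def to_de_bruijn(term, binders=()):
    """Convert a named lambda-term into De Bruijn notation.

    Hypothetical encoding: ("var", name), ("lam", name, body), ("app", function, argument).
    Returns the De Bruijn term, built from ("var", index), ("lam", body) and
    ("app", function, argument), together with the list of
    (De Bruijn index, De Bruijn level) of all variable occurrences, left to right.
    """
    tag = term[0]
    if tag == "var":
        # count lambdas from the occurrence outwards; the innermost binder of that name wins
        for index, name in enumerate(reversed(binders), start=1):
            if name == term[1]:
                return ("var", index), [(index, len(binders))]
        raise ValueError(f"free variable {term[1]!r}: the term is not closed")
    if tag == "lam":
        body, occurrences = to_de_bruijn(term[2], binders + (term[1],))
        return ("lam", body), occurrences
    function, occ_f = to_de_bruijn(term[1], binders)
    argument, occ_a = to_de_bruijn(term[2], binders)
    return ("app", function, argument), occ_f + occ_a


def show(term):
    """Print a De Bruijn term by juxtaposition, as in Definition 2 (e.g. λ1(λ21));
    concatenated indices are only unambiguous as long as all indices are below 10."""
    if term[0] == "var":
        return str(term[1])
    if term[0] == "lam":
        return "λ" + show(term[1])
    function, argument = term[1], term[2]
    left = f"({show(function)})" if function[0] == "lam" else show(function)
    right = show(argument) if argument[0] == "var" else f"({show(argument)})"
    return left + right
```

Applied to $\lambda x.x(\lambda y.(xy))$, this yields `λ1(λ21)` with the levels 1, 2, 2 of Definition 2. Shadowed names are resolved to the innermost binder, so, for instance, $\lambda x.\lambda x.x$ becomes `λλ1`.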

Figure 4: Summary of the mean values of the number of leaves in the different De Bruijn levels in lambda-terms with at most $k$ De Bruijn levels for the case $N_j<k<N_{j+1}$ (left) and the case $k=N_j$ (right).

Theorem 14. For any fixed $k$, let $G_k(z,u)$ denote the bivariate generating function of the class of closed lambda-terms where all De Bruijn indices are at most $k$. Then the equation

Figure 5: Sketch of the idea of the proof.

Figure 7: A schematic sketch of a lambda-term with at most $k$ De Bruijn levels that exemplifies the notation used within this section: if we investigate the number of leaves in the $(k-l)$-th De Bruijn level, for $0\le l\le k$, a factor $u$ is inserted in the recursive definition of the $(l+1)$-th radicand.

Proposition 26. Let us consider a random closed lambda-term of size $n$ with at most $k$ De Bruijn levels in the case $k=N_j$. Then the average number of leaves in De Bruijn level $L$, with 0

Figure 8: (1) In the $(k-j)$-th De Bruijn level ($l=j$) there are considerably more leaves than in the lower levels, but still fewer leaves than in the levels above. (2) With growing $k$ the $(k-j)$-th De Bruijn level gets filled with leaves, while the number of leaves in the next level below (i.e., the $(k-j-1)$-th) slowly increases. (3) As soon as $k$ reaches the next element of the sequence $(N_j)_{j\ge 0}$, namely $k=N_{j+1}$, the $(k-j-1)$-th De Bruijn level immediately contains considerably more leaves than the levels below.

Table 1: Coefficients occurring in the mean and the variance for some initial values of $k$.