A Quantitative Study of Pure Parallel Processes

In this paper, we study the interleaving -- or pure merge -- operator that most often characterizes parallelism in concurrency theory. This operator is a principal cause of the so-called combinatorial explosion that makes the analysis of process behaviours, e.g. by model checking, very hard from the point of view of computational complexity. The originality of our approach is to study this combinatorial explosion phenomenon on average, relying on advanced analytic combinatorics techniques. We study various measures that contribute to a better understanding of process behaviours represented as plane rooted trees: the number of runs (corresponding to the width of the trees), the expected total size of the trees, as well as their overall shape. Two practical outcomes of our quantitative study are also presented: (1) a linear-time algorithm to compute the probability of a concurrent run prefix, and (2) an efficient algorithm for the uniform random sampling of concurrent runs. These provide interesting responses to the combinatorial explosion problem.


Introduction
A significant part of concurrency theory is built upon a simple interleaving operator named the pure merge in [BW90]. The basic underlying idea is that two independent processes running in parallel, denoted P ∥ Q, can be faithfully simulated by the interleaving of their computations. We denote by a.P (resp. b.Q) a sequential process that first executes an atomic action a (resp. b) and then continues as the process P (resp. Q).
The pure merge operator is a principal source of combinatorial explosion when analysing concurrent processes, e.g. by model checking [CGP99]. This issue has been thoroughly investigated and many approaches have been proposed to counter the explosion phenomenon, in general based on compression and abstraction/reduction techniques. While several decidability and worst-case complexity results are known, to our knowledge the interleaving of process structures as computation trees has not been studied extensively from the average-case point of view.
In analytic combinatorics, the closest related line of work addresses the shuffle of regular languages, generally on disjoint alphabets [FGT92, MZ08, GDG+08, DPRS12]. The shuffle of (disjoint) words can be seen as a specific case of the interleaving of processes (for processes of the form (a_1 . . . a_n) ∥ (b_1 . . . b_m)). Interestingly, a quite related concept of interleaving of tree structures has been investigated in algebraic combinatorics [BFLR11], and especially in the context of partly commutative algebras [DHNT11]. We see our work as a continuation of this line of work, now focusing on the quantitative and analytic aspects.
Our objective in this work is to better characterize the typical shape of concurrent process behaviours as computation trees, and for this we rely heavily on analytic combinatorics techniques. The present paper also serves as an homage to Philippe Flajolet, our scientific mentor and guide, who is a central figure of analytic combinatorics. We think this work follows faithfully Philippe's practice of investigating concrete problems with advanced analytic tools. In the same spirit, we emphasize practical applications resulting from such thorough mathematical studies. In the present case, we develop algorithmic techniques to analyse the process behaviours probabilistically, through counting and uniform random generation.
Our study is organized as follows. In Section 2 we define the recursive construction of the interleaved process behaviours from syntactic process trees, and study the basic structural properties of this construction. In Section 3 we investigate the number of concurrent runs that satisfy a given process specification. Based on an isomorphism with increasing trees, which proves particularly fruitful, we obtain very precise results. We then provide a precise characterization of what "exponential growth" means in the case of pure parallel processes. We also investigate the case of non-plane trees. In Section 4 we discuss, both theoretically and experimentally, the decomposition of semantic trees by level. This culminates in a rather precise characterization of the typical shape of process behaviours. We then study, in Section 5, the expected size of process behaviours. This typical measure is precisely characterized by a linear recurrence relation that we obtain in three distinct ways. While reaching the same conclusion, each of these three proofs provides a complementary view of the combinatorial objects under study. Taken together, they illustrate the richness and variety of analytic combinatorics techniques. Section 6 is devoted to practical applications resulting from this quantitative study. First, we describe a simple algorithm to compute the probability of a run prefix in linear time. As a by-product, we obtain a very efficient way to calculate the number of linear extensions of a tree-like partial order, or tree-poset. The second application is an efficient algorithm for the uniform random sampling of concurrent runs. These algorithms work directly on the syntax trees of processes without requiring the explicit construction of their behaviour, thus avoiding the combinatorial explosion issue. This paper is an updated and extended version of [BGP12]. It contains new material, especially the study of the typical shape of process behaviours in Section 4. The more complex setting of non-plane trees is also discussed. Appendix A was added to discuss weighted random sampling in dynamic multisets. The proofs in this extended version are also more detailed.

A tree model for process semantics
As a starting point, we recast our problem in combinatorial terms. The idea is to relate the syntactic domain of process specifications to the semantic domain (or model) of process behaviours.

Syntax trees
The grammar we adopt for pure parallel processes is very simple. The set of process specifications is the least set satisfying:
• an atomic action, denoted a, b, . . ., is a process,
• the prefixing a.P of an action a and a process P is a process,
• the parallel composition P_1 ∥ . . . ∥ P_n of a finite number of processes is a process.
An example of a valid specification is a.b.(c ∥ d.(e ∥ f)), which can be faithfully represented by a tree, namely a syntax tree, as depicted on the left-hand side of Figure 1. Such a tree can be read as a set of precedence constraints between atomic actions. In this light, the action a at the root must be executed first, and then b. There is no relation between c and d -- they are said to be independent -- and e, f may only happen after d.
In combinatorial terms we adopt the classical specification of plane rooted trees to represent the syntactic domain. The size of a tree is its total number of nodes. Note that we do not keep the names of actions in the process trees since they play no rôle for the pure merge operator.
Definition 1. The specification C = Z × Seq(C) represents the combinatorial class of plane rooted trees.
As a basic recall of analytic combinatorics, and to state our conventions, we remind the reader that for such a combinatorial class C we define its counting sequence (C_n), consisting of the number of objects of C of size n. This sequence is linked to the formal power series C(z) = Σ_{n≥0} C_n z^n, the generating function of the class. Analogous writing conventions will be used for all combinatorial classes in this paper.
We remind the reader that in the case of the class C, the sequence (C_n) corresponds to the Catalan numbers (indeed, shifted by one). For further reference, we give the generating function of C and the asymptotic approximation of the Catalan numbers (obtained by the Stirling approximation of n!, as in e.g. [Com74, p. 267]):

Fact 2. C(z) = (1 − √(1 − 4z))/2 and C_n = (1/n) binom(2n−2, n−1) ∼ 4^(n−1)/(√π n^(3/2)).
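The sequence starts 1, 1, 2, 5, 14, . . ., with C_n = binom(2n−2, n−1)/n ∼ 4^(n−1)/(√π n^(3/2)). These quantities are easy to check numerically; the following sketch (in Python; the function names are ours) computes C_n exactly and compares it with the first-order asymptotic approximation.

```python
from math import comb, pi, sqrt

def num_plane_trees(n):
    """C_n: number of plane rooted trees with n nodes,
    i.e. the (n-1)-st Catalan number binom(2n-2, n-1) / n."""
    return comb(2 * n - 2, n - 1) // n

def catalan_first_order(n):
    """First-order asymptotic approximation 4^(n-1) / (sqrt(pi) n^(3/2))."""
    return 4 ** (n - 1) / (sqrt(pi) * n ** 1.5)
```

For n = 200 the exact value already agrees with the approximation to within one percent.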

Semantic trees
The semantic domain we study is much less classical than syntax trees, although it is still composed of plane rooted trees. An example of a semantic tree is depicted on the right-hand side of Figure 1. This tree represents all the possible executions -- or runs -- that may be observed for the process specified on the left. More precisely, each branch of the semantic tree, e.g. a, b, c, d, e, f, is a concurrent run (or admissible computation) of the process, and all the branches share their common prefixes. In the literature such structures are also called computation trees [CES86].
To describe the recursive construction of the semantic trees, we use an elementary operation of child contraction.

Definition 3. Let T be a plane rooted tree and v_1, . . ., v_r be the root-labels of the children of the root. For i ∈ {1, . . ., r}, the i-contraction of T is the plane tree with root v_i and whose children are, from left to right, T(v_1), . . ., T(v_{i−1}), T(v_{i_1}), . . ., T(v_{i_m}), T(v_{i+1}), . . ., T(v_r), where T(ν) denotes the sub-tree whose root is ν and v_{i_1}, . . ., v_{i_m} are the root-labels of the children of T(v_i). We denote by T_i the i-contraction of T.
Note that the root (here a) is replaced by the label of the root of the i-th child (here c). Now, the interleaving operation follows a straightforward recursive scheme.

Figure 2: Enumerating behaviours (semantic trees) from process specifications (syntax trees).

Definition 4. Let T be a process tree; its semantic tree Sem(T) is defined inductively as follows:
• if T is reduced to a single node t, then Sem(T) is the single-node tree t;
• if T has root t and r children (r ∈ N \ {0}), then Sem(T) is the plane tree with root t and children, from left to right, Sem(T_1), . . ., Sem(T_r).
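Definitions 3 and 4 translate directly into code. The following sketch (Python; the tuple-based tree encoding and function names are ours) builds Sem(T) by repeated i-contractions; on our reading of the Figure 1 example it produces a balanced tree of height n − 1 = 5 whose 8 leaves are the 8 concurrent runs.

```python
def contraction(t, i):
    """i-contraction (Definition 3): the root is replaced by the root of
    its i-th child, whose children are spliced in its place."""
    _, children = t
    label_i, grandchildren = children[i]
    return (label_i, children[:i] + grandchildren + children[i + 1:])

def sem(t):
    """Semantic tree Sem(T) (Definition 4): a single node stays a single
    node; otherwise the root gets one subtree per i-contraction of T."""
    label, children = t
    if not children:
        return (label, ())
    return (label, tuple(sem(contraction(t, i)) for i in range(len(children))))

def size(t):
    return 1 + sum(size(c) for c in t[1])

def height(t):
    return 0 if not t[1] else 1 + max(height(c) for c in t[1])

def leaves(t):
    return 1 if not t[1] else sum(leaves(c) for c in t[1])

# The process of Figure 1, as we read it: a.b.(c | d.(e | f)).
FIG1 = ('a', (('b', (('c', ()),
                     ('d', (('e', ()), ('f', ()))))),))
```

Note that common run prefixes are shared automatically: branching only happens through the choice of contraction index.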
The mapping between the syntax trees on one side and the semantic trees on the other side is trivially one-to-one. Figure 2 depicts the enumeration of the first syntax trees (by size n) together with the corresponding semantic trees.
We note that the semantic trees are balanced and, even more importantly, that their height is n − 1, where n is the size of the associated process tree. This is obvious since each branch of a semantic tree corresponds to a complete traversal of the syntax tree. Thus there are as many semantic trees of height n − 1 as there are syntax trees of size n (as counted by C_n above).
A further basic observation is that the contraction operator (cf. Definition 3) ensures that the number of nodes at a given level of a semantic tree is at most the number of nodes at the next level. Thus, the width of a semantic tree corresponds to its number of leaves.
The following observation bears witness to the high level of redundancy exhibited by semantic trees.
Proposition 5. The knowledge of a single branch of a semantic tree is sufficient to recover the corresponding syntax tree.
To go slightly further into the details, we may indeed exhibit a family of inverse functions from singled-out semantic tree branches to syntax trees. These inverse functions exploit the concept of a degree-sequence, defined as follows.

Definition 6. A degree-sequence (u_p)_{p∈{1,...,n}} is a sequence of non-negative integers of length n in which u_n is the single term equal to 0.
The degrees of the nodes from the root to a leaf along any branch of a semantic tree form a degree-sequence.
Proposition 7. Let (u_p) be a degree-sequence of length n that is linked to the leftmost branch of a semantic tree S. Let us define the new sequence (v_p) such that v_1 = u_1 and v_p = u_p − u_{p−1} + 1 for p ∈ {2, . . ., n}. We build a tree T of size n such that the sequence (v_p) corresponds to the degrees of the nodes of the tree, ordered by the prefix traversal. The semantic image of T is the tree S.
An important remark is that we only considered the leftmost branch of the semantic tree to construct the corresponding degree-sequence, from which we recover the initial syntax tree. It is interesting to note that the leftmost branch of the semantic tree encodes a Łukasiewicz word, which is directly related to the degree-sequence of the prefix traversal [FS09].
We can show, in fact, that the initial tree can be recovered by considering any of the branches of its semantic tree, not just the leftmost one. Each branch corresponds to a degree-sequence visiting the nodes of the initial tree according to a specific traversal. For example, while the leftmost branch encodes the prefix traversal, the rightmost branch enumerates its mirror: the postfix traversal. Last but not least, the set of degree-sequences of length n is only of cardinality C_n, so the semantic trees are highly symmetrical, in that many branches must be defined by the same degree-sequence.
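The recovery map of Propositions 5-7 can be sketched concretely. In the code below (Python; unlabelled trees are encoded as tuples of children, and the rule v_1 = u_1, v_p = u_p − u_{p−1} + 1 is our reconstruction of the inverse map of Proposition 7), we read the degree-sequence off the leftmost branch by repeated leftmost contractions, then rebuild the syntax tree from the resulting prefix-order degrees.

```python
def contract_leftmost(t):
    """Leftmost (0-)contraction on unlabelled trees (tuples of children):
    the root is merged with its leftmost child (cf. Definition 3)."""
    return t[0] + t[1:]

def branch_degrees(t):
    """Degree-sequence along the leftmost branch of Sem(t): read the
    current root degree, then contract, until a single node remains."""
    out = []
    while True:
        out.append(len(t))
        if not t:
            return out
        t = contract_leftmost(t)

def recover(u):
    """Rebuild the syntax tree from a leftmost-branch degree-sequence u,
    assuming v_1 = u_1 and v_p = u_p - u_(p-1) + 1 yields the prefix-order
    degrees (our reconstruction); the tree is rebuilt recursively."""
    v = [u[0]] + [u[p] - u[p - 1] + 1 for p in range(1, len(u))]
    it = iter(v)
    def build():
        return tuple(build() for _ in range(next(it)))
    return build()

# Figure 1 process, unlabelled: a-b, b-{c, d}, d-{e, f}.
FIG1 = (((), ((), ())),)
```

On this example the leftmost-branch degrees are 1, 2, 1, 2, 1, 0: a valid degree-sequence (its single 0 is the last term), from which the original tree is recovered.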

Enumeration of concurrent runs
Our quantitative study begins by measuring the number of concurrent runs of a process encoded as a syntax tree T. This measure corresponds to the number of leaves -- and thus the width -- of the semantic tree Sem(T). Given the exponential nature of the merge operator, measuring efficiently the dimensions of the concurrent systems under study is of great practical interest. In a second step, we quantify precisely the exponential growth of the semantic trees, which provides a refined interpretation of the so-called combinatorial explosion phenomenon. Finally, we study the impact of commutativity for the merge operator. As a particularly notable fact, this section reveals a deep connection between increasingly labelled structures and concurrency theory.

An isomorphism with increasing trees
Our study begins with a simple observation that connects the pure merge operator to the set of linear extensions of tree-like partial orders, or tree-posets [Atk90].
Definition 8. Let T be the syntax tree of a process, and A its set of actions. We define the poset (A, ≺) such that a ≺ b iff a is the label of a node that is the parent of a node with label b in T. The set of linear extensions of (A, ≺) is the set of all the strict total orderings (A, <) that respect the partial ordering.
For example, the syntax tree depicted on the left of Figure 1 induces the tree-poset in which a ≺ b, b ≺ c, b ≺ d, d ≺ e and d ≺ f.

Proposition 9. Let T be the syntax tree of a process and (A, ≺) the associated tree-poset. Then:
• each branch of Sem(T) encodes a distinct strict ordering of A that respects (A, ≺),
• if a strict ordering (A, <) respects (A, ≺), then it is encoded by a branch of Sem(T).

This observation is quite trivial, and can be justified by the fact that each branch of Sem(T) encodes a distinct traversal of T. For example, in Figure 1 the leftmost branch of the semantic tree is the linear extension a < b < c < d < e < f, which indeed fulfills the tree-poset.
Under these new order-theoretic lights, we can exhibit a deep connection between the number of concurrent runs of a syntax tree T and the number of ways to label it in a strictly increasing way. Indeed, as already observed in [KMPW12], the linear extensions of tree-posets are in one-to-one correspondence with increasing trees [BFS92, Drm09].
Definition 10. An increasing tree is a labelled plane rooted tree such that the sequence of labels along any branch starting at the root is increasing.
For example, to label the tree of Figure 1, a must take the label 1, and b then takes the label 2. The label of c must belong to {3, 4, 5, 6}, which in turn induces constraints on the labels of the other nodes. Finally, only 8 of the 6! possible unconstrained labellings yield increasing trees.
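This count is small enough to verify by brute force; a sketch (Python; the parent table encodes our reading of the Figure 1 tree, with nodes numbered in prefix order):

```python
from itertools import permutations

def count_increasing_labellings(parent, n):
    """Number of bijective labellings 1..n of the n nodes such that every
    child's label exceeds its parent's label (Definition 10)."""
    return sum(
        1
        for lab in permutations(range(1, n + 1))
        if all(lab[v] > lab[p] for v, p in parent.items())
    )

# Figure 1 tree, nodes in prefix order: 0:a, 1:b, 2:c, 3:d, 4:e, 5:f.
FIG1_PARENT = {1: 0, 2: 1, 3: 1, 4: 3, 5: 3}
```

A chain admits a single increasing labelling, while a root with k leaf children admits k! of them, in agreement with the intuition that labellings count interleavings.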
Increasing plane rooted trees satisfy the following specification (using the classical boxed product, see [FS09, p. 139] for details): G = Z^□ ⋆ Seq(G). It is easy to obtain the coefficients of the associated exponential generating function G(z) (e.g. from [BFS92]):

Fact 11. The number of increasing plane rooted trees of size n is G_n = (2n − 3)!! = (2n − 2)! / (2^(n−1) (n − 1)!).

From this we obtain our first significant measure.
Theorem 12. The mean number of concurrent runs induced by syntax trees of size n is:

W̄_n = G_n / C_n = n! / 2^(n−1) ∼ 2√(2πn) (n/(2e))^n.

This result is obtained from Fact 11 by taking the average number of increasing trees of size n, and the asymptotics follows from Stirling's formula [FS09, p. 37].
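Since the ratio G_n/C_n simplifies to n!/2^(n−1) (an easy computation from Fact 11 and the Catalan numbers), the mean can be cross-checked by exhaustive enumeration for small n. A sketch (Python; encodings and names are ours, runs counted by the classical hook-length formula):

```python
from fractions import Fraction
from math import comb, factorial

def forests(m):
    """All plane forests (tuples of trees) with m nodes in total."""
    if m == 0:
        yield ()
        return
    for k in range(1, m + 1):
        for t in trees(k):
            for rest in forests(m - k):
                yield (t,) + rest

def trees(n):
    """All plane rooted trees with n nodes; a tree is the tuple of its
    subtrees, so a size-n tree is a root carrying a forest of size n-1."""
    yield from forests(n - 1)

def runs(t):
    """Number of concurrent runs of t, via the hook-length formula:
    n! divided by the product of all subtree sizes."""
    def visit(t):                      # returns (size, product of hooks)
        n, p = 1, 1
        for c in t:
            cn, cp = visit(c)
            n, p = n + cn, p * cp
        return n, p * n
    n, p = visit(t)
    return factorial(n) // p
```

For each n the enumeration recovers Catalan-many trees, and the exact average number of runs matches n!/2^(n−1).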
A further piece of information that will prove particularly useful is the number of increasing labellings of a given tree. This can be obtained by the famous hook-length formula [Knu98, p. 67]:

Fact 13. The number ℓ(T) of increasing trees built on a plane rooted tree T of size n is ℓ(T) = n! / ∏_{ν∈T} |T(ν)|, where T(ν) denotes the sub-tree of T rooted at ν.

Corollary 14. The number of concurrent runs of a syntax tree T is the number ℓ(T).
We remark that the hook-length formula gives us "for free" a direct algorithm to compute the number of linear extensions of a tree-poset in linear time. This is clearly an improvement compared to related algorithms, e.g. [Atk90]. In Section 6 we discuss a slightly more general and more efficient algorithm that proves quite useful.
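In the syntax-tree representation this is a single bottom-up traversal; a sketch (Python; the (label, children) encoding is ours):

```python
from math import factorial

def number_of_runs(t):
    """Number of concurrent runs of the syntax tree t = (label, children),
    i.e. of linear extensions of its tree-poset: one traversal accumulates
    each subtree size and the product of all hook lengths, then applies
    the hook-length formula. O(n) arithmetic operations overall."""
    def visit(node):
        _, children = node
        n, p = 1, 1
        for c in children:
            cn, cp = visit(c)
            n, p = n + cn, p * cp
        return n, p * n                # this subtree contributes hook n
    n, p = visit(t)
    return factorial(n) // p
```

On our reading of the Figure 1 tree this returns 8; a chain has a single run and a root with k leaf children has k! runs.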

Analysis of growth
To analyse quantitatively the growth between the processes and their behaviours, we measure the average number of concurrent runs induced by large syntax trees of size n. Although the arithmetic mean given in Theorem 12 is the usual way to measure an average, in our case a more natural choice is the geometric mean, since the ratio between these quantities is exponential.
Theorem 15. The geometric mean number of concurrent runs built on process trees of size n satisfies: This growth appears to us as less important than what the arithmetic mean suggested, although it is still very large. For both means, the result is indeed quite far from the upper bound (n − 1)!.
Proof. To obtain the geometric mean, we first demonstrate the following equality: This is obtained, in relation with Fact 13, by an induction on the size of T based on the following observation: Now, consider the sequence (w_T) such that w_T = log(ℓ(T)/|T|!), where ℓ(T) is the number of increasing labellings of T (Fact 13). Using the hook-length formula, we obtain: Let u_n be the cumulated value of w_T over all trees of size n: u_n = Σ_{T, |T|=n} w_T. Its generating function satisfies: By symmetry of the trees R_i, we get: where C(z) is the generating function enumerating all trees. We recognise Σ_{r≥1} r C(z)^{r−1} = (1 − C(z))^{−2}, so consequently: In order to obtain the geometric mean width Ḡ_n, we first extract the n-th coefficient of this product and apply the exponential function to the result. We then multiply by n! and take the C_n-th root of the result. Now we develop an approximation of U(z) in order to compute the asymptotics of Ḡ_n. We further denote by A(z) the following approximation of L(z): where γ is Euler's constant. By using the first two terms in the development of the Catalan numbers (see Fact 2) and the formulas of [FS09, p. 388] for an approximation of A(z), we obtain: Thus, finally, we conclude.

The case of non-plane trees
In classical concurrency theory, the pure merge operator often comes with commutativity laws, e.g. P ∥ Q ≡ Q ∥ P. From a combinatorial point of view, the idea is then to consider the syntax and semantic trees as non-plane (or unordered) rooted trees.
Thankfully, the non-plane analogue of the Catalan numbers is well known (cf. [FS09, p. 475-477]):

Fact 16. The specification of unlabelled non-plane rooted trees is T = Z × MSet(T). The number T_n of such trees of size n is: where η ∈ [1/4, 1/e] with η ≈ 0.3383218, and γ ≈ 1.559490.
Compared to plane trees, no known closed form characterizes the symmetries involved in the non-plane case. One must instead work with rather complex approximations. Luckily, the increasing variant of non-plane trees has been studied under the model of increasing Cayley trees [FS09, p. 526-527]:

Fact 17. The specification of increasing non-plane rooted trees is I = Z^□ ⋆ Set(I). The number I_n of such trees of size n is I_n = (n − 1)!.

Theorem 18. The mean number of concurrent runs built on non-plane syntax trees of size n is: where η and γ are introduced in Fact 16.
Of course, we obtain different approximations for the plane and non-plane cases. The ratio W̄_n / V̄_n is equivalent to γ(2η)^(−n)/(e√(πn)), which means that although the exponential growths are not equivalent, the two asymptotic formulas have a similar nature. This comparison between plane and non-plane combinatorial structures is a recurring theme in combinatorics. It has often been pointed out that in most cases the asymptotics look very similar. Citing Flajolet and Sedgewick (cf. [FS09, p. 71-72]): "(some) universal law governs the singularities of simple tree generating functions, either plane or non-plane".
Our study echoes quite faithfully such an intuition.

Typical shape of process behaviours
Our goal in this section is to provide a more refined view of the process behaviours by studying the typical shape of the semantic trees. This study brings to light a new -- and, we think, interesting -- combinatorial class: the model of increasing admissible cuts (of plane trees). In the first part we recall the notion of admissible cuts and define their increasing variant. This naturally leads to a generalization of the hook-length formula that enables the decomposition of a semantic tree by levels. Based on this construction, we study experimentally the level decomposition of semantic trees corresponding to syntax trees of size 40 (which yields semantic trees with more than 10^28 nodes!). Finally, we discuss the mean number of nodes by level, which is obtained by counting increasing admissible cuts. This provides a fairly precise characterization of the typical shape of process behaviours.

Increasing admissible cuts
The notion of admissible cut has already been studied in algebraic combinatorics, see for example [CK98]. The novelty here is the consideration of the increasingly labelled variant.
Definition 19. Let T be a tree of size n. An admissible cut of T of size k = n − i (0 ≤ i < n) is a tree obtained by starting from T and recursively removing i leaves from it. An increasing admissible cut of T of size k is an admissible cut of T of size k that is increasingly labelled.

Figure 3 depicts the set of all admissible cuts of the syntax tree T of Figure 1. We remark that the tree T is itself an admissible cut of T.
To establish a link with increasing admissible cuts, we first make a simple albeit important observation.
Proposition 20. Let T be a syntax tree of size n. Any run prefix of length k (1 ≤ k ≤ n) is encoded by an admissible cut of T of size k.

Proof. We proceed by finite induction on k. For k = 1 there is a single run prefix of length 1, consisting of the root of T, and the corresponding admissible cut is the root node, which has a single increasing labelling. Now suppose that the property holds for run prefixes of length k, 1 ≤ k < n; let us show that it also holds for run prefixes of length k + 1. By the induction hypothesis, any run prefix σ_k of length k is encoded by a given admissible cut of size k. Let us denote by S(σ_k) this admissible cut. Now, any prefix σ_{k+1} of length k + 1 is obtained by appending an action α to a prefix σ_k of length k. For σ_{k+1} to be a valid prefix, α must correspond to a node in T that is a direct child of one of the nodes of S(σ_k). Thus we obtain a unique S(σ_{k+1}), namely S(σ_k) completed by a single leaf α.

For example, the run prefixes a, b, c, d and a, b, d, c are both encoded by the first admissible cut of size 4 depicted in Figure 3.
This result leads to a fundamental connection with increasing admissible cuts.
Proposition 21.Let T be a syntax tree of size n.The number of run prefixes of length k is the number of increasing labellings of the admissible cuts of T of size k.
Proof. This is obtained by a simple order-theoretic argument. Each admissible cut is a tree-poset, and thus the number of run prefixes it encodes is the number of its linear extensions.
For example, there are three admissible cuts of size 4 in Figure 3. The first one admits two increasing labellings and the other two have a single labelling each. This gives 2 + 1 + 1 = 4 run prefixes of length 4 for the syntax tree T of Figure 1. Now, we observe that this is also the number of nodes at level 3 in the corresponding semantic tree. And this of course generalizes: the number of run prefixes of length k corresponds to the number of nodes at level k − 1 in the semantic tree.
From this we can characterize precisely the number of nodes by level thanks to a generalization of the hook-length formula.
Corollary 22. Let T be a process tree of size n. The number of nodes at level n − 1 − i (0 ≤ i < n) of Sem(T) is the sum of ℓ(S) over the admissible cuts S of T of size n − i, where ℓ(S) denotes the hook-length formula (Fact 13) applied to the admissible cut S. Moreover, the total number of nodes of Sem(T) is the sum of ℓ(S) over all admissible cuts S of T.
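Corollary 22 yields a direct (if exponential in general) procedure: enumerate the admissible cuts, apply the hook-length formula to each, and group the results by size. A sketch (Python; encodings and names are ours), which on our reading of the Figure 1 example gives the level profile [1, 1, 2, 4, 8, 8], consistent with the 4 run prefixes of length 4 computed above:

```python
from itertools import product
from math import factorial

def hook_count(t):
    """Number of increasing labellings of t = (label, children),
    by the hook-length formula (Fact 13)."""
    def visit(node):
        n, p = 1, 1
        for c in node[1]:
            cn, cp = visit(c)
            n, p = n + cn, p * cp
        return n, p * n
    n, p = visit(t)
    return factorial(n) // p

def admissible_cuts(t):
    """All admissible cuts of t (Definition 19): the root is kept, and
    each child subtree is either pruned entirely or cut recursively."""
    label, children = t
    options = [[None] + list(admissible_cuts(c)) for c in children]
    for choice in product(*options):
        yield (label, tuple(c for c in choice if c is not None))

def size(t):
    return 1 + sum(size(c) for c in t[1])

def level_profile(t):
    """profile[k] = number of nodes at level k of Sem(t): by Corollary 22
    this is the total number of increasing labellings of the admissible
    cuts of size k + 1."""
    profile = [0] * size(t)
    for cut in admissible_cuts(t):
        profile[size(cut) - 1] += hook_count(cut)
    return profile
```

The last entry of the profile is the number of leaves of Sem(t), i.e. the number of concurrent runs, and the sum of the profile is the total size of Sem(t).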

Level decomposition: Experimental study
Before working out an exact formula for the mean number of nodes by level, we can take advantage of Corollary 22 to compute the shape of some typical semantic trees. For this, our experiments consist in generating uniformly at random some syntax trees of size n (using our arbogen tool), for n not too small. We can then compute the level decomposition defined above by first listing all the admissible cuts of T. However, we cannot take syntax trees of a very large size n, given the following result. In fact, an admissible cut is a root together with a sequence of children that are either admissible cuts, or trees corresponding to a branch of the original tree that has entirely disappeared. Consequently, M(z) satisfies a functional equation that can easily be solved. The singularities of M(z) are 1/4 and 4/25, the latter being dominant. The generating function is analytic in a ∆-domain around 4/25 because of the square-root type of the dominant singularity. By using transfer lemmas [FS09, p. 392], we obtain the asymptotic behaviour.
As a consequence, we must be particularly careful when computing the shape of a semantic tree in practice using our generalization of the hook-length formula for increasing admissible cuts. However, for syntax trees of size n ≤ 40 we are able to compute the level decomposition within a couple of days using a fast computer. This must be compared to the mean width of these trees: W̄^0_40 > 1.48 · 10^36! For the two syntax trees depicted on the left of Figure 4, the shape of the corresponding semantic tree is depicted on the right. We use a logarithmic scale on the horizontal axis so that the exponential fringes become lines. The semantic trees are of size larger than 8.74 · 10^28 for the one corresponding to the left process tree and 2.66 · 10^35 for the second one. These correspond to the two plain lines in the figure. The dashed lines correspond to the theoretical computation of the mean, as explained in the next section. We can see that it is almost reached by the shape of the second tree. We also remark that both trees have a semantic size that is smaller than the average. To analyse this particularity, we have sampled more than fifty typical process trees of size 40 (of course with uniform probability among all trees of size 40). The results are fairly interesting. All the shapes that we computed follow the same kind of curve as the average one. However, almost all process trees have a semantic-tree size that is much smaller than the average one (≈ 4.06 · 10^36). Indeed, most of them have a size that belongs to [10^28, 10^35], and a single one has a size larger than the average (it is approximately twice as large).
This observation leads us to conjecture that only a very few special syntax trees account for the largest part of the semantic size. These are probably process trees whose nodes have a large arity. The simplest one is the process tree with a single internal node. In the case of size 40, its semantic counterpart has size larger than 2.03 · 10^46. In fact, the combinatorial explosion in the worst case grows like a factorial function. Since the Catalan numbers (which count syntax trees) do not grow that quickly, the "worst" syntax trees (the ones whose semantic-tree size is largest) really do influence the average measures.

Mean number of nodes by level
We may now describe one of the fundamental results of this paper: a closed formula for the mean number of nodes at each level of a semantic tree.
Theorem 24. The mean number of nodes at level n − i − 1, for i ∈ {0, . . ., n − 1}, in a semantic tree corresponding to a syntax tree of size n is: Proof. Let n, i be two integers such that 0 ≤ i < n. We define the cumulated number W^i_n of nodes at level n − i − 1 in semantic trees issued from syntax trees of size n; it is equal to the sum, over all syntax trees of size n, of the number of increasingly labelled admissible cuts of size n − i. As in the previous section, let us denote by G the combinatorial class of increasing trees, and thus by G_{n−i} the number of increasing trees of size n − i. An admissible cut is obtained from a tree by pruning some of its sub-trees. By the reverse process, i.e. by plugging sequences of trees onto a fixed tree (that corresponds to an admissible cut), we obtain the set of trees which admit that admissible cut. In Figure 5, the fixed admissible cut is the tree with nodes a, b, c, d, and the places where sequences of trees can be plugged are depicted by the grey triangles. For every node of arity η, exactly η + 1 sequences of trees can be plugged around its children. So, for a fixed admissible cut of size n − i, the number of places is 2n − 2i − 1. Thus we conclude: From Definition 1 and Fact 11 we get: The former result is obtained by using the "Bürmann form" of the Lagrange inversion. An analogous expression is given in [FS09, p. 66-68]. Thus, by taking the average, i.e. by dividing by C_n, we obtain the stated value for W̄^i_n.
Given this result, we can complete the analysis of the shapes depicted in Figure 4. Let us first determine the limit curve for the shape of an average semantic tree. We renormalise the values W̄^i_n as f(c, n) = ln(W̄^{cn}_n) and we evaluate an asymptotic of f(c, n) when n tends to infinity. An easy calculation shows that, for 1/n ≤ c ≤ 1 − 1/n, we have: In particular, as n tends to infinity, on every compact [a, b] with 0 < a < b < 1, the function c ↦ f(c, n) tends uniformly to the line (1 − c)n ln(n). Moreover, if we keep the second-order terms, we obtain a curve which is fully consistent with Figure 4. We are now interested in the behaviour near the extremities of [0, 1]. We can study the asymptotics of W̄^i_n for fixed constant i. A straightforward calculation shows that: Both give an interpretation of the inflexion of the curve near the extremities.

Expected size of process behaviours
In this section we study in more detail the average size of semantic trees (i.e. their mean number of nodes). In a first part, we provide a first approximation based on Theorem 24 of the previous section. Then, we state a conjecture regarding the non-plane case, whose proof requires a deeper study that goes beyond the scope of this paper. Finally, we characterize the average size in a more precise way, through a linear recurrence that is obtained by various means. As a kind of homage to Philippe Flajolet, we describe three different techniques of analytic combinatorics to obtain this recurrence relation. Each technique has its pros and cons, as will be discussed below. Finally, we reach our goal of providing a precise asymptotics of the size of the semantic trees.

First approximation of mean size
Our initial approximation of the mean size of the semantic trees is based on the level decomposition of Section 4, where Theorem 24 gives a closed formula for the mean number of nodes W̄^i_n at level n − i − 1 for semantic trees corresponding to syntax trees of size n. We first give, as a technical lemma, an inequality involving W̄^i_n.

Lemma 25.
Proof. We first deal with a normalized expression of Theorem 24: Obviously, we get the stated lower bound. Let us continue with the simplification. The first i/2 factors of the numerator can be simplified with those of the denominator when j is even: But the numerator is smaller than 1 and the denominator satisfies: So we obtain the stated upper bound.
Theorem 26. The mean size S̄_n of a semantic tree induced by a process tree of size n admits the following asymptotics: Proof. Using Lemma 25 and taking n large enough, we get: Let us first take the lower bound into account. Using an upper bound on the tail of the series (Taylor-Lagrange formula):

Corollary 27. Let f be a function of n that tends to infinity with n. Asymptotically, on average, almost all nodes of the semantic tree induced by a syntax tree of size n belong to the f(n) last levels.
The proof is analogous to the previous one, using the Taylor-Lagrange formula. The only constraint on f is that it tends to infinity, but it can grow as slowly as we want. For example, asymptotically almost all nodes of the average semantic tree belong to the log(. . . (log n) . . .) last levels.

The case of non-plane trees
In order to compute the average size in the context of non-plane trees, we need one more result, the analogue of the result on powers of the Catalan generating function (see the proof of Theorem 24). In the case of non-plane trees, this corresponds to powers of the generating function of unlabelled non-plane rooted trees. Although many results about forests of unlabelled non-plane trees have been established in [PS79], it seems that the case of finite sequences of unlabelled non-plane trees has not been thoroughly investigated.
Conjecture 28. The mean size S_n of a semantic tree induced by an (unlabelled non-plane rooted) process tree of size n admits the following asymptotics: where η and γ are introduced in Fact 16.

The mean size as a linear recurrence
In this section, we focus on the asymptotics of the average size S_n of the semantic trees induced by syntax trees of size n. Our goal is to obtain more precise approximations than Theorem 26 using different analytic combinatorics techniques. Indeed, we present three distinct ways to establish our main result: a linear recurrence that precisely captures the desired quantity. These results are deeply related to the holonomy of the generating functions under consideration. A priori, in the non-plane case, the functions are not holonomic, and consequently such proofs could not be adapted.
We have stored the non-normalized version of this sequence in the OEIS as A216234. It records the cumulated sizes of the semantic trees issued from process trees of size n.

First proof of Theorem 29: Creative telescoping
The first proof is a direct consequence of the level decomposition detailed in Section 4. It is clearly the simplest of the three proofs, both in terms of the technical mathematics involved and of the level of computer assistance required. Among our various proofs, this is the lucky one: the level decomposition is really a peculiarity of the combinatorial structure we investigate, and such a situation is hardly common.
From the exact formula for the mean number W_n^i of nodes at each level, by summing over all levels we get the mean number S_n of nodes of an average semantic tree: This sum can be expressed in terms of hypergeometric functions: Now, using the Mgfun package of Maple [Chy98], we extract by creative telescoping [PWZ96, Zei90] the stated P-recurrence for S_n.

Second proof of Theorem 29: Multivariate holonomic functions
The second proof is much more involved in terms of analytic combinatorics. Indeed, it is based on multivariate holonomy theory, and some of the steps must be performed using a computer algebra system. It was our original proof in [BGP12] and it is clearly the proof that conveys the most combinatorial information about the structures we study. Quite subjectively, it is also the proof we find the most beautiful.
As for the expected number of admissible cuts (cf. Observation 23), we deal with specifications and generating functions. First of all, consider the following bivariate (in Z, U) specification for M: U × Z × Seq(M ∪ C). The tag Z (resp. U) marks the nodes of the tree carrying the admissible cut (resp. the nodes of the admissible cut). Now, in view of Corollary 22, we are interested not in admissible cuts but in increasing admissible cuts. So, let us consider the specification S = U^□ × Z × Seq(S ∪ C), where the boxed product operates on U. Its associated generating function S(z, u) verifies: By substituting u^k by k! in the series S(z, u), we obtain the generating function S(z) for the cumulated sizes of the semantic trees. That is to say, S(z) = Σ_{n∈ℕ} S̄_n z^n where S̄_n = Σ_{T; |T|=n} Σ_{S adm. cut of T} |S|. This substitution can be done using a gamma transformation. We finally obtain: At this stage, we want a more tractable characterisation of S(z). We proceed according to the following plan, which is based on multivariate holonomic closure properties [Lip89]: (i) As S(z, u) is algebraic, we may characterize explicitly the minimal polynomial P ∈ C[z, u][X] such that P(S(z, u)) = 0.
(ii) As algebraic functions are holonomic, one can derive a partial differential equation for S(z, u) from P.
(iv) Using the stability of holonomy under partial evaluation, a differential equation for S(z) is obtained.
This calculation has been carried out with a computer algebra system (see [SZ94], the Gfun package of Maple) and produced huge equations (with hundreds of terms). However, following this technical but conceptually straightforward calculation, we finally reach a relatively simple linear differential equation for S(z): with the initial conditions R(0) = 0, R′(0) = 1, R′′(0) = 4. The coefficients R_n follow the P-recurrence: with R_0 = 0, R_1 = 1 and R_2 = 2. Now, we can easily prove that this recurrence is convergent. Indeed, the recurrence is non-negative and asymptotically decreasing: one just observes that, for n sufficiently large, the difference is always negative.
Theorem 26 shows that the series converges to exp(1). Now, a deeper analysis of this recurrence can be done using the tools described in [FS09]. Indeed, the singularities are regular.
Another way consists in predicting that the asymptotic expansion of R(z) as z tends to infinity can be expressed as exp(2z + a ln(z) + bz^{−1} + cz^{−2} + dz^{−3} + O(z^{−4})) and then using saddle point analysis to conclude.

Applications
We describe in this section two practical outcomes of our quantitative study of the pure merge operator. First, we present an algorithm to efficiently compute the uniform probability of a concurrent run prefix. The second application is a uniform random sampler of concurrent runs. These algorithms work directly on the syntax trees, without requiring the explicit construction of the semantic trees. An important remark is that these algorithms apply whether we consider the plane or the non-plane case; only the average quantities are impacted.

Probability of a run prefix
We first describe an algorithm to determine the probability of a concurrent run prefix (i.e. of a prefix of a branch in a semantic tree). In practice, this algorithm can be used to guide a search in the state space of process behaviours, e.g. for statistical model checking or (uniform) random testing.
As a matter of fact, Algorithm 1 provides, for free, a way to compute from a syntax tree T the number of concurrent runs in the corresponding semantic tree. For this, we simply compute the probability ρ_σ of an arbitrary full run σ (e.g. obtained by a traversal of T), and the number of runs is then 1/ρ_σ.
From an order-theoretic point of view, we thus obtain as a by-product a linear-time algorithm to compute the number of linear extensions of a tree-like partial order.
Corollary 37. Let T be a tree-like partial order of size n. The number of its linear extensions can be computed in O(n).
Since any full run has length n, the size of the syntax tree, the O(n) upper bound is trivially obtained. Moreover, we conjecture that the problem also has an Ω(n) lower bound. Note that the hook length formula also yields a linear-time algorithm, but with more arithmetic operations. To put this result into a broader perspective, we remind the reader that the problem of counting linear extensions of partial orders is #P-complete [BW91] in the general case. Moreover, the proposed solution (obtained thanks to the very fruitful isomorphism with increasing trees) is clearly an improvement over the quadratic algorithm proposed in [Atk90].
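The counting argument can be made concrete with the hook length formula for trees mentioned above: the number of linear extensions of a tree-like partial order of size n is n! divided by the product of all subtree sizes. The following sketch (our illustration, not the paper's Algorithm 1; it assumes a tree given as a dict mapping each node to its list of children) computes this in one traversal:

```python
import math

def count_linear_extensions(children, root):
    """Number of linear extensions of a tree-like partial order:
    n! divided by the product of all subtree sizes (hook lengths)."""
    sizes = {}

    def subtree_size(v):
        # size of the subtree rooted at v, memoized in `sizes`
        sizes[v] = 1 + sum(subtree_size(c) for c in children.get(v, []))
        return sizes[v]

    n = subtree_size(root)
    hook_product = 1
    for s in sizes.values():
        hook_product *= s
    return math.factorial(n) // hook_product
```

For instance, a root with two independent leaf children admits exactly 2 linear extensions, its two interleavings, while a chain admits only 1.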

Random generation of concurrent runs
The uniform random generation of concurrent runs is of great practical interest. The problem has a trivial solution if we work on the semantic tree: since all runs have equal probability, we may simply select a leaf at random and reconstruct the full run by climbing the unique branch from the selected leaf to the root. Of course, this naive algorithm is highly impractical given the exponential size of the semantic tree. The challenge, thus, is to find a solution that does not require the explicit construction of the semantic tree. A possible way would be to rely on a Markov chain Monte Carlo (MCMC) approach, e.g. based on [Hub06]. We describe here a simpler, more direct approach that yields a more efficient, sub-quadratic algorithm.
The main idea is to sample from a multiset containing the nodes of the syntax tree as elements, each node associated with a weight corresponding to the size of the sub-tree rooted at that node. A particularly efficient way to implement the required multiset structure is to use a partial sum tree, i.e. a balanced binary search tree in which all operations (adding or removing a node) are done in logarithmic time. The details of this implementation can be found in Appendix A.
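The paper's partial sum tree is detailed in Appendix A; as an illustration only, here is one common way to realize such a structure, a Fenwick (binary indexed) tree over node weights, supporting weight updates and weighted sampling in O(log n). The class name and interface below are ours, not the authors' code:

```python
import random

class PartialSumTree:
    """Fenwick tree over weights of items 1..n: update and
    weighted random sampling both run in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed partial sums

    def update(self, i, delta):
        # add `delta` to the weight of item i
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self):
        # sum of all weights
        s, i = 0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def sample(self):
        # draw item i with probability weight(i) / total()
        r = random.randrange(self.total())
        i, mask = 0, 1 << self.n.bit_length()
        while mask > 0:
            j = i + mask
            if j <= self.n and self.tree[j] <= r:
                r -= self.tree[j]
                i = j
            mask >>= 1
        return i + 1
```

Removing a sampled node amounts to `update(i, -weight)`, and enabling a child amounts to `update(i, weight)`, exactly the multiset operations the algorithm needs.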
Let T be a process tree. First, in one traversal, we label every node of T with the size of the sub-tree rooted at that node; we call this size the weight of the node. We build a list σ, of size n at the end, such that at each step i we append to σ the action corresponding to the i-th action of our random run. To choose this i-th action, we sample from the multiset of actions available at that step. Initially only the root is available (with probability 1, hence with cardinality n, the size of the process tree T). It is then appended to σ and removed from the multiset, and its children are enabled with their weights as cardinalities. We proceed until all actions have been sampled.
Let T be a syntax tree; we denote by child(T) the nodes at level one of T.
The following loop invariant derives easily from Algorithm 2. Thus, by Proposition 32, the prefix σ_{p+1} is obtained with the correct probability, so that, once completed, the full run σ is generated with uniform probability.
In the case of the partial sum tree implementation, we have the following complexity results.
Proposition 40. Let n be the size of the weighted process tree T. To obtain a random run, we need n random choices of integers, and the operations on the multiset cost Θ(n log n) in the worst case.
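To make the scheme concrete, here is a simplified sketch of the sampling procedure in which the multiset is a plain dict sampled by linear scan (hence O(n^2) overall, rather than the sub-quadratic bound of Proposition 40 obtained with the partial sum tree); the tree encoding and names are ours:

```python
import random

def uniform_run(children, root):
    """Sample a run of the process tree uniformly among all interleavings.
    children: dict mapping a node to its list of child nodes."""
    # weight of a node = size of the subtree rooted there
    weight = {}

    def subtree_size(v):
        weight[v] = 1 + sum(subtree_size(c) for c in children.get(v, []))
        return weight[v]

    subtree_size(root)
    enabled = {root: weight[root]}  # multiset: enabled actions with weights
    run = []
    while enabled:
        # pick an enabled node with probability proportional to its weight;
        # the total weight always equals the number of remaining nodes
        r = random.randrange(sum(enabled.values()))
        for node, w in enabled.items():
            if r < w:
                break
            r -= w
        run.append(node)
        del enabled[node]           # the node cannot be sampled anymore
        for c in children.get(node, []):
            enabled[c] = weight[c]  # its children become enabled
    return run
```

The invariant that the total enabled weight equals the number of remaining nodes mirrors Invariant 38, and the selection probability |T(α)|/(|T| − p + 1) is exactly that of Proposition 39.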

Conclusion and perspectives
The quantitative study of the pure merge operator represents a preliminary step towards our goal of investigating concurrency theory from the analysis-of-algorithms point of view. In the next step, we shall address other typical constructs of formalisms for concurrency, especially non-deterministic choice and synchronization [AI07]. There are indeed various forms of synchronization, in general corresponding to reflecting the action labels within the pure merge operator. Other operators, such as hiding, also deserve further investigation. We also wish to further investigate the case of non-plane process trees. Although the nature of the operators does not seem to be really impacted (confirming the intuition of Flajolet and Sedgewick), the technical aspects in terms of analytic combinatorics are quite interesting. Another interesting continuation of this work would be to study the compaction of the semantic trees by identifying common sub-trees. This would amount to studying the interleaving of process trees up to bisimilarity, the natural notion of equivalence for concurrent processes. Note that our algorithmic framework would not be affected by such studies, since it does not require the explicit construction of the semantic trees (whether compacted or not, plane or non-plane).
Perhaps the most significant outcome of our study is the emergence of a deep connection between concurrent processes and increasing labellings of combinatorial structures. We indeed connected the pure merge operator with increasing trees to measure the number of concurrent runs. We also defined the notion of increasing admissible cut to study the number of nodes per level in the semantic trees. We expect to discover similar increasingly labelled structures as we go deeper into concurrency theory.
From a broader perspective, we definitely see an interest in reinterpreting semantic objects (from logic, programming language theory, concurrency theory, etc.) in the light of analytic combinatorics tools. Such objects (like semantic trees) may be quite intricate when considered as combinatorial classes, thus requiring non-trivial techniques. This is highlighted here, e.g., by the generalized hook-length formula characterizing the expected size of semantic trees. Conversely, we think it is interesting to know precisely, and not just by intuition, the high level of sharing and symmetry within semantic trees. This naturally leads to practical algorithms, making us confident that real-world applications (in our case, especially related to random testing and statistical model checking) might result from such a study.

Figure 1: A syntax tree (left) and the corresponding semantic tree (right)

Figure 3: The admissible cuts of the syntax tree of Figure 1.

Observation 23. The mean number m_n of admissible cuts of trees of size n satisfies: Let us denote by M(z) the ordinary generating function enumerating the multiset of admissible cuts of all trees. More precisely, M(z) = Σ_{n∈ℕ} M_n z^n where M_n = Σ_{T; |T|=n} Σ_{S adm. cut of T} 1. The tag Z marks the nodes of the tree carrying the admissible cut. The generating function C(z) enumerates all trees. The specification of M is Z × Seq(M ∪ C).

Figure 4: Typical trees of size 40 and their semantic-tree profile behind the average profile.

Figure 5: An admissible cut and the places where it can be enriched.
Part 3, Chapter 5] on powers of the Catalan generating function gives: (z, v) − C(z) dv. So, S(z, u) = Σ_{n,k∈ℕ} S_{n,k} z^n u^k / k!, where S_{n,k} = Σ_{T; |T|=n} Σ_{S adm. cut of size k of T} |S|. Now, this differential equation can easily be solved:

Algorithm 2: uniform random generation of concurrent runs
Data: T: a weighted process tree of size n
Result: σ: a run (a list of nodes)
σ := ⟨⟩ ; M := {{ a^{|T|} }}            # initialize a multiset with the root a with its weight
for p from 1 to |T| − 1 do
    α := sample(M)                       # sample an action α according to its weight in the multiset
    σ := σ.α                             # append the sampled action to the sequence
    M := update(M, α, 0)                 # α cannot be sampled anymore
    for β ∈ child(T_σ) do
        M := update(M, β, |T(β)|)        # insert the children of α in the multiset
return σ

Invariant 38. At the p-th step of the algorithm, we have |M_p| = |T| − p + 1 and M_p = {α_{p+1} | α_{p+1} ∈ child(T_{σ_p})}.

Proposition 39. Let σ_p be the prefix obtained at the p-th step of Algorithm 2. The next action α_{p+1} is chosen with probability |T(α_{p+1})|/(|T| − p + 1). Consequently, the complete run σ is generated with uniform probability.

Proof. Let M_p be the multiset obtained at step p of Algorithm 2. We select the next action α_{p+1} with probability M_p(α_{p+1})/|M_p| (cf. Appendix A for a detailed proof). By Invariant 38 we have |M_p| = |T| − p + 1. Moreover, in the algorithm we insert α_{p+1} with weight M_p(α_{p+1}) = |T(α_{p+1})|.