A new bijection between RNA secondary structures and plane trees and its consequences

In this paper, we first present a new bijection between RNA secondary structures and plane trees. Combined with the Schmitt-Waterman bijection between these objects, we then obtain a bijection on plane trees that relates the horizontal fiber decomposition associated to internal vertices to the degrees of odd-level vertices, while the vertical path decomposition associated to leaves is related to the degrees of even-level vertices. To the best of our knowledge, only the former relation (i.e., horizontal vs odd-level), due to Deutsch, is known. As a consequence, we obtain enumeration results for various classes of plane trees, e.g., refining the Narayana numbers and the enumeration involving young leaves due to Chen, Deutsch and Elizalde, and counting a newly introduced ‘vertical’ version of k-ary trees. The enumeration results can also be formulated in terms of RNA secondary structures with certain parameterized features, which might have some biological significance.

Mathematics Subject Classifications: 05C05, 05A19, 05A15


Introduction
Ribonucleic acid (RNA) plays an important role in various biological processes within cells, ranging from catalytic activity to gene expression. RNA is described by its sequence of bases: A (adenine), U (uracil), G (guanine), and C (cytosine). These single-stranded molecules fold onto themselves, forming helical structures through base pairs in which A pairs with U and G pairs with C. The sequence of bases of the RNA molecule is known as its primary structure, and it is determined experimentally. A subset of the helical structure consistent with a planar graph is known as a secondary structure.
More than three decades ago, Waterman and his coworkers pioneered the combinatorics and prediction of RNA secondary structures [6][7][8][9][10]. In particular, the secondary structures over a sequence of length n that have k base pairs were enumerated by Schmitt and Waterman [6] by establishing a bijection between secondary structures and plane trees.
In this paper, we present a new bijection between RNA secondary structures and plane trees. Combining our new bijection and the Schmitt-Waterman bijection [6] leads to a new bijection ϕ on plane trees which enables us to obtain many interesting results. The most relevant studies on plane trees in the literature are as follows. In his paper [4], Deutsch presented an implicit, iteratively constructed bijection on plane trees which allowed him to show that the number of vertices of degree q > 0 and the number of odd-level vertices of degree q − 1 are equidistributed on the set of all plane trees. In particular, the case q = 1 implies that the number of plane trees with k leaves is the same as the number of plane trees with k even-level vertices, where the former is well known to be the Narayana number. In Chen, Deutsch and Elizalde [2], the authors classified the leaves of a plane tree into old and young leaves, where a leaf is called old if it is the leftmost child of its parent and young otherwise, and they obtained enumerative results with respect to these bijectively.
In a plane tree, the horizontal elementary substructures are fibers associated to its internal vertices, i.e., internal vertices and their respective children. This horizontal fiber decomposition has been well understood through extensive studies of plane trees according to the number of internal vertices and their degree distribution. A dual perspective, which appears to be ignored (or at least less studied), is that, vertically, a plane tree can be decomposed into paths associated to its leaves. Through our new bijection ϕ on plane trees, not only can we show the correspondence between horizontal fibers and odd-level vertices established by Deutsch [4], but we can also show that the vertical paths associated to leaves correspond to even-level vertices at the same time. Namely, we discover that the joint distribution of horizontal fibers and vertical paths is the same as the joint distribution of odd- and even-level vertices.
As a consequence, based on these equidistribution results, we can compute the number of plane trees with certain restricted path lengths in the vertical path decomposition (and with restrictions on the horizontal fiber decomposition) via the multivariate Lagrange inversion formula, which refines the Narayana numbers and gives rise to some new results. For example, k-ary trees are plane trees in which every horizontal fiber has size k, and they are known to be counted by certain generalized Catalan numbers; see, e.g., Chen [3]. Here we are interested in their vertical duals, i.e., plane trees in which every vertical path associated to a leaf has, in a suitable sense, size k. We show that these 'vertical' k-ary trees are counted by numbers very similar to the generalized Catalan numbers. In addition, we observe that the lengths of the paths associated to leaves enable us to distinguish between old and young leaves. Accordingly, we refine some results obtained in Chen, Deutsch and Elizalde [2].

A new bijection
We first recall the definition of RNA secondary structures. Let [n] = {1, 2, ..., n}. An RNA secondary structure of length n is a simple graph with vertices in [n] and edges in E satisfying
• if (i, j) ∈ E, then |i − j| ≥ 2;
• if (i, j) ∈ E and (k, l) ∈ E, where i < j and k < l, and [i, j] ∩ [k, l] ≠ ∅, then either [i, j] ⊂ [k, l] or [k, l] ⊂ [i, j]
(where [i, j] denotes the interval {r : i ≤ r ≤ j}).

the electronic journal of combinatorics 26(4) (2019), #P4.48
We typically draw an RNA secondary structure in the following manner: we place all vertices on a horizontal line and we draw each edge as an arc in the upper half-plane. Then, the second condition in the above definition guarantees that no two arcs cross.
The vertex of an arc with the smaller label is called the left-end of the arc, and a vertex not incident to any edge is called an isolated base. In addition, if (i, j) is an arc, we say that an arc (i_1, j_1) (resp. an isolated base k) is covered by (i, j) if i < i_1 < j_1 < j (resp. i < k < j), and we also say that the arcs (i, j) and (i_1, j_1) nest with each other.
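As a concrete reading of the definition and of the arc-diagram conventions above, the following Python sketch checks whether a set of arcs forms a secondary structure and lists its isolated bases. It assumes, beyond what is stated explicitly, that two arcs never share an endpoint. The dot-bracket encoding ('.' for an isolated base, matching parentheses for the two ends of an arc) and all function names are illustrative conventions introduced here, not notation from the paper.

```python
from itertools import combinations

def is_secondary_structure(n, arcs):
    """Return True if `arcs` (pairs (i, j) with i < j) is a secondary structure on 1..n.

    Assumption of this sketch: two arcs never share an endpoint.
    """
    for i, j in arcs:
        if not (1 <= i < j <= n and j - i >= 2):   # first condition: |i - j| >= 2
            return False
    for (i, j), (k, l) in combinations(sorted(arcs), 2):
        # sorted order gives i <= k; accept disjoint arcs or (k, l) strictly inside (i, j)
        if not (j < k or (i < k and l < j)):
            return False
    return True

def arcs_from_dot_bracket(db):
    """Read the arcs off a dot-bracket string, using 1-based positions."""
    stack, arcs = [], set()
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            arcs.add((stack.pop(), pos))
    return arcs

def isolated_bases(n, arcs):
    """Positions in 1..n that are not an endpoint of any arc."""
    paired = {x for arc in arcs for x in arc}
    return [p for p in range(1, n + 1) if p not in paired]

example = arcs_from_dot_bracket("((.).).")
print(sorted(example), isolated_bases(7, example), is_secondary_structure(7, example))
```

In this example the arc (1, 6) covers the arc (2, 4) and the isolated bases 3 and 5, while base 7 is covered by no arc.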
A plane tree T can be recursively defined as an unlabeled tree with one distinguished vertex v called the root of T, where the unlabeled trees T_i obtained by deleting v as well as its incident edges from T are linearly ordered, and each T_i is a plane tree with the vertex adjacent to v in T as its root. In a plane tree T, the number of edges in the unique path from a vertex v to the root of T is called the level of v, and the vertices adjacent to v on the next level (drawn one level lower) are called the children of v. The vertices on level 2i for i ≥ 0 are called even-level vertices and the rest are called odd-level vertices. A vertex is called a leaf if it has no children, and is called an internal vertex otherwise. We will draw plane trees with the root on the top level, i.e., level 0, and with the children of a level i vertex arranged on level i + 1 left-to-right following their linear order.

Theorem 1. There is a bijection φ between the set of RNA secondary structures of length 2a + k with k isolated bases and the set of plane trees with a + k edges and k even-level vertices.
Proof. Let R be an RNA secondary structure of length 2a + k with k isolated bases. We construct a plane tree φ(R) as follows:
S1: Put a big arc covering all existing arcs and isolated bases of R, and still refer to the obtained structure as R in the following. Label the isolated bases in R with b_1, b_2, ..., b_k left-to-right, and label the arcs with e_0, e_1, ..., e_a based on the left-to-right order of their left-ends;
S2: Start with a vertex that will be the root of φ(R) and label it with b_1, and generate k_1 children for b_1 if there are k_1 arcs covering the isolated base b_1 in R, where the children from left to right correspond to these k_1 arcs from the outermost to the innermost and are labeled correspondingly;
S3: Set j = 2. While j ≤ k, put a new child to the left of all existing children of the vertex that corresponds to the innermost arc covering the isolated base b_j in the current partially constructed tree, and label the newly generated child with b_j; next generate k_j children for the vertex b_j if there are k_j unused arcs (i.e., those with labels not appearing in the current partial tree) covering the isolated base b_j, where again the children from left to right correspond to these k_j arcs from the outermost to the innermost and are labeled correspondingly; then set j = j + 1.
The following properties are observed in the above construction: (i) the vertices b_i for all i are even-level vertices, and vice versa; (ii) the sequence e_0 e_1 ⋯ e_a will be obtained if the children of the even-level vertices (in the order b_1 b_2 ⋯ b_k) are read from left to right; (iii) the sequence b_1 b_2 ⋯ b_k will be obtained if the even-level vertices are searched by depth-first search from right to left. Properties (i) and (ii) should be straightforward, and (iii) can be shown by induction. Hence, the labels of the vertices can be easily and uniquely recovered after being removed, whence the obtained structure with labels removed is a plane tree with a + k edges.
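Steps S1 to S3 can be sketched in Python as follows. The sketch takes the structure as a dot-bracket string (an encoding assumed here for convenience: '.' is an isolated base and matching parentheses are the two ends of an arc), keeps a stack of the currently open arcs so that the arcs covering a base are available from outermost to innermost, and returns the tree as nested tuples in which a vertex is the tuple of its children. The function names and both encodings are illustrative assumptions, and the input must contain at least one isolated base.

```python
def phi(db):
    """Plane tree phi(R) for a dot-bracket string `db` containing at least one '.'."""
    stack = [-1]          # open arcs by left-end index, outermost first; -1 is the big arc e_0 (S1)
    used = set()          # arcs that already appear in the partial tree
    kids = {}             # vertex -> ordered list of its children
    root = None
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            stack.pop()
        else:
            base = ("b", pos)
            fresh = [a for a in stack if a not in used]       # unused covering arcs, outer to inner
            if root is None:
                root = base                                   # S2: b_1 becomes the root
            else:
                innermost_used = [a for a in stack if a in used][-1]
                kids[("e", innermost_used)].insert(0, base)   # S3: new leftmost child
            kids[base] = [("e", a) for a in fresh]
            for a in fresh:
                used.add(a)
                kids[("e", a)] = []

    def build(v):
        return tuple(build(c) for c in kids[v])
    return build(root)

def edges(tree):
    return sum(1 + edges(child) for child in tree)

def even_level_vertices(tree, level=0):
    return (level % 2 == 0) + sum(even_level_vertices(c, level + 1) for c in tree)

def structures(n):
    """All dot-bracket secondary structures of length n (every arc encloses at least one position)."""
    if n == 0:
        return [""]
    out = ["." + s for s in structures(n - 1)]
    for m in range(1, n - 1):          # m positions enclosed by the arc that starts at the first position
        out += ["(" + a + ")" + b for a in structures(m) for b in structures(n - 2 - m)]
    return out

# The root b_1 gets the children e_0 and e_1; the second isolated base becomes a child of e_0.
print(phi("(.)."))
```

For small lengths one can enumerate all structures with `structures(n)` and observe that `phi` never sends two of them to the same tree, in line with Theorem 1.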
Before we specify the reverse algorithm, we mention two additional properties in the above forward algorithm which are important to better understand the reverse algorithm to come: (iv) the number of children of an isolated base (as a vertex in φ(R)) is the number of left-ends of arcs between the present isolated base and the one immediately to the left of it if any; (v) the parent of an isolated base if any is the innermost arc, excluding those with the left-ends identified in (iv) if any, that covers the isolated base.
Let T be a plane tree with a + k edges and k even-level vertices. It is not hard to verify that we can construct φ^{−1}(T) following the steps below:
SS3: Set j = 2. While j ≤ k, place an isolated base with label b_j to the right of all existing isolated bases such that (I) the newly placed isolated base is covered by the arc e_t but not by e_s for any s > t if the vertex b_j is a child of the vertex e_t in T, and (II) generate k_j mutually nesting arcs to cover the isolated base b_j if b_j has k_j children in T, (II1) without crossing any existing arcs and (II2) without covering b_{j−1}, and label these k_j arcs left-to-right correspondingly; then set j = j + 1.
Suppose we have just completed all steps through j − 1. Then, it is clear that the left-ends of all already generated arcs are to the left of b_{j−1}. Next consider j. If the vertex b_j is a child of the vertex e_t in T, then the innermost arc, excluding those later added by (II), that covers b_j should be e_t. Thus the condition (I) is necessary. Next, if b_j has k_j children in T, then according to (iv), in φ^{−1}(T), there should be k_j arcs whose left-ends lie between b_{j−1} and b_j. In order to guarantee this, we need to generate k_j arcs to cover b_j. The condition (II1) is clearly required so as not to violate the definition of secondary structures, while (II2) is essentially the same as (iv).
Finally, removing the arc e_0 as well as all e-labels and b-labels will give us a secondary structure φ^{−1}(T).
See Figure 1 for an illustration of the bijection φ.

Remark 2. There are several variations of the bijection φ. For instance, we can put the covering arcs of an isolated base right-to-left as children, or we can put a newly generated child of an arc to the right of existing ones, or we can read the isolated bases from right to left, or different combinations of these.

Consequences
In this section, we present a number of applications by combining the bijection φ and the Schmitt-Waterman bijection [6]. The Schmitt-Waterman bijection from RNA secondary structures to plane trees can be briefly summarized as follows: for a given RNA secondary structure, put a big arc covering everything. Next, view each arc and isolated base as a vertex in a tree rooted at the vertex corresponding to the big arc, where the left-to-right children of a vertex v in the tree are the vertices corresponding to the left-to-right arcs and isolated bases directly covered by v (if v is an arc). Therefore, the Schmitt-Waterman bijection maps an RNA secondary structure with k isolated bases to a plane tree with k leaves. Thus, combined with our bijection φ, with RNA secondary structures serving as intermediate objects, we immediately obtain the following well-known result [4].
Corollary 3. The number of plane trees with n edges and k leaves equals the number of plane trees with n edges and k even-level vertices.
Proof. Let f be the Schmitt-Waterman bijection from RNA secondary structures to plane trees. Clearly, ϕ = φ ∘ f^{−1} gives a bijection from the set of plane trees with n edges and k leaves to the set of plane trees with n edges and k even-level vertices.

The following corollary, which is implied in [4], can be obtained as well.
Corollary 4. The bijection ϕ restricts to a bijection between the set of plane trees of n edges with a k-element multiset M as the outdegree distribution of the internal vertices and the set of plane trees of n edges having the multiset M′ = {z − 1 | z ∈ M} as the outdegree distribution of the odd-level vertices.
Proof. Let R be a secondary structure (with the big arc added). An arc e corresponds to an internal vertex in the plane tree f(R), and corresponds to an odd-level vertex in the plane tree φ(R). If e covers t disconnected components (here a component is either an isolated base or an arc and everything covered by the arc), then the outdegree of the corresponding vertex in f(R) is t. Note that, in each component, by definition of RNA secondary structure, there is at least one isolated base. By construction of φ(R), the arc e is a child of the vertex corresponding to the first (left-to-right) isolated base in the first component covered by e, while the first isolated base in each other component covered by e must be a child of e as a vertex in φ(R). Thus, the outdegree of the odd-level vertex corresponding to e in φ(R) is t − 1. The converse can be argued analogously, completing the proof.
Remark 5. The reader can check that the outcomes of the implicit, iterative bijection of Deutsch [4] are quite similar to those of our bijection ϕ. In fact, we believe that the former can be transformed into the latter by specifying further steps in the iterative construction there. However, our bijections in this paper were not motivated by revising the former bijection. Nevertheless, we believe that our bijection ϕ, discovered in the study of RNA secondary structures, is more explicit and more constructive. More importantly, the results in the rest of this paper are not discussed in [4].
By inspecting our bijections more carefully, we can obtain more properties of plane trees, which will be the main theme of the rest of the paper. Note that there is a unique path from a leaf to the root of a plane tree. Then, we can decompose a plane tree into a set of paths where each path has a leaf as a terminal vertex. The decomposition works as follows: suppose all leaves are ordered by their relative order in the depth-first search from left to right. The first path is the path from the first leaf to the root. For t > 1, the t-th path is the remaining part of the path from the t-th leaf to the root after the previously obtained paths are removed from the tree, or equivalently, the t-th path goes from the t-th leaf up to the first vertex that is already in a previously obtained path. We refer to this decomposition as the vertical path decomposition associated to leaves. See Figure 3 (left) for an illustration. We will call the multiset consisting of the lengths of the obtained paths the path distribution of the given tree.

Theorem 6. The bijection ϕ restricts to a bijection between the set of plane trees of n edges with a k-element multiset M as its path distribution and the set of plane trees of n edges having M as the degree distribution of the even-level vertices.
Proof. Let R be a secondary structure with an added big arc. In the Schmitt-Waterman bijection f, the path from a leaf to the root in f(R) consists of the leaf itself (an isolated base) and all arcs (including the added big arc) covering the isolated base. Thus, the first path is determined by the first isolated base b_1 and all arcs covering b_1. So the length of the first path is the number of these arcs, which equals the number of children (hence the degree) of b_1 in φ(R). It is not hard to see that the length of the i-th (i > 1) path is one larger than the number of 'unused' arcs covering the i-th leaf after the first i − 1 paths have been obtained in the decomposition process. Thus, the length of the i-th path is one larger than the number of children of b_i in φ(R), which is the degree of b_i in φ(R). The converse is also not hard to see, whence the theorem.
Based on Corollary 4 and Theorem 6, we can conclude that, in a sense, the vertical determines the even levels while the horizontal determines the odd levels. Although the horizontal-odd relation is known, to the best of our knowledge, the two relations as a whole have not been addressed. We also remark that it seems not easy to motivate the vertical-even relation from Deutsch's bijection [4] due to its implicit, iterative nature.
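The Schmitt-Waterman map f summarized at the start of this section admits a short Python sketch. The dot-bracket input encoding and the nested-tuple tree encoding (a vertex is the tuple of its children) are assumptions made for illustration and are not notation from the paper: each '.' becomes a leaf, each arc becomes an internal vertex whose children are the items it directly covers, and the root plays the role of the added big arc.

```python
def schmitt_waterman(db):
    """Plane tree f(R) of a dot-bracket string; a vertex is the tuple of its children."""
    stack = [[]]                     # stack[0] collects the children of the big arc (the root)
    for ch in db:
        if ch == "(":
            stack.append([])         # open a new arc; its children are gathered in a fresh list
        elif ch == ")":
            children = stack.pop()
            stack[-1].append(tuple(children))   # the finished arc becomes a child of the enclosing arc
        else:
            stack[-1].append(())     # an isolated base is a leaf
    return tuple(stack[0])

def leaves(tree):
    return 1 if not tree else sum(leaves(child) for child in tree)

def internal_outdegrees(tree):
    """Sorted outdegrees of the internal vertices of the tree."""
    if not tree:
        return []
    return sorted([len(tree)] + [d for child in tree for d in internal_outdegrees(child)])

tree = schmitt_waterman("(.(.)).")
print(tree, leaves(tree), internal_outdegrees(tree))
```

Here the three isolated bases give three leaves, and the internal outdegrees are those of the root and of the two arcs. By Corollary 4, lowering each of these outdegrees by one gives the outdegrees of the odd-level vertices of the tree obtained from the same structure under φ.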
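The vertical path decomposition and the equidistribution implied by Theorem 6 can be explored with a small Python sketch. Trees are encoded as nested tuples (a vertex is the tuple of its children), which is an encoding chosen here for illustration, and all function names are hypothetical. The degree of an even-level vertex is taken to be its number of children, plus one for the edge to its parent when it is not the root, following the proof of Theorem 6.

```python
from collections import Counter
from functools import lru_cache

@lru_cache(maxsize=None)
def plane_trees(n):
    """All plane trees with n edges; a tree is the tuple of the subtrees hanging from its root."""
    if n == 0:
        return ((),)
    return tuple((first,) + rest
                 for i in range(n)                 # i = number of edges below the first child
                 for first in plane_trees(i)
                 for rest in plane_trees(n - 1 - i))

def path_lengths(tree):
    """Lengths of the paths of the vertical path decomposition, in left-to-right leaf order."""
    lengths = []
    def walk(vertex, pending):
        # `pending` counts the edges above `vertex` not yet claimed by an earlier path
        if not vertex:
            lengths.append(pending)
        for idx, child in enumerate(vertex):
            # only the leftmost child continues upward through `vertex`;
            # every other child starts a path that stops at `vertex`
            walk(child, pending + 1 if idx == 0 else 1)
    walk(tree, 0)
    return lengths

def even_level_degrees(tree, level=0):
    """Degrees of the even-level vertices (children, plus the parent edge below the root)."""
    own = [len(tree) + (level > 0)] if level % 2 == 0 else []
    return own + [d for child in tree for d in even_level_degrees(child, level + 1)]

def distribution(n, statistic):
    """How often each multiset (as a sorted tuple) occurs among the plane trees with n edges."""
    return Counter(tuple(sorted(statistic(t))) for t in plane_trees(n))

print(path_lengths((((), ()), ((),))))
print(distribution(3, path_lengths) == distribution(3, even_level_degrees))
```

Comparing the two distributions only tests the enumerative consequence of Theorem 6 (both statistics occur equally often); it does not construct the bijection ϕ itself.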
Let T be a plane tree with k leaves. Let l_t be one less than the length of the t-th path in the path decomposition of T for 1 < t ≤ k, and let l_1 be the length of the first path in the path decomposition. We denote the multiset consisting of these numbers l_t (1 ≤ t ≤ k) as M(T). With an application of the multivariate Lagrange inversion formula, we will obtain the forthcoming theorem. Let us first recall the following version of the multivariate (bivariate) Lagrange inversion formula [1,5].

Then, the set of equations w_i = t_i f_i(w_1, w_2) for 1 ≤ i ≤ 2 uniquely determines the w_i as formal power series in t_1, t_2, and

where [t_1^p t_2^q] denotes the coefficient of t_1^p t_2^q.
Theorem 7. The number C_{k,h}(n) of plane trees T with n > 0 edges and k leaves such that max M(T) ≤ h is given by

Proof. Based on Theorem 6, the number of plane trees T with n edges and k leaves where max M(T) ≤ h is equal to the number of plane trees of n edges with k even-level vertices such that every even-level vertex has at most h children. The latter can be computed as shown below. Given two sets E and O of vertices, we call a plane tree T on E ∪ O a set-alternating tree if the vertices on any path starting from the root of T alternate between the two sets. Let

where P_E denotes the set of set-alternating plane trees with root in E and every E-vertex having at most h children, while P_O denotes the set of set-alternating plane trees with root in O and every E-vertex having at most h children. Then, it is obvious that

Clearly, the number of plane trees with n edges and k even-level vertices such that every even-level vertex has at most h children is the same as the number of set-alternating trees of n edges with root in E and with every E-vertex having at most h children, which is obviously [t_1^k t_2^{n+1−k}] w_1. In terms of the above bivariate Lagrange inversion formula, we have

The last quantity can be simplified into

where more detailed manipulations can be found in Appendix A. Note that there is at most one integer in the interval ((q − 1)/(h + 1), q/(h + 1)] for any integer h ≥ 0.
Specifically, if (h + 1) | q, there exists exactly one integer m in the interval such that m(h + 1) = q. In this case, the sum of B2 and B3 is

which can be merged into B1 by changing i ≤ (q − 1)/(h + 1) into i ≤ q/(h + 1). If (h + 1) ∤ q, there is no integer in that interval, thus B2 and B3 are both zero. Furthermore, any integer i ≤ q/(h + 1) must then satisfy i ≤ (q − 1)/(h + 1). Hence, changing i ≤ (q − 1)/(h + 1) into i ≤ q/(h + 1) in B1 makes no difference. Therefore, the sum of B1, B2 and B3 can be written in a unified form in all cases, which gives the quantity in the theorem after setting p = k, q = n + 1 − k.
For example, there are 14 plane trees T with five edges and three leaves such that max M(T) ≤ 2, which are shown below:

Note that for any plane tree T with n edges, we have max M(T) ≤ n. Then, we immediately have
Corollary 8. The number of plane trees with n edges and k leaves is given by the Narayana number $\frac{1}{n}\binom{n}{k}\binom{n}{k-1}$.
Theorem 9. The number of plane trees T with n > 0 edges and k leaves such that l_t = h for 1 ≤ t ≤ k is given by $\frac{h}{n}\binom{n}{k-1}$ if (h + 1)k = n + 1, and 0 otherwise.
Proof. It is easy to see that n + 1 = (h + 1)k if l_t = h for 1 ≤ t ≤ k. The remaining part can be shown analogously to Theorem 7.
An equivalent formulation of Theorem 9 is that the number of plane trees T with k leaves such that l_t = h for 1 ≤ t ≤ k is given by $\frac{h}{(h+1)k-1}\binom{(h+1)k-1}{k-1}$, which is very similar to the number $\frac{1}{hk+1}\binom{hk+1}{k}$ of h-ary trees with k internal vertices (e.g., see [3]). So, to some extent, these trees can be viewed as 'h-ary' trees defined from another angle, i.e., vertically.
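For orientation, here is one way to reach a closed form for C_{k,h}(n) from the set-alternating trees in the proof. It is a sketch under stated assumptions: the symbols F, G, x_1 and x_2 are introduced here, t_1 is taken to mark E-vertices and t_2 to mark O-vertices, Good's determinant form of the bivariate Lagrange inversion formula is used, and the final expression need not coincide term by term with the B-terms discussed above.

```latex
\begin{align*}
w_1 &= t_1\,F(w_2), & F(x) &= 1 + x + \cdots + x^{h} = \frac{1-x^{h+1}}{1-x},\\
w_2 &= t_2\,G(w_1), & G(x) &= \frac{1}{1-x}.
\end{align*}
% Good's formula; the 2x2 determinant equals 1 - x_1 x_2 F'(x_2) G'(x_1) / (F(x_2) G(x_1)).
\begin{align*}
[t_1^{p} t_2^{q}]\, w_1
 &= [x_1^{p} x_2^{q}]\; x_1\, F(x_2)^{p}\, G(x_1)^{q}
    \left(1 - \frac{x_1 x_2\, F'(x_2)\, G'(x_1)}{F(x_2)\, G(x_1)}\right)\\
 &= \Bigl(1 - \frac{p-1}{p}\Bigr)\,[x_1^{p-1}]\,G(x_1)^{q}\;[x_2^{q}]\,F(x_2)^{p}
  = \frac{1}{p}\binom{p+q-2}{p-1}\,[x_2^{q}]\,F(x_2)^{p},\\
[x^{q}]\,F(x)^{p}
 &= \sum_{0 \le i \le q/(h+1)} (-1)^{i}\binom{p}{i}\binom{p+q-i(h+1)-1}{p-1}.
\end{align*}
% Setting p = k and q = n + 1 - k:
\[
C_{k,h}(n) = \frac{1}{k}\binom{n-1}{k-1}
  \sum_{0 \le i \le \frac{n+1-k}{h+1}} (-1)^{i}\binom{k}{i}\binom{n-i(h+1)}{k-1}.
\]
```

As a sanity check, n = 5, k = 3, h = 2 gives (1/3) · 6 · (10 − 3) = 14, and for h ≥ n only the term i = 0 survives, which yields the Narayana number.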
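The counts in Theorem 7, Corollary 8 and Theorem 9 can be checked by brute force for small n. The Python sketch below enumerates plane trees as nested tuples (a vertex is the tuple of its children), computes M(T) from the vertical path decomposition, and compares the counts with closed forms. The expression used in `closed_form` is an assumption of this sketch: it was derived from the generating-function setup in the proof of Theorem 7 and is not copied from the theorem. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def plane_trees(n):
    """All plane trees with n edges; a tree is the tuple of the subtrees hanging from its root."""
    if n == 0:
        return ((),)
    return tuple((first,) + rest
                 for i in range(n)
                 for first in plane_trees(i)
                 for rest in plane_trees(n - 1 - i))

def M(tree):
    """Sorted multiset M(T): l_1 is the first path length, l_t (t > 1) the t-th path length minus one."""
    values = []
    def walk(vertex, pending, on_first_path):
        if not vertex:
            values.append(pending if on_first_path else pending - 1)
        for idx, child in enumerate(vertex):
            if idx == 0:
                walk(child, pending + 1, on_first_path)   # leftmost child extends the current path
            else:
                walk(child, 1, False)                     # other children start a new path
    walk(tree, 0, True)
    return sorted(values)

def brute_force(n, k, h):
    """Number of plane trees with n edges and k leaves such that max M(T) <= h."""
    return sum(1 for t in plane_trees(n) if len(M(t)) == k and max(M(t)) <= h)

def all_equal(n, k, h):
    """Number of plane trees with n edges and k leaves such that every l_t equals h."""
    return sum(1 for t in plane_trees(n) if M(t) == [h] * k)

def closed_form(n, k, h):
    """Assumed closed form for C_{k,h}(n) with n >= 1, 1 <= k <= n, h >= 1."""
    q = n + 1 - k
    total = sum((-1) ** i * comb(k, i) * comb(n - i * (h + 1), k - 1)
                for i in range(q // (h + 1) + 1))
    return comb(n - 1, k - 1) * total // k

def narayana(n, k):
    return comb(n, k) * comb(n, k - 1) // n

print(brute_force(5, 3, 2), closed_form(5, 3, 2), narayana(5, 3), all_equal(5, 3, 1))
```

With h = n the restriction max M(T) ≤ h is vacuous, so `brute_force(n, k, n)` reproduces the Narayana numbers of Corollary 8.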
As a corollary of Theorem 7 and Theorem 9, we obtain a new curious identity below.
Corollary 10. For n ≥ 0, k ≥ 1, we have

Proof. Note that a plane tree T with k leaves and max M(T) ≤ h can have at most (h + 1)k vertices. Thus, C_{k,h}(n) = 0 for n + 1 > (h + 1)k, which gives the first case. If the number of vertices is n + 1 = (h + 1)k, then l_t = h for 1 ≤ t ≤ k. Applying Theorem 7 and Theorem 9 gives the second case, completing the proof.
Note that Eq. (2) can be rewritten as

It might be interesting to find a direct combinatorial proof for the identity.
The leaves of a plane tree are classified into old and young leaves in Chen, Deutsch and Elizalde [2]: a leaf is an old leaf if it is the leftmost child of its parent, and it is a young leaf otherwise. We can identify young and old leaves from the above path decomposition of a plane tree.
Lemma 11.In the path decomposition of a plane tree, a leaf contained in a length one path other than the first path is a young leaf, and vice versa.
Proof. Let T be a plane tree and let v be an old leaf of T. Suppose the parent of v is u_1. We have the following two cases: (i) If u_1 is the root of T, then the path containing v is the first path and has length 1, since v is the leftmost child of the root; (ii) Otherwise, u_1 has a parent u_2. By definition, v is the leftmost child of u_1. Then, the path from v to the root is the 'leftmost' path containing the edge (u_1, u_2). Thus, the edge (u_1, u_2) cannot be contained in any previous path in the path decomposition. So, the path containing v has length at least 2. In summary, an old leaf induces either a path of length at least two or the first path with length one. Conversely, the first leaf is clearly always an old leaf regardless of the length of the first path. For t > 1, if the t-th path has length at least two, then we can conclude that the t-th leaf is old by arguing as in case (ii), whence the lemma.
Corollary 12.The number of plane trees with n edges, k leaves and i young leaves is the same as the number of plane trees with n edges and k even-level vertices where i of them are leaves.
Proof. Considering Theorem 6 and Lemma 11 together completes the proof.
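Lemma 11 and Corollary 12 lend themselves to a quick brute-force check. The Python sketch below uses nested tuples for plane trees (a vertex is the tuple of its children), an encoding assumed here for illustration; the function names are hypothetical. It records, for every leaf, the length of its path in the decomposition and whether it is young, and then compares the joint distribution of (leaves, young leaves) with that of (even-level vertices, even-level leaves).

```python
from collections import Counter
from functools import lru_cache

@lru_cache(maxsize=None)
def plane_trees(n):
    """All plane trees with n edges; a tree is the tuple of the subtrees hanging from its root."""
    if n == 0:
        return ((),)
    return tuple((first,) + rest
                 for i in range(n)
                 for first in plane_trees(i)
                 for rest in plane_trees(n - 1 - i))

def leaf_records(tree):
    """For each leaf, in left-to-right order: (length of its path, whether it is a young leaf)."""
    records = []
    def walk(vertex, pending, is_leftmost_child):
        if not vertex:
            records.append((pending, not is_leftmost_child))
        for idx, child in enumerate(vertex):
            walk(child, pending + 1 if idx == 0 else 1, idx == 0)
    walk(tree, 0, True)
    return records

def lemma_11_holds(tree):
    """Young leaves are exactly the leaves on length-one paths other than the first path."""
    return all((t > 0 and length == 1) == young
               for t, (length, young) in enumerate(leaf_records(tree)))

def leaf_stats(tree):
    """(number of leaves, number of young leaves)."""
    records = leaf_records(tree)
    return len(records), sum(young for _, young in records)

def even_level_stats(tree, level=0):
    """(number of even-level vertices, number of even-level vertices that are leaves)."""
    here = level % 2 == 0
    count, leaf_count = int(here), int(here and not tree)
    for child in tree:
        c, l = even_level_stats(child, level + 1)
        count += c
        leaf_count += l
    return count, leaf_count

n = 5
print(all(lemma_11_holds(t) for t in plane_trees(n)))
print(Counter(map(leaf_stats, plane_trees(n))) == Counter(map(even_level_stats, plane_trees(n))))
```

The second comparison tests only the counting statement of Corollary 12 for one value of n; it does not exhibit the underlying bijection.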
Based on Corollary 12, we can compute the number of plane trees with restrictions on path lengths and the number of young leaves.
Theorem 13. The number C_{h,y,k}(n) of plane trees T with n > 0 edges and k leaves such that max M(T) ≤ h and y out of the k leaves are young leaves is given by

Proof. Let C_{h,j}(m) be the number of plane trees with m edges and j leaves such that 1 ≤ l_t ≤ h for 1 ≤ t ≤ j. We first show that

C_{h,y,k}(n) = $\binom{n-1}{y}$ C_{h,k−y}(n − y).

This can be seen as follows. On the one hand, for each plane tree T with n edges and k leaves such that max M(T) ≤ h and y out of the k leaves are young leaves, if we delete the young leaves, we will obtain a plane tree with n − y edges and k − y leaves such that 1 ≤ l_t ≤ h for 1 ≤ t ≤ k − y, due to Lemma 11. On the other hand, for each plane tree of the latter kind, inserting y leaves into the sectors other than the leftmost ones around its n + 1 − k internal vertices will generate a plane tree of the former kind. There are 2(n − y) − (k − y) − (n + 1 − k) + 1 = n − y such sectors, which gives in total $\binom{n-1}{y}$ different ways of inserting y leaves, whence we have the desired relation.
Next, based on Theorem 6, the number C_{h,k−y}(n − y) also counts plane trees T of n − y edges with k − y even-level vertices such that every even-level vertex has at least one child and at most h children. Employing a computation analogous to that in Theorem 7, we obtain

and the proof follows.
As an immediate consequence, we recover the following result obtained in [2].
Corollary 14. The number of plane trees with n > 0 edges, i old leaves and j young leaves is $\frac{1}{n}\binom{n}{i}\binom{n-i}{j}\binom{n-i-j}{i-1}$.
Proof. Obviously, the desired number is

and the proof follows.
The number of plane trees with n edges and i old (resp.young) leaves can be obtained by summing over all possible j's (resp.i's) in Corollary 14, which can be found in Chen, Deutsch and Elizalde [2] as well.
Based on Theorem 6 and Corollary 4, we can also count plane trees with both vertical and horizontal restrictions. It should be noted that it is generally not possible to have a plane tree with every vertex having k children while every path (from the path decomposition) has length exactly k, i.e., in a sense being regular both 'horizontally' and 'vertically'. However, we can have a weaker version of these regular trees, called strong k-ary trees. A k-ary tree T is called strong if max M(T) ≤ k. Applying a computation analogous to that in Theorem 7, we obtain
Theorem 15. The number of strong k-ary trees with n internal vertices is given by

Finally, we remark that the computational results in this paper can also be formulated in terms of RNA secondary structures with certain parameterized features (similar to, e.g., hairpins and cloverleaves [10]), which might have some biological significance. For instance, the path distribution represents the distribution of the sizes of parallel base pairs (or arcs) 'induced' by isolated bases.

(C3)
Combining C4, D1 (i.e., C2 and C5) and C3 together, C4 will be cancelled, D1 will be cancelled except for the part B3, and the remaining part of C3 is

$(-1)^{i}\binom{p}{i}\binom{p+q-i(h+1)}{p}\frac{q}{[p+q-i(h+1)](p-1)}$. (D3)

It is easy to see that C1 equals the term for i = 0 of D3. Thus, C1 and D3 give B1. In conclusion, we have arrived at the B-terms from the A-terms.

SS1: Label the even-level vertices of T with b_1, b_2, ..., b_k respectively in the depth-first searching manner from right to left, and label the left-to-right children of the even-level vertices arranged in the order b_1 b_2 ⋯ b_k sequentially with e_0, e_1, ..., e_a;
SS2: On a horizontal line, start with an isolated base labeled b_1, cover b_1 with k_1 mutually nesting arcs if the vertex b_1 in T has k_1 children, and label these arcs based on the order of their left-ends left-to-right with e_0, e_1, ..., e_{k_1−1};

Figure 1: An RNA secondary structure (left) and the process of constructing its corresponding plane tree (right).

Figure 2 gives an example of the bijection ϕ.

Figure 2: Correspondence between a plane tree with 5 leaves and a plane tree with 5 even-level vertices.