The largest crossing number of tanglegrams

A tanglegram $\cal T$ consists of two rooted binary trees with the same number of leaves, and a perfect matching between the two leaf sets. In a layout, the tanglegram is drawn with the leaves on two parallel lines, the trees on either side of the strip created by these lines are drawn as plane trees, and the perfect matching is drawn in straight line segments inside the strip. The tanglegram crossing number ${\rm cr}({\cal T})$ of $\cal T$ is the smallest number of crossings of pairs of matching edges, over all possible layouts of $\cal T$. The size of the tanglegram is the number of matching edges, say $n$. An earlier paper showed that the maximum tanglegram crossing number of size $n$ tanglegrams is $<\frac{1}{2}\binom{n}{2}$, but is at least $\frac{1}{2}\binom{n}{2}-\frac{n^{3/2}-n}{2}$ for infinitely many $n$. Here we improve these bounds: the maximum crossing number of a size $n$ tanglegram is at most $\frac{1}{2}\binom{n}{2}-\frac{n}{4}$, but for infinitely many $n$, it is at least $\frac{1}{2}\binom{n}{2}-\frac{n\log_2 n}{4}$. The problem is analogous to the Unbalancing Lights Problem of Gale and Berlekamp.


Introduction
A binary tree has a root vertex assumed to be a common ancestor of all other vertices, and each vertex has either two children or no children. A vertex with no children is a leaf, and a vertex with two children is an internal vertex. Note that this definition allows a single-vertex tree, whose only vertex is both root and leaf, to be a rooted binary tree. In an ordered binary tree, an order of the two children is specified for every vertex that has children.
A plane binary tree is a drawing of an ordered binary tree without edge crossings, where the left-right order of the subtrees in the drawing coincides with the specified order. The edges are drawn as straight line segments. It is easy to draw a plane binary tree in such a way that all the leaves are on a line, and all other vertices are in the same open halfplane.
A tanglegram T = (L, R, σ) is a graph that consists of a left binary tree L, a right binary tree R with the same number of leaves as L, and a perfect matching σ between the leaves of L and R. Two tanglegrams are considered identical if there is a graph isomorphism between them fixing the root r of R and the root ρ of L. The size of a tanglegram is the number of leaves in L (or R). An abstract tanglegram layout of the tanglegram (L, R, σ) is obtained by turning the unordered trees L and R into ordered trees. Given an abstract tanglegram layout, an actual tanglegram layout consists of a left plane binary tree isomorphic (keeping the order as well) to L with root ρ, drawn in the halfplane x ≤ 0 with its leaves on the line x = 0; a right plane binary tree isomorphic (keeping the order as well) to R with root r, drawn in the halfplane x ≥ 1 with its leaves on the line x = 1; and the perfect matching σ between their leaves, drawn as straight line segments. (Isomorphism of ordered trees (plane trees) preserves the root and the order.) Our main concern about tanglegram layouts is the number of crossings between the matching edges. As this number is determined by the abstract tanglegram layout, it suffices to work with abstract tanglegram layouts when counting crossings.
A switch on the abstract tanglegram layout (L, R, σ) is the following operation: select an internal vertex v of one of the two trees L and R and change the order of its two children.
It is easy to see that two abstract tanglegram layouts represent the same tanglegram if and only if a sequence of switches moves one abstract layout into the other. (A switch on a tanglegram layout is illustrated in Figure 1.) Hence tanglegrams of a given size partition the set of all abstract tanglegram layouts of that size; equivalently, a tanglegram can be seen as an equivalence class of abstract tanglegram layouts. Note that interchanging L and R is not allowed, as it may result in a different tanglegram. The crossing number of a tanglegram layout is the number of pairs of matching edges that cross, which is determined by the abstract tanglegram layout.
It is desirable to draw a tanglegram with the least possible number of crossings, which is known as the Tanglegram Layout Problem [4,8]. The (tanglegram) crossing number crt(T) of a tanglegram T is defined as the minimum number of crossings among its layouts. The Tanglegram Layout Problem is NP-hard [2,4], but it is fixed-parameter tractable [1,2]. It does not admit a constant-factor approximation under the Unique Games Conjecture [2]. Tanglegrams play a major role in phylogenetics, especially in the theory of cospeciation [6]. For example, the first binary tree is the phylogenetic tree of the hosts, the second binary tree is the phylogenetic tree of their parasites (e.g., gophers and lice), and the matching connects each host with its parasite [5]. The tanglegram crossing number has been related to the number of times parasites switched hosts [5], or, working with gene trees instead of phylogenetic trees, to the number of horizontal gene transfers ([3], pp. 204-206). Tanglegrams are well-studied objects in phylogenetics and computer science.
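To make the definitions of layouts, switches and crt(T) concrete, here is a minimal brute-force sketch. It assumes, as our own convention rather than the paper's, that a tree is a nested pair of subtrees whose leaves are plain labels, and that the leaves of L and R carry the same labels, so that σ matches equal labels. The helper names `leaf_orders`, `crossings` and `crt` are hypothetical. Since distinct switch sets give distinct leaf sequences, enumerating leaf sequences enumerates abstract layouts, and a layout's crossing number is the number of pairs listed in different orders on the two lines.

```python
def leaf_orders(tree):
    """Yield the top-to-bottom leaf sequence of every ordering of `tree`.

    A tree is a leaf label or a pair (first_child, second_child); every
    internal vertex may be switched, so k internal vertices give 2**k orders.
    """
    if not isinstance(tree, tuple):
        yield (tree,)
        return
    first, second = tree
    for a in leaf_orders(first):
        for b in leaf_orders(second):
            yield a + b  # no switch at this vertex
            yield b + a  # switch at this vertex


def crossings(left_order, right_order):
    """Number of crossing matching edges: pairs ordered differently on the two lines."""
    pos = {leaf: k for k, leaf in enumerate(right_order)}
    seq = [pos[leaf] for leaf in left_order]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])


def crt(left_tree, right_tree):
    """Tanglegram crossing number: minimum of crossings over all abstract layouts."""
    right_orders = list(leaf_orders(right_tree))
    return min(crossings(lo, ro)
               for lo in leaf_orders(left_tree) for ro in right_orders)


# Hypothetical example: two cherries on each side, paired up differently.
L = ((0, 1), (2, 3))
R = ((0, 2), (1, 3))
print(crossings((0, 1, 2, 3), (0, 2, 1, 3)))  # 1
print(crt(L, R))  # 1
```

The search is exponential in the size and is only meant for checking tiny examples by hand.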
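Let $M_n$ denote $\max_T {\rm crt}(T)$ over tanglegrams T of size n. It is easy to see that the expected number of crossings in a random layout of any fixed labeled tanglegram of size n is $\frac{1}{2}\binom{n}{2}$ (for details, see Section 3). Since the minimum never exceeds the average, $M_n \le \frac{1}{2}\binom{n}{2}$. An earlier paper [10] made a slight improvement, showing that equality cannot happen: $M_n < \frac{1}{2}\binom{n}{2}$; it also showed that $M_n \ge \frac{1}{2}\binom{n}{2}-\frac{n^{3/2}-n}{2}$ for infinitely many n. The goal of this paper is to find a better separation: $M_n \le \frac{1}{2}\binom{n}{2}-\frac{n}{4}$, and for infinitely many n, $M_n \ge \frac{1}{2}\binom{n}{2}-\frac{n\log_2 n}{4}$. In Section 2 we provide a construction for the lower bound, and in Section 3 we relate the numbers of crossings in different layouts of the same tanglegram. In Section 4 we derive some technical results that we need for the proof. In Section 5 we relate the largest crossing number problem to the Unbalancing Lights Problem of Gale and Berlekamp, and show the separation from $\frac{1}{2}\binom{n}{2}$.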

A construction for tanglegrams with large crossing number
Theorem 1. For every i ≥ 1, there exists a tanglegram of size $2^i$ whose tanglegram crossing number is exactly $\frac{1}{2}\binom{2^i}{2} - i2^{i-2}$. (With $n = 2^i$, this equals $\frac{1}{2}\binom{n}{2} - \frac{n\log_2 n}{4}$.)
Let X = {0, 1} and let $X^i$ be the set of binary strings of length i, i.e., words over the alphabet X; these are the binary representations of the non-negative integers less than $2^i$. Given a string $x = x_1x_2\dots x_i$, we denote by $\bar{x}$ the string obtained by reversing x, i.e., $\bar{x} = x_ix_{i-1}\dots x_1$.
For every i ∈ N we define a tanglegram $T_i = (L^{(i)}, R^{(i)}, \sigma_i)$ of size $2^i$ as follows. Both $L^{(i)}$ and $R^{(i)}$ are complete rooted binary trees of height i. We label the vertices of $L^{(i)}$ (resp. of $R^{(i)}$) as follows: the vertices at distance j from the root (which we call the j-th layer) are labeled $u_x$ (resp. $w_x$), where x is an element of $X^j$. The root of $L^{(i)}$ is labeled $u_\epsilon$, and the root of $R^{(i)}$ is labeled $w_\epsilon$, where ε is the empty string. The children of $u_x$ (resp. $w_x$) are labeled by appending a digit: $u_{x0}$ and $u_{x1}$ (resp. $w_{x0}$ and $w_{x1}$). The matching is $\sigma_i = \{u_xw_{\bar{x}} : x \in X^i\}$.
Claim 1. For every i ≥ 2, ${\rm crt}(T_i) \ge 2\,{\rm crt}(T_{i-1}) + \binom{2^{i-1}}{2}$.
Proof. Let T = (L, R, σ) be an arbitrary tanglegram, and let v be a non-leaf vertex of one of the trees L, R. Let Z be the set of leaves of that tree that are descendants of v. Note that in any layout of T, the elements of Z appear consecutively in the sequence of leaves. Moreover, if both children of v are leaves, and the matching edges incident upon these children cross in the layout, then switching the order of these children eliminates this crossing and decreases the number of crossings, so the original layout was not optimal.
Assume i ≥ 2, and let D be an optimal layout of $T_i$. Let A be the number of crossings in D between edges incident upon leaves $u_x$ and $u_y$ where the first digits of x and y are the same, and let B be the number of crossings in D between edges incident upon leaves $u_x$ and $u_y$ where the first digits of x and y differ. Obviously, ${\rm crt}(T_i) = A + B$. For each t ∈ X, the matching edges incident upon the leaves of the subtree $L^{(i)}_t$ rooted at $u_t$ induce a copy of $T_{i-1}$, drawn by a sublayout of D with at least ${\rm crt}(T_{i-1})$ crossings. Hence $A \ge 2\,{\rm crt}(T_{i-1})$, and it is enough to show that $B \ge \binom{2^{i-1}}{2}$. Let t, s ∈ X be chosen such that $u_t$ lies above $u_s$ in the layout D. Let x, y ∈ $X^{i-1}$ be different words. Clearly, $w_{x0}, w_{x1}, w_{y0}, w_{y1}$ are distinct leaves of $R^{(i)}$, and $u_{0\bar{x}}, u_{1\bar{x}}, u_{0\bar{y}}, u_{1\bar{y}}$ are distinct leaves of $L^{(i)}$; note that $u_{k\bar{x}}$ is matched to $w_{xk}$, since the reverse of $k\bar{x}$ is $xk$. Also, the leaves $w_{x0}, w_{x1}$ as well as $w_{y0}, w_{y1}$ are consecutive in any layout, including D.
We may assume without loss of generality that $u_{t\bar{x}}$ is above $u_{t\bar{y}}$ in D. As $u_t$ lies above $u_s$, both $u_{s\bar{x}}$ and $u_{s\bar{y}}$ lie below $u_{t\bar{y}}$. If the pair $w_{x0}, w_{x1}$ lies above the pair $w_{y0}, w_{y1}$, then the matching edges incident upon $u_{t\bar{y}}$ and $u_{s\bar{x}}$ cross; otherwise the matching edges incident upon $u_{t\bar{x}}$ and $u_{s\bar{y}}$ cross. This shows that for any x, y ∈ $X^{i-1}$ with x ≠ y, there are k, ℓ with {k, ℓ} = {0, 1} such that the matching edges incident upon $u_{k\bar{x}}$ and $u_{\ell\bar{y}}$ cross in D. Since distinct unordered pairs {x, y} give distinct pairs of edges, all counted in B, we have $B \ge \binom{2^{i-1}}{2}$.
Claim 2. Let $D^\star_i$ be the layout of $T_i$ in which the leaf labels, from top to bottom, appear in the order of the integers corresponding to the binary words, both in $L^{(i)}$ and in $R^{(i)}$. (See Fig. 2 for this layout.) Let ${\rm cr}(D^\star_i)$ denote the number of crossings in this layout. Then, for all i ∈ N, we have
$${\rm crt}(T_i) = {\rm cr}(D^\star_i) = \frac{1}{2}\binom{2^i}{2} - i2^{i-2}.$$
Proof. Set $\omega_i = {\rm cr}(D^\star_i)$. We prove the statement by induction on i, with base cases i ∈ {0, 1}.
$T_0$ and $T_1$ are the unique tanglegrams of size 1 and 2, respectively, and $D^\star_0$ and $D^\star_1$ are planar layouts, so $\omega_0 = \omega_1 = 0$. Thus, the statement is true for i ∈ {0, 1}. Assume now i > 1, and consider the layout $D^\star_i$. For each t ∈ X, the matching edges incident upon the leaves of $L^{(i)}_t$ induce a drawing of a subtanglegram isomorphic to $T_{i-1}$, and this drawing is isomorphic to $D^\star_{i-1}$; together these contribute exactly $2\omega_{i-1}$ crossings. We now count the crossings in $D^\star_i$ between matching edges whose left endpoints are $u_{0\bar{x}}$ and $u_{1\bar{y}}$, where x, y ∈ $X^{i-1}$; their right endpoints are $w_{x0}$ and $w_{y1}$. The edges cross precisely when y1 < x0, which is equivalent to y < x (where we consider the words as binary representations of numbers). So we have exactly one such crossing for each unordered pair of distinct words x, y from $X^{i-1}$. By the induction hypothesis and Claim 1 we have
$$\omega_i = 2\omega_{i-1} + \binom{2^{i-1}}{2} = 2\,{\rm crt}(T_{i-1}) + \binom{2^{i-1}}{2} \le {\rm crt}(T_i) \le \omega_i,$$
which gives ${\rm crt}(T_i) = \omega_i$. Also, by the induction hypothesis,
$$\omega_i = 2\left(\frac{1}{2}\binom{2^{i-1}}{2} - (i-1)2^{i-3}\right) + \binom{2^{i-1}}{2} = 2\binom{2^{i-1}}{2} - (i-1)2^{i-2} = \frac{1}{2}\binom{2^i}{2} - i2^{i-2}.$$
Unfortunately, this construction is not the best possible. For size 8, the tanglegram in Fig. 3, shown with an optimal drawing, has one more crossing than our construction in Fig. 2.
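The construction and Theorem 1 can be checked numerically for small i. The sketch below is our own illustration, not the authors' code: it builds $T_i$ as nested pairs, labels each leaf $w_y$ of R by the word $\bar{y}$ of its partner $u_{\bar{y}}$ (so that the matching pairs equal labels, an assumed convention), counts the crossings of $D^\star_i$, and, for i ≤ 3 only, computes crt(T_i) by exhaustive search over all switches. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def words(i):
    """X^i: binary words of length i, in increasing numeric order."""
    return ["".join(bits) for bits in product("01", repeat=i)]


def complete_tree(i, label, prefix=""):
    """Complete binary tree of height i as nested pairs; leaf at word x gets label(x)."""
    if len(prefix) == i:
        return label(prefix)
    return (complete_tree(i, label, prefix + "0"),
            complete_tree(i, label, prefix + "1"))


def tanglegram(i):
    """T_i with matched leaves sharing a label: u_x is matched to w_rev(x),
    so the leaf w_y of R carries the label rev(y) of its partner in L."""
    return complete_tree(i, lambda x: x), complete_tree(i, lambda y: y[::-1])


def leaf_orders(tree):
    """All leaf sequences reachable by switches (leaves are string labels)."""
    if not isinstance(tree, tuple):
        yield (tree,)
        return
    for a in leaf_orders(tree[0]):
        for b in leaf_orders(tree[1]):
            yield a + b  # no switch
            yield b + a  # switch


def crossings(left_order, right_order):
    """Pairs of matching edges listed in different orders on the two lines."""
    pos = {leaf: k for k, leaf in enumerate(right_order)}
    seq = [pos[leaf] for leaf in left_order]
    return sum(1 for a in range(len(seq)) for b in range(a + 1, len(seq))
               if seq[a] > seq[b])


def cr_dstar(i):
    """cr(D*_i): both sides listed in numeric order of the vertex words."""
    return crossings(words(i), [y[::-1] for y in words(i)])


def brute_crt(i):
    """crt(T_i) by exhaustive search over all switch sets (tiny i only)."""
    left, right = tanglegram(i)
    rights = list(leaf_orders(right))
    return min(crossings(lo, ro) for lo in leaf_orders(left) for ro in rights)


def theorem1(i):
    """(1/2) * C(2^i, 2) - i * 2^(i-2), kept exact with Fraction."""
    return Fraction(comb(2**i, 2), 2) - Fraction(i * 2**i, 4)


for i in range(6):
    print(i, cr_dstar(i), theorem1(i))  # the last two columns agree
print([brute_crt(i) for i in range(1, 4)])  # [0, 1, 8]
```

Note that $D^\star_i$ is exactly the bit-reversal permutation, so cr(D*_i) is its number of inversions; the exhaustive check for i = 3 already takes 2^14 layouts.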

Crossings in different layouts of the same tanglegram
Let T = (L, R, σ) be a tanglegram of size n. The vertices of R form a partially ordered set under the following order: if r is the root of R, then x ≤ y if y is a vertex of the unique path from r to x in R. This partial order is a semilattice, in which the least upper bound of vertices u and v is denoted by ${\rm lca}_R(u, v)$ (lca stands for least common ancestor in phylogenetics). The same definitions apply to the tree L, with the notation ${\rm lca}_L$. For matching edges e, f, ${\rm lca}_R(e, f)$ (resp. ${\rm lca}_L(e, f)$) denotes the lca of the two leaves of R (resp. of L) incident to the edges e and f.
Consider a layout $D_0$ of T. Assume that a layout D is obtained from $D_0$ by making a switch at certain internal (non-leaf) vertices of R and certain internal vertices of L. Note that the order in which the switches are made has no effect on D. Also note that each of R and L has exactly n − 1 internal vertices. We denote the sets of internal vertices by int(R) and int(L).
Define α on int(R) (resp. β on int(L)) as 1 if no switch takes place at the vertex, and −1 if a switch takes place at the vertex. For a fixed $D_0$, the combinatorially different layouts D are in one-to-one correspondence with the pairs (α, β) of ±1-valued functions.
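Consider now two matching edges e, f of T. Let x = ${\rm lca}_R(e, f)$ and u = ${\rm lca}_L(e, f)$. Define the crossing status of the matching edges e, f in layout D as
$$\chi_D(e,f) = \begin{cases} 1 & \text{if } e \text{ and } f \text{ cross in } D,\\ -1 & \text{otherwise.}\end{cases}$$
The relative order of the two leaves of e and f in R is determined by the order of the children of x alone, and similarly in L by u, so $\chi_D(e,f) = \alpha(x)\beta(u)\chi_{D_0}(e,f)$. Counting the number of crossings in a layout D, we have
$${\rm cr}(D) = \sum_{\{e,f\}} \frac{1+\chi_D(e,f)}{2} = \frac{1}{2}\binom{n}{2} + \frac{1}{2}\sum_{x\in{\rm int}(R)}\sum_{u\in{\rm int}(L)} \alpha(x)\beta(u)\,a_{xu},$$
where
$$a_{xu} = \sum_{\{e,f\}:\ {\rm lca}_R(e,f)=x,\ {\rm lca}_L(e,f)=u} \chi_{D_0}(e,f).$$
To justify the claim in Section 1 on the expected number of crossings in a random layout of a labeled tanglegram, choose the α and β values uniformly and independently at random to transform the fixed drawing $D_0$ into the random drawing D. Since $E[\alpha(x)\beta(u)] = 0$ for all x and u, the displayed formula above implies $E[{\rm cr}(D)] = \frac{1}{2}\binom{n}{2}$.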
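Both consequences of the counting argument in this section, the exact average $\frac{1}{2}\binom{n}{2}$ over uniformly random switches and the fact that switching at every internal vertex of one tree flips the crossing status of every pair, can be confirmed by exhaustive enumeration on a tiny example. This is a sketch under our own assumptions: trees are nested pairs, matched leaves share a label, and `average_crossings` is a hypothetical helper. Because distinct switch sets give distinct leaf sequences, averaging over all leaf sequences of both trees is the same as averaging over all pairs (α, β).

```python
from fractions import Fraction
from math import comb


def leaf_orders(tree):
    """All leaf sequences reachable by switches; a tree is a label or a pair."""
    if not isinstance(tree, tuple):
        yield (tree,)
        return
    for a in leaf_orders(tree[0]):
        for b in leaf_orders(tree[1]):
            yield a + b  # no switch at this vertex
            yield b + a  # switch at this vertex


def crossings(left_order, right_order):
    """Pairs of matching edges (equal labels) in different orders on the two lines."""
    pos = {leaf: k for k, leaf in enumerate(right_order)}
    seq = [pos[leaf] for leaf in left_order]
    return sum(1 for a in range(len(seq)) for b in range(a + 1, len(seq))
               if seq[a] > seq[b])


def average_crossings(left_tree, right_tree):
    """Exact mean of cr(D) when each switch is made independently with probability 1/2."""
    lefts = list(leaf_orders(left_tree))
    rights = list(leaf_orders(right_tree))
    total = sum(crossings(lo, ro) for lo in lefts for ro in rights)
    return Fraction(total, len(lefts) * len(rights))


# A hypothetical size-5 tanglegram.
L = (((0, 1), 2), (3, 4))
R = ((0, (3, 2)), (1, 4))
n = 5
print(average_crossings(L, R), Fraction(comb(n, 2), 2))  # 5 5

# Switching at every internal vertex of L reverses its leaf sequence, so every
# pair of matching edges changes its crossing status.
for lo in leaf_orders(L):
    for ro in leaf_orders(R):
        assert crossings(lo[::-1], ro) == comb(n, 2) - crossings(lo, ro)
```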

Tools
For a rooted binary tree T, let L(T) denote the set of its leaves, and let A(T) be the set of internal vertices that have a leaf neighbor. For x ∈ A(T), set ψ(x) = 1 if the number of leaves that are descendants of x is even, and ψ(x) = 0 otherwise, and let
$$h(T) = \sum_{x\in A(T)} \psi(x), \qquad h(n) = \min\{h(T) : |L(T)| = n\}.$$
Claim 3. For every n ≥ 2, $h(n) = \lfloor n/4 \rfloor + 1$.
In words, in any rooted binary tree with n ≥ 2 leaves, at least ⌊n/4⌋ + 1 vertices have a leaf neighbor and an even number of leaf descendants, and this bound is attained.
Proof. Note that h(1) = 0 = ⌊1/4⌋. Let n ≥ 2 and write n = 4q + r, where q = ⌊n/4⌋ and 0 ≤ r < 4. We will show that h(n) = q + 1 by induction on q.
Let q > 0 and assume that the statement is true for all trees with n ′ leaves, where 2 ≤ n ′ < 4q.
Take a tree T with n leaves, and let $T_1$, $T_2$ be the subtrees rooted at the two children of the root. Without loss of generality … The first inequality is an equality iff (n is odd or (n is even and k = 1)), and the second inequality is an equality iff k = 1 and $T_2$ is a realizer, i.e., a tree attaining h(n − 1). Thus, for an odd n, the bound h(n) ≤ q + 1 is attained by choosing a tree T such that $T_1$ is a single vertex and $T_2$ is a realizer with n − 1 leaves.
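The quantity h(n) can also be computed exactly for moderate n by a short dynamic program, which gives an independent check of the formula ⌊n/4⌋ + 1. This is our own sketch, not part of the proof; the function name `h` mirrors the text. It relies on the observation that whether a vertex counts depends only on the sizes of its two subtrees: it has a leaf neighbor iff one of its children is a leaf (a parent is never a leaf), and its number of leaf descendants is the sum of the two sizes. Hence h(T) = h(T_1) + h(T_2) + [the root counts], minimized over the split of the n leaves.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def h(n):
    """Minimum, over rooted binary trees with n leaves, of the number of internal
    vertices that have a leaf neighbor and an even number of leaf descendants.

    Exact DP over the sizes (k, n - k) of the two subtrees of the root.
    """
    if n == 1:
        return 0  # a single leaf has no internal vertices
    best = None
    for k in range(1, n // 2 + 1):
        # The root's neighbors are its children, so it has a leaf neighbor
        # iff the smaller subtree is a single leaf (k == 1).
        root = 1 if (k == 1 and n % 2 == 0) else 0
        value = h(k) + h(n - k) + root
        best = value if best is None else min(best, value)
    return best


print([h(n) for n in range(1, 13)])  # [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4]
print(all(h(n) == n // 4 + 1 for n in range(2, 200)))  # True
```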

Unbalancing lights
The book of Alon and Spencer [9] contains the following theorem.
Theorem 2. Let $a_{ij} = \pm 1$ for 1 ≤ i, j ≤ n. Then there exist $x_i, y_j = \pm 1$, 1 ≤ i, j ≤ n, so that
$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_iy_j \ge \left(\sqrt{2/\pi} + o(1)\right)n^{3/2}.$$
Alon and Spencer [9] give an amusing interpretation of Theorem 2, which explains the title of this section: "Let an n × n array of lights be given, each either on ($a_{ij} = +1$) or off ($a_{ij} = -1$). Suppose for each row and each column there is a switch so that if the switch is pulled ($x_i = -1$ for row i and $y_j = -1$ for column j) all of the lights in that line are 'switched': on to off and off to on. Then for any initial configuration it is possible to perform switches so that the number of lights on minus the number of lights off is at least $(\sqrt{2/\pi} + o(1))n^{3/2}$."
The difficulty is that in our setting $a_{xu}$ may take values other than ±1; in fact, it is difficult to find many non-zero $a_{xu}$ terms. Therefore we were unable to utilize the probabilistic method. Assume that x ∈ int(R) satisfies the condition of Claim 3, i.e., it has a leaf neighbor and an even number of leaf descendants. We call such internal vertices x ∈ int(R) special vertices, and denote their set by S. Let e be the matching edge at the leaf neighbor of the special vertex x, and let $f_1, f_2, \dots, f_{2k+1}$ be the matching edges at the further leaf descendants of x. The pairs of matching edges whose lca in R is x are exactly the pairs $\{e, f_j\}$, so $\sum_{u\in{\rm int}(L)} a_{xu}$ is a sum of 2k + 1 terms, each equal to ±1 according to whether the pair $\{e, f_j\}$ crosses in $D_0$; hence it is odd, and in particular non-zero.
Let T be an arbitrary tanglegram of size n. Without loss of generality, the layout $D_0$ realizes the crossing number of T. Then for every x ∈ S, $\sum_{u\in{\rm int}(L)} a_{xu} < 0$, as this sum is non-zero, and if it were positive, a switch at x would yield a layout with strictly fewer crossings.
Consider now the following layout $D_1$: switch at every $u \in {\rm int}(L)$. This reverses the order of the leaves of L, so every pair of matching edges changes its crossing status, and ${\rm cr}(D_1) = \binom{n}{2} - {\rm cr}(D_0)$. Now switch in layout $D_1$ at every vertex $x \in S$ to obtain layout $D_2$: ${\rm cr}(D_2) = {\rm cr}(D_1) + \dots$

Figure 1. Result of a switch operation.

Figure 2. The tanglegrams $T_i$ for i ∈ {0, 1, 2, 3}. The vertices are labeled with their indices as in the text, and the tanglegrams are shown with a crossing-optimal layout.

Figure 3. A tanglegram of size 8 with tanglegram crossing number 9. This is the maximum tanglegram crossing number for size 8, found by brute-force search.
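For tiny arrays, the quantity bounded in Theorem 2 (the Gale-Berlekamp unbalancing lights problem) can be computed exhaustively, which may help when comparing it with the tanglegram setting, where the analogous coefficients $a_{xu}$ need not be ±1. The sketch below is our own illustration with hypothetical helper names `max_imbalance` and `max_imbalance_naive`. It uses the standard observation that, once the row switches x are fixed, the best column switch $y_j$ makes column j's sum non-negative, so the maximum equals $\max_x \sum_j |\sum_i a_{ij}x_i|$. It is exponential in n and does not reflect the probabilistic argument behind the $n^{3/2}$ bound.

```python
from itertools import product


def max_imbalance(a):
    """Max of sum_ij a[i][j]*x[i]*y[j] over sign vectors x, y (brute force over x only).

    For fixed row switches x the best column switch is y[j] = sign of column j's
    sum, so the maximum equals max over x of sum_j |sum_i a[i][j]*x[i]|.
    """
    rows, cols = len(a), len(a[0])
    return max(
        sum(abs(sum(a[i][j] * x[i] for i in range(rows))) for j in range(cols))
        for x in product((1, -1), repeat=rows))


def max_imbalance_naive(a):
    """Same quantity by trying every pair (x, y); for cross-checking tiny cases."""
    rows, cols = len(a), len(a[0])
    return max(
        sum(a[i][j] * x[i] * y[j] for i in range(rows) for j in range(cols))
        for x in product((1, -1), repeat=rows)
        for y in product((1, -1), repeat=cols))


# A 4 x 4 Hadamard matrix: no switching can do better than 8 here.
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(max_imbalance(H4), max_imbalance_naive(H4))  # 8 8
```

Restricting the search to the row switches halves the exponent compared with the naive double loop, which is why the first function remains usable somewhat beyond the second.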