Distribution of Segment Lengths in Genome Rearrangements

Glenn Tesler

doi:10.37236/829

Abstract

The study of gene orders for constructing phylogenetic trees was introduced by Dobzhansky and Sturtevant in 1938. Different genomes may have homologous genes arranged in different orders. In the early 1990s, Sankoff and colleagues modelled this as ordinary (unsigned) permutations on a set of numbered genes $1,2,\ldots,n$, with biological events such as inversions modelled as operations on the permutations. Signed permutations may be used when the relative strands of the genes are known, and "circular permutations" may be used for circular genomes. We use combinatorial methods (generating functions, commutative and noncommutative formal power series, asymptotics, recursions, and enumeration formulas) to study the distributions of the number and lengths of conserved segments of genes between two or more unichromosomal genomes, including signed and unsigned genomes, and linear and circular genomes. This generalizes classical work on permutations from the 1940s–60s by Wolfowitz, Kaplansky, Riordan, Abramson, and Moser, who studied decompositions of permutations into strips of ascending or descending consecutive numbers. In our setting, their work corresponds to comparison of two unsigned genomes (known gene orders, unknown gene orientations). Maple software implementing our formulas is available at http://www.math.ucsd.edu/~gptesler/strips.
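
To illustrate the classical notion of strips mentioned above, the following is a minimal Python sketch that splits a linear, unsigned permutation into maximal runs of consecutive numbers, ascending or descending, and reports their lengths. The function name `strips` and the sample permutation are hypothetical and are not taken from the paper or its Maple software. The sketch assumes the simplest reading of the definition and does not treat signed or circular genomes, for which the paper's conventions may differ.

```python
def strips(perm):
    """Split an unsigned linear permutation into maximal strips.

    Assumption (not from the paper): a strip is a maximal run of
    adjacent entries in which each entry differs from the previous
    one by exactly 1. In a permutation such a run is automatically
    all ascending or all descending, since no value repeats.
    """
    if not perm:
        return []
    result = [[perm[0]]]
    for prev, cur in zip(perm, perm[1:]):
        if abs(cur - prev) == 1:
            # cur continues the current strip
            result[-1].append(cur)
        else:
            # breakpoint between prev and cur: start a new strip
            result.append([cur])
    return result


# Hypothetical example: gene order of one genome relative to 1..8
example = [3, 4, 5, 1, 2, 8, 7, 6]
segments = strips(example)
lengths = [len(s) for s in segments]
print(segments)  # [[3, 4, 5], [1, 2], [8, 7, 6]]
print(lengths)   # [3, 2, 3]
```

Here the permutation decomposes into three strips of lengths 3, 2, and 3; the distribution of such counts and lengths over all permutations is the kind of quantity the abstract refers to.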