Encodings of Cladograms and Labeled Trees

  • Daniel J. Ford

Abstract

This paper deals with several bijections between cladograms and perfect matchings. The first of these is due to Diaconis and Holmes. The second is a modification of the Diaconis-Holmes matching which makes deletion of the largest labeled leaf correspond to gluing together the last two points in the perfect matching. The third is an entirely new encoding of cladograms, first as a bijection with a certain set of strings and then via this to perfect matchings. In this pair of bijections, deletion of the largest labeled leaf corresponds to deletion of the corresponding symbols from the string or deletion of the corresponding pair from the matching. These two new bijections are related through a common max-min labeling of internal vertices with two different choices for the label of the root node. All these encodings are extended to cladograms with edge lengths and left-right ordered children. Moving a single symbol in this last encoding corresponds to a subtree prune and regraft operation on the cladogram, making it well suited for use in phylogenetics software. Finally, a perfect Gray code for cladograms is derived from the bar encoding, along with a total ordering on all cladograms. Algorithms are also provided for finding the next and previous cladogram, the cladogram at any position, and the position of any cladogram in the sequence.
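As a small sanity check on the existence of such bijections, the following Python sketch enumerates cladograms on n labeled leaves and perfect matchings on 2n - 2 points and confirms that the two sets have the same size, (2n - 3)!!. It does not implement the Diaconis-Holmes matching or any of the encodings from the paper. The function names and the representation of a cladogram as nested frozensets are assumptions made only for this illustration.

```python
def perfect_matchings(points):
    """Yield every perfect matching of an even-length list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    # Pair the first point with each possible partner, then match the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` on one edge of `tree`.

    A tree is either an int (a leaf) or a frozenset of two subtrees.
    The edge above the root counts as an edge.
    """
    # Attach the new leaf on the edge directly above this subtree.
    yield frozenset([tree, leaf])
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        for new_left in insert_leaf(left, leaf):
            yield frozenset([new_left, right])
        for new_right in insert_leaf(right, leaf):
            yield frozenset([left, new_right])


def cladograms(n):
    """Yield every rooted binary tree with leaves labeled 1..n."""
    if n == 1:
        yield 1
        return
    # Deleting the largest leaf of a cladogram gives a unique smaller
    # cladogram, so inserting leaf n on every edge lists each tree once.
    for smaller in cladograms(n - 1):
        yield from insert_leaf(smaller, n)


def count_both(n):
    """Return (number of cladograms, number of matchings) for n leaves."""
    trees = sum(1 for _ in cladograms(n))
    matchings = sum(1 for _ in perfect_matchings(list(range(2 * n - 2))))
    return trees, matchings
```

Calling `count_both(n)` for small n returns two equal numbers, for example `count_both(4)` gives 15 for both counts. The recursive construction in `cladograms` mirrors the leaf-deletion viewpoint mentioned in the abstract: removing the largest labeled leaf maps each cladogram on n leaves to one on n - 1 leaves.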

Published
2010-03-29
Article Number
R54