Chain Reduction Preserves the Unrooted Subtree Prune-and-Regraft Distance

Research paper by Chris Whidden, Frederick A. Matsen

Indexed on: 07 Nov '16 | Published on: 07 Nov '16 | Published in: arXiv - Computer Science - Discrete Mathematics


The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as the study of lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable algorithms, in the unrooted case previous algorithms are unable to compute distances larger than 7. One important tool for efficient computation in the rooted case is chain reduction, which replaces an arbitrary chain of subtrees identical in both trees with a chain of three leaves. Whether chain reduction preserves SPR distance in the unrooted case has remained an open question since Allen and Steel conjectured it in 2001, and it was presented as a challenge question at the 2007 Isaac Newton Institute for Mathematical Sciences program on phylogenetics. In this paper we prove that chain reduction preserves the unrooted SPR distance. We do so by introducing a structure called a socket agreement forest, which restricts edge modification to predetermined socket vertices and thereby permits detailed analysis and modification of SPR move sequences. This new chain reduction theorem reduces the unrooted distance problem to a linear size problem kernel, substantially improving on the previous best quadratic size kernel.
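To make the chain reduction step concrete, the following is a minimal sketch, not the authors' implementation. It assumes unrooted binary trees stored as undirected adjacency dictionaries, and it handles only chains whose elements are single leaves (the paper's chains may consist of subtrees). The function names `is_chain`, `prune_leaf`, and `reduce_chain`, the strict chain test (distinct attachment vertices, consecutive ones adjacent), and the choice to keep the first three leaves are all illustrative assumptions. Reducing a chain means deleting the surplus leaves and suppressing the degree-2 vertices this leaves behind.

```python
from typing import Dict, List, Set

Tree = Dict[str, Set[str]]  # undirected adjacency: vertex -> set of neighbours


def leaves(tree: Tree) -> Set[str]:
    """Leaves are the degree-1 vertices."""
    return {v for v, nbrs in tree.items() if len(nbrs) == 1}


def is_chain(tree: Tree, chain: List[str]) -> bool:
    """Hypothetical strict test: every element is a leaf, the leaves' attachment
    vertices are distinct, and consecutive attachment vertices are adjacent."""
    if not all(a in tree and len(tree[a]) == 1 for a in chain):
        return False
    parents = [next(iter(tree[a])) for a in chain]
    if len(set(parents)) != len(parents):
        return False
    return all(q in tree[p] for p, q in zip(parents, parents[1:]))


def prune_leaf(tree: Tree, leaf: str) -> None:
    """Delete a leaf in place, then suppress its parent if it drops to degree 2."""
    (parent,) = tree.pop(leaf)
    tree[parent].discard(leaf)
    if len(tree[parent]) == 2:
        x, y = tree.pop(parent)  # the two remaining neighbours of the parent
        tree[x].discard(parent)
        tree[y].discard(parent)
        tree[x].add(y)  # join them directly, removing the degree-2 vertex
        tree[y].add(x)


def reduce_chain(tree: Tree, chain: List[str], keep: int = 3) -> Tree:
    """Return a copy of `tree` with the chain cut down to its first `keep` leaves."""
    if not is_chain(tree, chain):
        raise ValueError("not a chain in this tree")
    reduced = {v: set(nbrs) for v, nbrs in tree.items()}
    for leaf in chain[keep:]:
        prune_leaf(reduced, leaf)
    return reduced


def caterpillar() -> Tree:
    """Hypothetical example: leaf a_i hangs off p_i on the path p1..p5; x and y cap the ends."""
    edges = [("x", "p1"), ("y", "p5")]
    edges += [(f"a{i}", f"p{i}") for i in range(1, 6)]
    edges += [(f"p{i}", f"p{i + 1}") for i in range(1, 5)]
    tree: Tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


T = caterpillar()
chain = ["a1", "a2", "a3", "a4", "a5"]
R = reduce_chain(T, chain)
print(sorted(leaves(R)))  # ['a1', 'a2', 'a3', 'x', 'y']
```

In the setting of the abstract, the same ordered chain would be passed to `reduce_chain` for each of the two trees being compared; because `is_chain` checks adjacency in list order, this also confirms that the chain appears in the same order in both. The sketch only performs the reduction; it does not compute SPR distances, and the guarantee that the distance is unchanged is the theorem proved in the paper, not something the code checks.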