Calculating the Unrooted Subtree Prune-and-Regraft Distance
The subtree prune-and-regraft (SPR) distance metric is a fundamental way of
comparing evolutionary trees. It has wide-ranging applications, including the
study of lateral genetic transfer, viral recombination, and Markov chain Monte
Carlo phylogenetic inference. Although the rooted version of SPR distance can
be computed relatively efficiently between rooted trees using
fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF
formulation is known for the unrooted case. Correspondingly, previous
algorithms are unable to compute unrooted SPR distances larger than 7.
In this paper, we substantially advance understanding of and computational
algorithms for the unrooted SPR distance. First we identify four properties of
optimal SPR paths, each of which suggests that no MAF formulation exists in the
unrooted case. Then we introduce the replug distance, a new lower bound on the
unrooted SPR distance that is amenable to MAF methods, and give an efficient
fixed-parameter algorithm for calculating it. Finally, we develop a
"progressive A*" search algorithm using multiple heuristics, including the TBR
and replug distances, to exactly compute the unrooted SPR distance. Our
algorithm is nearly two orders of magnitude faster than previous methods on
small trees, and allows computation of unrooted SPR distances as large as 14 on
trees with 50 leaves.
Comment: 21 double-column pages, 11 figures. Revised in response to peer
review. The sections introducing socket forests and on chain reduction were
spun off into a conference-length paper arXiv:1611.02351 to reduce the length
and complexity of the manuscript.
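
The abstract mentions a "progressive A*" search that combines several lower bounds (the TBR and replug distances) but does not spell out the mechanics. The following is a minimal sketch of one plausible reading, not the authors' implementation: each node is first queued with a cheap admissible lower bound, and a more expensive, tighter bound is computed only when the node reaches the front of the queue. All names here (`progressive_a_star`, `grid_neighbors`, `chebyshev`, `manhattan`) are hypothetical, and a toy grid with unit-cost moves stands in for the space of trees connected by SPR moves.

```python
import heapq
from itertools import count


def progressive_a_star(start, goal, neighbors, heuristics):
    """Shortest unit-cost path length from start to goal, or None.

    heuristics: admissible lower bounds h(node, goal), ordered from
    cheapest to most expensive. Assumption: a node is expanded only
    after every heuristic has been evaluated on it.
    """
    tie = count()  # tie-breaker so the heap never compares nodes
    best_g = {start: 0}
    # Heap entries: (f, tie, g, level, node); level = heuristics applied so far.
    heap = [(heuristics[0](start, goal), next(tie), 0, 1, start)]
    while heap:
        f, _, g, level, node = heapq.heappop(heap)
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a shorter route to node is already known
        if level < len(heuristics):
            # Refine lazily: apply the next, costlier bound and requeue.
            h = max(f - g, heuristics[level](node, goal))
            heapq.heappush(heap, (g + h, next(tie), g, level + 1, node))
            continue
        if node == goal:
            return g
        for nxt in neighbors(node):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                h0 = heuristics[0](nxt, goal)
                heapq.heappush(heap, (ng + h0, next(tie), ng, 1, nxt))
    return None


def grid_neighbors(blocked):
    """Toy stand-in for 'all trees one SPR move away': 4-connected grid."""
    def neighbors(p):
        x, y = p
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if q not in blocked:
                yield q
    return neighbors


def chebyshev(p, q):
    """Cheap, loose lower bound (plays the role of a fast bound)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def manhattan(p, q):
    """Tighter lower bound (plays the role of a costlier bound)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


if __name__ == "__main__":
    wall = {(1, -1), (1, 0), (1, 1)}
    print(progressive_a_star((0, 0), (2, 0), grid_neighbors(wall),
                             [chebyshev, manhattan]))
```

Because every bound is admissible and the tightest value seen so far is kept via `max`, the first fully evaluated goal popped from the queue carries the exact distance. In this sketch the benefit of ordering the bounds is that nodes which never reach the front of the queue never pay for the expensive heuristic.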