Calculating the Unrooted Subtree Prune-and-Regraft Distance
The subtree prune-and-regraft (SPR) distance metric is a fundamental way of
comparing evolutionary trees. It has wide-ranging applications, such as to
study lateral genetic transfer, viral recombination, and Markov chain Monte
Carlo phylogenetic inference. Although the rooted version of SPR distance can
be computed relatively efficiently between rooted trees using
fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF
formulation is known for the unrooted case. Correspondingly, previous
algorithms are unable to compute unrooted SPR distances larger than 7.
In this paper, we substantially advance understanding of and computational
algorithms for the unrooted SPR distance. First we identify four properties of
optimal SPR paths, each of which suggests that no MAF formulation exists in the
unrooted case. Then we introduce the replug distance, a new lower bound on the
unrooted SPR distance that is amenable to MAF methods, and give an efficient
fixed-parameter algorithm for calculating it. Finally, we develop a
"progressive A*" search algorithm using multiple heuristics, including the TBR
and replug distances, to exactly compute the unrooted SPR distance. Our
algorithm is nearly two orders of magnitude faster than previous methods on
small trees, and allows computation of unrooted SPR distances as large as 14 on
trees with 50 leaves.
Comment: 21 double-column pages, 11 figures. Revised in response to peer
review. The sections introducing socket forests and on chain reduction were
spun off into a conference-length paper arXiv:1611.02351 to reduce the length
and complexity of the manuscript.
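The abstract above describes an A* search guided by several lower bounds on the distance. The sketch below is not the authors' "progressive A*" algorithm, whose details the abstract does not give. It is a plain unit-cost A* search on a toy grid, included only to illustrate how several admissible lower bounds can be combined by taking their maximum. The names `a_star_distance`, `grid_neighbors`, and `combined_bound` are hypothetical, and the grid stands in for the space of trees, where neighbors would be trees one SPR move apart.

```python
import heapq
from itertools import count


def a_star_distance(start, goal, neighbors, lower_bound):
    """Return the exact unit-cost distance from start to goal, or None.

    neighbors(x) yields the states one move away from x.
    lower_bound(x, goal) must never overestimate the true distance.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    best = {start: 0}
    frontier = [(lower_bound(start, goal), next(tie), 0, start)]
    while frontier:
        _, _, dist, state = heapq.heappop(frontier)
        if state == goal:
            return dist
        if dist > best.get(state, float("inf")):
            continue  # stale entry: a shorter route to this state was found
        for nxt in neighbors(state):
            new_dist = dist + 1
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                priority = new_dist + lower_bound(nxt, goal)
                heapq.heappush(frontier, (priority, next(tie), new_dist, nxt))
    return None


def grid_neighbors(point):
    """Toy move set (an assumption for illustration): four-neighbor grid steps."""
    x, y = point
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def combined_bound(point, goal):
    """Maximum of two lower bounds; the maximum is itself a lower bound."""
    dx, dy = abs(point[0] - goal[0]), abs(point[1] - goal[1])
    return max(dx + dy, max(dx, dy))


print(a_star_distance((0, 0), (2, 3), grid_neighbors, combined_bound))
```

In the setting of the abstract, the role of `combined_bound` would be played by lower bounds such as the TBR and replug distances, each of which is stated to bound the unrooted SPR distance from below.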
The agreement distance of unrooted phylogenetic networks
A rearrangement operation makes a small graph-theoretical change to a
phylogenetic network to transform it into another one. For unrooted
phylogenetic trees and networks, popular rearrangement operations are tree
bisection and reconnection (TBR) and prune and regraft (PR) (called subtree
prune and regraft (SPR) on trees). Each of these operations induces a metric on
the sets of phylogenetic trees and networks. The TBR-distance between two
unrooted phylogenetic trees $T$ and $T'$ can be characterised by a maximum
agreement forest, that is, a forest with a minimum number of components that
covers both $T$ and $T'$ in a certain way. This characterisation has
facilitated the development of fixed-parameter tractable algorithms and
approximation algorithms. Here, we introduce maximum agreement graphs as a
generalisation of maximum agreement forests for phylogenetic networks. While
the agreement distance -- the metric induced by maximum agreement graphs --
does not characterise the TBR-distance of two networks, we show that it still
provides constant-factor bounds on the TBR-distance. We find similar results
for PR in terms of maximum endpoint agreement graphs.
Comment: 23 pages, 13 figures, final journal version
Tanglegrams: a reduction tool for mathematical phylogenetics
Many discrete mathematics problems in phylogenetics are defined in terms of
the relative labeling of pairs of leaf-labeled trees. These relative labelings
are naturally formalized as tanglegrams, which have previously been an object
of study in coevolutionary analysis. Although there has been considerable work
on planar drawings of tanglegrams, they have not been fully explored as
combinatorial objects until recently. In this paper, we describe how many
discrete mathematical questions on trees "factor" through a problem on
tanglegrams, and how understanding that factoring can simplify analysis.
Depending on the problem, it may be useful to consider an unordered version of
tanglegrams, and/or their unrooted counterparts. For all of these definitions,
we show how the isomorphism types of tanglegrams can be understood in terms of
double cosets of the symmetric group, and we investigate their automorphisms.
Understanding tanglegrams better will isolate the distinct problems on
leaf-labeled pairs of trees and reveal natural symmetries of spaces associated
with such problems.
Extremal Distances for Subtree Transfer Operations in Binary Trees
Three standard subtree transfer operations for binary trees, used in
particular for phylogenetic trees, are: tree bisection and reconnection
(TBR), subtree prune and regraft (SPR) and rooted subtree prune and regraft
(rSPR). For a pair of leaf-labelled binary trees with $n$ leaves, the maximum
number of such moves required to transform one into the other is
$n - \Theta(\sqrt{n})$, extending a result of Ding, Grunewald and Humphries. We
show that if the pair is chosen uniformly at random, then the expected number
of moves required to transform one into the other is $n - \Theta(n^{2/3})$. These
results may be phrased in terms of agreement forests: we also give extensions
for more than two binary trees.
Comment: 16 pages
On the Subnet Prune and Regraft Distance
Phylogenetic networks are rooted directed acyclic graphs that represent
evolutionary relationships between species whose past includes reticulation
events such as hybridisation and horizontal gene transfer. To search the space
of phylogenetic networks, the popular tree rearrangement operation rooted
subtree prune and regraft (rSPR) was recently generalised to phylogenetic
networks. This new operation - called subnet prune and regraft (SNPR) - induces
a metric on the space of all phylogenetic networks as well as on several
widely-used network classes. In this paper, we investigate several problems
that arise in the context of computing the SNPR-distance. For a phylogenetic
tree $T$ and a phylogenetic network $N$, we show how this distance can be
computed by considering the set of trees that are embedded in $N$ and then use
this result to characterise the SNPR-distance between $T$ and $N$ in terms of
agreement forests. Furthermore, we analyse properties of shortest
SNPR-sequences between two phylogenetic networks $N$ and $N'$, and answer the
question whether or not any of the classes of tree-child, reticulation-visible,
or tree-based networks isometrically embeds into the class of all phylogenetic
networks under SNPR.
Efficiently Inferring Pairwise Subtree Prune-and-Regraft Adjacencies between Phylogenetic Trees
We develop a time-optimal $O(mn^2)$-time algorithm to construct the subtree
prune-regraft (SPR) graph on a collection of $m$ phylogenetic trees with $n$
leaves. This improves on the previous bound of $O(mn^3)$. Such graphs are used
to better understand the behaviour of phylogenetic methods and recommend
parameter choices and diagnostic criteria. The limiting factor in these
analyses has been the difficulty in constructing such graphs for large numbers
of trees. We also develop the first efficient algorithms for constructing the
nearest-neighbor interchange (NNI) and tree bisection-and-reconnection (TBR)
graphs.
Comment: 21 pages, 3 figures. Revised in response to peer review
Ricci-Ollivier Curvature of the Rooted Phylogenetic Subtree-Prune-Regraft Graph
Statistical phylogenetic inference methods use tree rearrangement operations
to perform either hill-climbing local search or Markov chain Monte Carlo across
tree topologies. The canonical class of such moves are the
subtree-prune-regraft (SPR) moves that remove a subtree and reattach it
somewhere else via the cut edge of the subtree. Phylogenetic trees and such
moves naturally form the vertices and edges of a graph, such that tree search
algorithms perform a (potentially stochastic) traversal of this SPR graph.
Despite the centrality of such graphs in phylogenetic inference, rather little
is known about their large-scale properties. In this paper we learn about the
rooted-tree version of the graph, known as the rSPR graph, by calculating the
Ricci-Ollivier curvature for pairs of vertices in the rSPR graph with respect
to two simple random walks on the rSPR graph. By proving theorems and direct
calculation with novel algorithms, we find a remarkable diversity of different
curvatures on the rSPR graph for pairs of vertices separated by the same
distance. We confirm using simulation that degree and curvature have the
expected impact on mean access time distributions, demonstrating relevance of
these curvature results to stochastic tree search. This indicates significant
structure of the rSPR graph beyond that which was previously understood in
terms of pairwise distances and vertex degrees; a greater understanding of
curvature could ultimately lead to improved strategies for tree search.
Comment: 17 2-column pages, 6 figures, 2 tables. To appear in the Proceedings
of the Thirteenth Workshop on Analytic Algorithmics and Combinatorics
(ANALCO)
On the Maximum Parsimony distance between phylogenetic trees
Within the field of phylogenetics there is great interest in distance
measures to quantify the dissimilarity of two trees. Here, based on an idea of
Bruen and Bryant, we propose and analyze a new distance measure: the Maximum
Parsimony (MP) distance. This is based on the difference of the parsimony
scores of a single character on both trees under consideration, and the goal is
to find the character which maximizes this difference. In this article we show
that this new distance is a metric and provides a lower bound to the well-known
Subtree Prune and Regraft (SPR) distance. We also show that to compute the MP
distance it is sufficient to consider only characters that are convex on one of
the trees, and prove several additional structural properties of the distance.
On the complexity side, we prove that calculating the MP distance is in general
NP-hard, and identify an interesting island of tractability in which the
distance can be calculated in polynomial time.
Comment: 30 pages, 6 figures
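The quantity that the MP distance maximizes, the difference between the parsimony scores of one character on two trees, can be illustrated for a single given character. The sketch below uses Fitch's algorithm for unordered characters on binary trees written as nested pairs. It evaluates one character only and does not search for the maximizing character, which the abstract states is NP-hard in general. The function names and the tuple encoding of trees are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Parsimony score of one unordered character on a rooted binary tree.

    tree is either a leaf label (str) or a pair (left, right) of subtrees.
    states maps each leaf label to its character state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change is needed.
            return common, left_cost + right_cost
        # The children disagree: one state change is charged at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def score_difference(tree_a, tree_b, states):
    """Absolute difference of the parsimony scores of one character."""
    return abs(fitch_score(tree_a, states) - fitch_score(tree_b, states))


tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
character = {"a": 0, "b": 0, "c": 1, "d": 1}

print(fitch_score(tree_1, character))               # 1
print(fitch_score(tree_2, character))               # 2
print(score_difference(tree_1, tree_2, character))  # 1
```

The trees here are rooted only for convenience of traversal; for unordered characters the Fitch score does not depend on where the root is placed.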
constNJ: an algorithm to reconstruct sets of phylogenetic trees satisfying pairwise topological constraints
This paper introduces constNJ, the first algorithm for phylogenetic
reconstruction of sets of trees with constrained pairwise rooted
subtree-prune-regraft (rSPR) distance. We are motivated by the problem of constructing sets
of trees which must fit into a recombination, hybridization, or similar
network. Rather than first finding a set of trees which are optimal according
to a phylogenetic criterion (e.g. likelihood or parsimony) and then attempting
to fit them into a network, constNJ estimates the trees while enforcing
specified rSPR distance constraints. The primary input for constNJ is a
collection of distance matrices derived from sequence blocks which are assumed
to have evolved in a tree-like manner, such as blocks of an alignment which do
not contain any recombination breakpoints. The other input is a set of rSPR
constraints for any set of pairs of trees. ConstNJ is consistent and a strict
generalization of the neighbor-joining algorithm; it uses the new notion of
"maximum agreement partitions" to assure that the resulting trees satisfy the
given rSPR distance constraints.
Comment: Please contact me with any questions or comments
Parsimony via consensus
The parsimony score of a character on a tree equals the number of state
changes required to fit that character onto the tree. We show that for
unordered, reversible characters this score equals the number of tree
rearrangements required to fit the tree onto the character. We discuss
implications of this connection for the debate over the use of consensus trees
or total evidence, and show how it provides a link between incongruence of
characters and recombination.
Comment: Final published version of article