103 research outputs found

    Calculating the Unrooted Subtree Prune-and-Regraft Distance

    The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as to study lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7. In this paper, we substantially advance understanding of and computational algorithms for the unrooted SPR distance. First we identify four properties of optimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. Then we introduce the replug distance, a new lower bound on the unrooted SPR distance that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a "progressive A*" search algorithm using multiple heuristics, including the TBR and replug distances, to exactly compute the unrooted SPR distance. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.
    Comment: 21 double-column pages, 11 figures. Revised in response to peer review. The sections introducing socket forests and on chain reduction were spun off into a conference-length paper arXiv:1611.02351 to reduce the length and complexity of the manuscript.

    The agreement distance of unrooted phylogenetic networks

    A rearrangement operation makes a small graph-theoretical change to a phylogenetic network to transform it into another one. For unrooted phylogenetic trees and networks, popular rearrangement operations are tree bisection and reconnection (TBR) and prune and regraft (PR) (called subtree prune and regraft (SPR) on trees). Each of these operations induces a metric on the sets of phylogenetic trees and networks. The TBR-distance between two unrooted phylogenetic trees $T$ and $T'$ can be characterised by a maximum agreement forest, that is, a forest with a minimum number of components that covers both $T$ and $T'$ in a certain way. This characterisation has facilitated the development of fixed-parameter tractable algorithms and approximation algorithms. Here, we introduce maximum agreement graphs as a generalisation of maximum agreement forests for phylogenetic networks. While the agreement distance -- the metric induced by maximum agreement graphs -- does not characterise the TBR-distance of two networks, we show that it still provides constant-factor bounds on the TBR-distance. We find similar results for PR in terms of maximum endpoint agreement graphs.
    Comment: 23 pages, 13 figures, final journal version.

    Tanglegrams: a reduction tool for mathematical phylogenetics

    Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they have not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees "factor" through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.

    Extremal Distances for Subtree Transfer Operations in Binary Trees

    Three standard subtree transfer operations for binary trees, used in particular for phylogenetic trees, are: tree bisection and reconnection ($TBR$), subtree prune and regraft ($SPR$) and rooted subtree prune and regraft ($rSPR$). For a pair of leaf-labelled binary trees with $n$ leaves, the maximum number of such moves required to transform one into the other is $n-\Theta(\sqrt{n})$, extending a result of Ding, Grunewald and Humphries. We show that if the pair is chosen uniformly at random, then the expected number of moves required to transfer one into the other is $n-\Theta(n^{2/3})$. These results may be phrased in terms of agreement forests: we also give extensions for more than two binary trees.
    Comment: 16 pages.

    On the Subnet Prune and Regraft Distance

    Phylogenetic networks are rooted directed acyclic graphs that represent evolutionary relationships between species whose past includes reticulation events such as hybridisation and horizontal gene transfer. To search the space of phylogenetic networks, the popular tree rearrangement operation rooted subtree prune and regraft (rSPR) was recently generalised to phylogenetic networks. This new operation - called subnet prune and regraft (SNPR) - induces a metric on the space of all phylogenetic networks as well as on several widely-used network classes. In this paper, we investigate several problems that arise in the context of computing the SNPR-distance. For a phylogenetic tree $T$ and a phylogenetic network $N$, we show how this distance can be computed by considering the set of trees that are embedded in $N$ and then use this result to characterise the SNPR-distance between $T$ and $N$ in terms of agreement forests. Furthermore, we analyse properties of shortest SNPR-sequences between two phylogenetic networks $N$ and $N'$, and answer the question whether or not any of the classes of tree-child, reticulation-visible, or tree-based networks isometrically embeds into the class of all phylogenetic networks under SNPR.

    Efficiently Inferring Pairwise Subtree Prune-and-Regraft Adjacencies between Phylogenetic Trees

    We develop a time-optimal $O(mn^2)$-time algorithm to construct the subtree prune-regraft (SPR) graph on a collection of $m$ phylogenetic trees with $n$ leaves. This improves on the previous bound of $O(mn^3)$. Such graphs are used to better understand the behaviour of phylogenetic methods and recommend parameter choices and diagnostic criteria. The limiting factor in these analyses has been the difficulty in constructing such graphs for large numbers of trees. We also develop the first efficient algorithms for constructing the nearest-neighbor interchange (NNI) and tree bisection-and-reconnection (TBR) graphs.
    Comment: 21 pages, 3 figures. Revised in response to peer review.

    Ricci-Ollivier Curvature of the Rooted Phylogenetic Subtree-Prune-Regraft Graph

    Statistical phylogenetic inference methods use tree rearrangement operations to perform either hill-climbing local search or Markov chain Monte Carlo across tree topologies. The canonical class of such moves are the subtree-prune-regraft (SPR) moves that remove a subtree and reattach it somewhere else via the cut edge of the subtree. Phylogenetic trees and such moves naturally form the vertices and edges of a graph, such that tree search algorithms perform a (potentially stochastic) traversal of this SPR graph. Despite the centrality of such graphs in phylogenetic inference, rather little is known about their large-scale properties. In this paper we learn about the rooted-tree version of the graph, known as the rSPR graph, by calculating the Ricci-Ollivier curvature for pairs of vertices in the rSPR graph with respect to two simple random walks on the rSPR graph. By proving theorems and direct calculation with novel algorithms, we find a remarkable diversity of different curvatures on the rSPR graph for pairs of vertices separated by the same distance. We confirm using simulation that degree and curvature have the expected impact on mean access time distributions, demonstrating relevance of these curvature results to stochastic tree search. This indicates significant structure of the rSPR graph beyond that which was previously understood in terms of pairwise distances and vertex degrees; a greater understanding of curvature could ultimately lead to improved strategies for tree search.
    Comment: 17 2-column pages, 6 figures, 2 tables. To appear in the Proceedings of the Thirteenth Workshop on Analytic Algorithmics and Combinatorics (ANALCO).

    On the Maximum Parsimony distance between phylogenetic trees

    Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Here, based on an idea of Bruen and Bryant, we propose and analyze a new distance measure: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. In this article we show that this new distance is a metric and provides a lower bound to the well-known Subtree Prune and Regraft (SPR) distance. We also show that to compute the MP distance it is sufficient to consider only characters that are convex on one of the trees, and prove several additional structural properties of the distance. On the complexity side, we prove that calculating the MP distance is in general NP-hard, and identify an interesting island of tractability in which the distance can be calculated in polynomial time.
    Comment: 30 pages, 6 figures.

    constNJ: an algorithm to reconstruct sets of phylogenetic trees satisfying pairwise topological constraints

    This paper introduces constNJ, the first algorithm for phylogenetic reconstruction of sets of trees with constrained pairwise rooted subtree-prune regraft (rSPR) distance. We are motivated by the problem of constructing sets of trees which must fit into a recombination, hybridization, or similar network. Rather than first finding a set of trees which are optimal according to a phylogenetic criterion (e.g. likelihood or parsimony) and then attempting to fit them into a network, constNJ estimates the trees while enforcing specified rSPR distance constraints. The primary input for constNJ is a collection of distance matrices derived from sequence blocks which are assumed to have evolved in a tree-like manner, such as blocks of an alignment which do not contain any recombination breakpoints. The other input is a set of rSPR constraints for any set of pairs of trees. ConstNJ is consistent and a strict generalization of the neighbor-joining algorithm; it uses the new notion of "maximum agreement partitions" to assure that the resulting trees satisfy the given rSPR distance constraints.
    Comment: Please contact me with any questions or comments.

    Parsimony via consensus

    The parsimony score of a character on a tree equals the number of state changes required to fit that character onto the tree. We show that for unordered, reversible characters this score equals the number of tree rearrangements required to fit the tree onto the character. We discuss implications of this connection for the debate over the use of consensus trees or total evidence, and show how it provides a link between incongruence of characters and recombination.
    Comment: Final published version of article.
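    The parsimony score mentioned in this abstract can be illustrated with a minimal sketch. It assumes a rooted binary tree encoded as nested tuples of leaf names and uses Fitch's algorithm for an unordered character; the encoding and the name `fitch_score` are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a leaf is its name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes needed to fit one unordered
    character (leaf name -> state) onto a rooted binary tree."""

    def visit(node: Tree):
        # Leaf: its candidate state set is the observed state, at no cost.
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change here.
            return common, left_cost + right_cost
        # The children disagree: one state change is forced at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


character = {"a": "0", "b": "0", "c": "1", "d": "1"}
print(fitch_score((("a", "b"), ("c", "d")), character))
print(fitch_score((("a", "c"), ("b", "d")), character))
```

    Under these assumptions the same character needs one change on the tree ((a,b),(c,d)) and two changes on ((a,c),(b,d)), which shows how the score depends on the tree topology.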