96 research outputs found

    The agreement distance of unrooted phylogenetic networks

    A rearrangement operation makes a small graph-theoretical change to a phylogenetic network to transform it into another one. For unrooted phylogenetic trees and networks, popular rearrangement operations are tree bisection and reconnection (TBR) and prune and regraft (PR) (called subtree prune and regraft (SPR) on trees). Each of these operations induces a metric on the sets of phylogenetic trees and networks. The TBR-distance between two unrooted phylogenetic trees T and T' can be characterised by a maximum agreement forest, that is, a forest with a minimum number of components that covers both T and T' in a certain way. This characterisation has facilitated the development of fixed-parameter tractable algorithms and approximation algorithms. Here, we introduce maximum agreement graphs as a generalisation of maximum agreement forests for phylogenetic networks. While the agreement distance, the metric induced by maximum agreement graphs, does not characterise the TBR-distance of two networks, we show that it still provides constant-factor bounds on the TBR-distance. We find similar results for PR in terms of maximum endpoint agreement graphs. Comment: 23 pages, 13 figures, final journal version

    SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees

    Background: Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (rSPR) operation, and the number of rSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of rSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of rSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees.
    Results: We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed-parameter tractable minimum rSPR problem, and its GPL-licensed Java implementation. The algorithm can be used in two ways: an exhaustive search that guarantees the minimum rSPR distance, and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings: small to medium-sized trees (five to one hundred taxa) and large trees (thousands of taxa). In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the rSPR distance. In the large tree setting, SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum rSPR distance and the intermediate trees.
    Conclusions: When used in exhaustive search mode, SPRIT identifies the minimum number of rSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum rSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identification of HGT events and pinpointing which taxa have been involved in HGT.

    Optimal Completion and Comparison of Incomplete Phylogenetic Trees Under Robinson-Foulds Distance


    MUL-Tree Pruning for Consistency and Compatibility

    A multi-labelled tree (or MUL-tree) is a rooted tree leaf-labelled by a set of labels, where each label may appear more than once in the tree. We consider the MUL-tree Set Pruning for Consistency problem (MULSETPC), which takes as input a set of MUL-trees and asks whether there exists a perfect pruning of each MUL-tree that results in a consistent set of single-labelled trees. MULSETPC was proven to be NP-complete by Gascon et al. when the MUL-trees are binary, each leaf label is used at most three times, and the number of MUL-trees is unbounded. Determining the computational complexity of the problem when the number of MUL-trees is constant was left as an open problem. Here, we resolve this question by proving a much stronger result, namely that MULSETPC is NP-complete even when there are only two MUL-trees, every leaf label is used at most twice, and every MUL-tree is either binary or has constant height. Furthermore, we introduce an extension of MULSETPC that we call MULSETPComp, which replaces the notion of consistency with compatibility, and prove that MULSETPComp is NP-complete even when there are only two MUL-trees, every leaf label is used at most thrice, and every MUL-tree has constant height. Finally, we present a polynomial-time algorithm for instances of MULSETPC with a constant number of binary MUL-trees, in the special case where every leaf label occurs exactly once in at least one MUL-tree.
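    To make the notion of pruning a MUL-tree concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the paper. It assumes that a perfect pruning keeps exactly one leaf per label and suppresses any internal node left with a single child; the nested-tuple tree encoding and the function names `leaves`, `restrict` and `perfect_prunings` are hypothetical choices made for illustration. The enumeration is exponential in the number of repeated labels, and the sketch does not test consistency between trees.

```python
from itertools import product

def leaves(tree, path=()):
    """Yield (path, label) for every leaf; a leaf is a str, an internal node a tuple."""
    if isinstance(tree, str):
        yield path, tree
    else:
        for i, child in enumerate(tree):
            yield from leaves(child, path + (i,))

def restrict(tree, keep, path=()):
    """Keep only leaves whose path is in `keep`; suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if path in keep else None
    kids = [restrict(c, keep, path + (i,)) for i, c in enumerate(tree)]
    kids = [k for k in kids if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def perfect_prunings(tree):
    """Yield every single-labelled tree obtained by keeping one leaf per label."""
    by_label = {}
    for path, label in leaves(tree):
        by_label.setdefault(label, []).append(path)
    # One chosen leaf position per label (assumed meaning of "perfect pruning").
    for choice in product(*by_label.values()):
        yield restrict(tree, set(choice))

# Label "a" occurs twice, so this MUL-tree has two candidate prunings.
mul_tree = (("a", "b"), ("a", "c"))
for pruned in perfect_prunings(mul_tree):
    print(pruned)
```

    For a set of MUL-trees, a brute-force check along these lines would take one pruning per tree and test the resulting single-labelled trees for consistency, which is the step the hardness results above concern.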