203 research outputs found

    On unrooted and root-uncertain variants of several well-known phylogenetic network problems

    Get PDF
    The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an \emph{unrooted} phylogenetic network displays (i.e. embeds) an \emph{unrooted} phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.Comment: 28 pages, 8 Figure

    A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees

    Get PDF
    Here we present a new fixed parameter tractable algorithm to compute the hybridization number r of two rooted, not necessarily binary phylogenetic trees on taxon set X in time (6^r.r!).poly(n)$, where n=|X|. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on X, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem

    Agreement forests of caterpillar trees: complexity, kernelization and branching

    Full text link
    Given a set XX of species, a phylogenetic tree is an unrooted binary tree whose leaves are bijectively labelled by XX. Such trees can be used to show the way species evolve over time. One way of understanding how topologically different two phylogenetic trees are, is to construct a minimum-size agreement forest: a partition of XX into the smallest number of blocks, such that the blocks induce homeomorphic, non-overlapping subtrees in both trees. This comparison yields insight into commonalities and differences in the evolution of XX across the two trees. Computing a smallest agreement forest is NP-hard (Hein, Jiang, Wang and Zhang, Discrete Applied Mathematics 71(1-3), 1996). In this work we study the problem on caterpillars, which are path-like phylogenetic trees. We will demonstrate that, even if we restrict the input to this highly restricted subclass, the problem remains NP-hard and is in fact APX-hard. Furthermore we show that for caterpillars two standard reductions rules well known in the literature yield a tight kernel of size at most 7k7k, compared to 15k15k for general trees (Kelk and Simone, SIAM Journal on Discrete Mathematics 33(3), 2019). Finally we demonstrate that we can determine if two caterpillars have an agreement forest with at most kk blocks in O∗(2.49k)O^*(2.49^k) time, compared to O∗(3k)O^*(3^k) for general trees (Chen, Fan and Sze, Theoretical Computater Science 562, 2015), where O∗(.)O^*(.) suppresses polynomial factors.Comment: 31 pages, 15 figure

    Pattern discovery in structural databases with applications to bioinformatics

    Get PDF
    Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model evolutionary histories of different species, among others. The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|2) time where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. This technique has been applied to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. The technique is also extended to undirected acyclic graphs (or free trees). The second FSM technique extends traditional MAST (maximum agreement subtree) algorithms by employing the Apriori data mining technique to find frequent agreement subtrees in multiple phylogenies. The correctness and completeness of the new mining algorithm are presented. The method is also extended to unrooted phylogenetic trees. Both FSM techniques studied in the thesis have been implemented into a toolkit, which is fully operational and accessible on the World Wide Web

    EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

    Get PDF
    The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure operation, and a lowest common ancestor based support counting step that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of up to 100 times or more over phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also work in vertical mining mode, to use less memory at the expense of speed
    • …
    corecore