    Generating functions for multi-labeled trees

    Multi-labeled trees are a generalization of phylogenetic trees that are used, for example, in the study of gene versus species evolution and as the basis for phylogenetic network construction. Unlike phylogenetic trees, in a leaf-multi-labeled tree it is possible to label more than one leaf by the same element of the underlying label set. In this paper we derive formulae for generating functions of leaf-multi-labeled trees and use these to derive recursions for counting such trees. In particular, we prove results that generalize previous theorems by Harding on so-called tree-shapes, and by Otter on relating the number of rooted and unrooted phylogenetic trees.

    A parsimony-based metric for phylogenetic trees

    In evolutionary biology various metrics have been defined and studied for comparing phylogenetic trees. Such metrics are used, for example, to compare competing evolutionary hypotheses or to help organize algorithms that search for optimal trees. Here we introduce a new metric $d_p$ on the collection of binary phylogenetic trees each labeled by the same set of species. The metric is based on the so-called parsimony score, an important concept in phylogenetics that is commonly used to construct phylogenetic trees. Our main results include a characterization of the unit neighborhood of a tree in the $d_p$ metric, and an explicit formula for its diameter, that is, a formula for the maximum possible value of $d_p$ over all possible pairs of trees labeled by the same set of species. We also show that $d_p$ is closely related to the well-known tree bisection and reconnection (tbr) and subtree prune and regraft (spr) distances, a connection which will hopefully provide a useful new approach to understanding properties of these and related metrics.
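
    For readers unfamiliar with the parsimony score mentioned in this abstract, the following is a minimal sketch of how such a score can be computed for one character on one rooted binary tree, using Fitch's algorithm (a standard technique, not necessarily the procedure used in the paper). It does not compute the metric $d_p$ itself. The nested-tuple tree encoding and the names `fitch_score`, `tree`, and `states` are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is its label (str); an internal node is a
# pair (left_subtree, right_subtree). This is an illustrative choice only.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, int]) -> int:
    """Parsimony score of one character on a rooted binary tree (Fitch)."""

    def visit(node: Tree) -> Tuple[FrozenSet[int], int]:
        # A leaf contributes its observed state and no changes.
        if isinstance(node, str):
            return frozenset([states[node]]), 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one state change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# The same character scored on two different trees over the same leaves.
states = {"a": 0, "b": 0, "c": 1, "d": 1}
print(fitch_score((("a", "b"), ("c", "d")), states))
print(fitch_score((("a", "c"), ("b", "d")), states))
```

    The two calls score one character on two trees with the same leaf set; a tree that groups leaves with equal states together needs fewer changes than one that separates them.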

    Managing and analyzing phylogenetic databases

    The ever-growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases. Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range in size from four to 20,000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval. It also introduces new ideas that are specific to tree databases. STBase provides a unique opportunity to explore innovative ways to analyze the results from queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees. Specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and propose two new approaches based on agreement subtrees and frequent subtrees, respectively. The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling rogue taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single-labeled tree. We show that the inherent problem in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.

    Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $X$; these relationships are often depicted via a phylogenetic tree -- a tree having its leaves univocally labeled by elements of $X$ and without degree-2 nodes -- called the "species tree". One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $X$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The so-obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping -- but not identical -- sets of labels, is called a "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees $k$, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure