1,505 research outputs found

    Reconstructing (super)trees from data sets with missing distances: Not all is lost

    The wealth of phylogenetic information accumulated over many decades of biological research, coupled with recent technological advances in molecular sequence generation, presents significant opportunities for researchers to investigate relationships across and within the kingdoms of life. However, to make best use of this wealth of data, several problems must first be overcome. One key problem is finding effective strategies to deal with missing data. Here, we introduce Lasso, a novel heuristic approach for reconstructing rooted phylogenetic trees from distance matrices with missing values, for datasets where a molecular clock may be assumed. Unlike other phylogenetic methods for partial datasets, Lasso has the desirable properties that its reconstructed trees are both unique and edge-weighted. Lasso achieves these properties by restricting its leaf set to a large subset of all possible taxa, which in many practical situations is the entire taxa set. Furthermore, the Lasso approach is distance-based, rendering it very fast to run and suitable for datasets of all sizes, including large datasets such as those generated by modern Next Generation Sequencing technologies. To better understand the performance of Lasso, we assessed it by means of artificial and real biological datasets, showing its effectiveness in the presence of missing data. Furthermore, by formulating the supermatrix problem as a particular case of the missing data problem, we assessed Lasso's ability to reconstruct supertrees. We demonstrate that, although not specifically designed for such a purpose, Lasso performs better than or comparably with five leading supertree algorithms on a challenging biological dataset. Finally, we make freely available a software implementation of Lasso so that researchers may, for the first time, perform both rooted tree and supertree reconstruction with branch lengths on their own partial datasets.
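
    To make the molecular-clock assumption on a partial distance matrix concrete, here is a minimal sketch. It is not the Lasso algorithm, whose steps the abstract does not give. It only checks the three-point (ultrametric) condition on those triples of taxa whose three pairwise distances are all known. The function name, the dict-of-frozensets representation, and the example values are assumptions made for illustration.

```python
from itertools import combinations


def known_triples_ultrametric(taxa, dist, tol=1e-9):
    """Check the three-point condition on every fully known triple.

    `dist` maps frozenset({x, y}) to a distance; a missing key marks a
    missing distance (hypothetical representation, chosen for brevity).
    """
    def d(a, b):
        return dist.get(frozenset((a, b)))  # None if the entry is missing

    for a, b, c in combinations(taxa, 3):
        values = [d(a, b), d(a, c), d(b, c)]
        if any(v is None for v in values):
            continue  # a triple touching a missing entry cannot be tested
        values.sort()
        # Ultrametric (clock-like) distances: the two largest values are equal.
        if abs(values[2] - values[1]) > tol:
            return False
    return True


taxa = ["a", "b", "c", "d"]
dist = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 6.0,
    frozenset(("b", "c")): 6.0,
    frozenset(("a", "d")): 8.0,
    # distances (b, d) and (c, d) are missing
}
clock_like = known_triples_ultrametric(taxa, dist)
```

    Only the triple (a, b, c) is fully known in this toy input, so the check says nothing about taxon d; this is the kind of information loss a method for partial matrices has to work around.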

    On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa

    Compatibility of phylogenetic trees is the most important concept underlying widely-used methods for assessing the agreement of different phylogenetic trees with overlapping taxa and combining them into common supertrees to reveal the tree of life. The notion of ancestral compatibility of phylogenetic trees with nested taxa was introduced by Semple et al. in 2004. In this paper we analyze in detail the meaning of this compatibility from the points of view of the local structure of the trees, of the existence of embeddings into a common supertree, and of the joint properties of their cluster representations. Our analysis leads to a very simple polynomial-time algorithm for testing this compatibility, which we have implemented and made freely available for download from the BioPerl collection of Perl modules for computational biology. Comment: Submitte
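
    As a rough illustration of what a cluster representation is, the sketch below handles only the classical, simpler case of rooted leaf-labelled trees on the same taxon set. In that case two trees have a common refinement exactly when every pair of clusters is disjoint or nested. This is not the paper's algorithm for ancestral compatibility with nested taxa; the nested-tuple tree encoding and all names are assumptions.

```python
def clusters(tree):
    """Return the set of clusters (leaf sets below each node) of a tree.

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def clusters_compatible(tree1, tree2):
    """True if every cluster pair across the two trees is disjoint or nested."""
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a in clusters(tree1)
        for b in clusters(tree2)
    )


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "b"), "c", "d")    # less resolved than t1
t3 = (("a", "c"), ("b", "d"))  # groups a with c, conflicting with t1
```

    Here `clusters_compatible(t1, t2)` holds because t1 refines t2, while `clusters_compatible(t1, t3)` fails because the clusters {a, b} and {a, c} overlap without being nested.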

    A stable backbone for the fungi

    Fungi are abundant in the biosphere. They have fascinated mankind as far back as written history goes and have considerably influenced our culture. In biotechnology, cell biology, genetics, and the life sciences in general, fungi constitute relevant model organisms. Once the phylogenetic relationships of fungi are stably resolved, individual results from fungal research can be combined into a holistic picture of biology. However, despite recent progress, the backbone of the fungal phylogeny is not yet fully resolved. In particular, the early evolutionary history of fungi and the order-level and below-order relationships within the ascomycetes remain uncertain. Here we present the first phylogenomic study for a eukaryotic kingdom that merges all publicly available fungal genomes and expressed sequence tags (EST) to build a data set comprising 128 genes and 146 taxa. The resulting tree provides a stable phylogenetic backbone for the fungi. Moreover, we present the first formal supertree based on 161 fungal taxa and 128 gene trees. The combined evidence from the trees supports the deep-level stability of the fungal groups towards a comprehensive natural system of the fungi. It indicates that the classification of the fungi, especially their alliance with the Microsporidia, requires careful revision. Our analysis is also an inventory of present-day sequence information for the fungi. It provides insights into which phylogenetic conclusions can and which cannot be drawn from the current data, and may serve as a guide to direct further sequencing initiatives. Together with a comprehensive animal phylogeny, we provide the second of three pillars to understand the evolution of the multicellular eukaryotic kingdoms, fungi, metazoa, and plants, in the past 1.6 billion years.

    Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $X$; these relationships are often depicted via a phylogenetic tree -- a tree having its leaves univocally labeled by elements of $X$ and without degree-2 nodes -- called the "species tree". One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $X$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The tree so obtained is our estimate of the species tree and, when the input trees are defined on overlapping -- but not identical -- sets of labels, is called a "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees $k$, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure

    An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

    The relationship between two important problems in tree pattern matching, the largest common subtree and the smallest common supertree problems, is established by means of simple constructions, which allow one to obtain a largest common subtree of two trees from a smallest common supertree of them, and vice versa. These constructions are the same for isomorphic, homeomorphic, topological, and minor embeddings, they take only time linear in the size of the trees, and they turn out to have a clear algebraic meaning. Comment: 32 pages