Reconstructing (super)trees from data sets with missing distances: Not all is lost
The wealth of phylogenetic information accumulated over many decades of biological research, coupled with recent technological advances in molecular sequence generation, presents significant opportunities for researchers to investigate relationships across and within the kingdoms of life. However, to make the best use of this wealth of data, several problems must first be overcome. One key problem is finding effective strategies to deal with missing data. Here, we introduce Lasso, a novel heuristic approach for reconstructing rooted phylogenetic trees from distance matrices with missing values, for datasets where a molecular clock may be assumed. Unlike other phylogenetic methods for partial datasets, Lasso possesses desirable properties: its reconstructed trees are both unique and edge-weighted. These properties are achieved by Lasso restricting its leaf set to a large subset of all possible taxa, which in many practical situations is the entire taxa set. Furthermore, the Lasso approach is distance-based, rendering it very fast to run and suitable for datasets of all sizes, including large datasets such as those generated by modern Next Generation Sequencing technologies. To better understand the performance of Lasso, we assessed it by means of artificial and real biological datasets, showing its effectiveness in the presence of missing data. Furthermore, by formulating the supermatrix problem as a particular case of the missing data problem, we assessed Lasso's ability to reconstruct supertrees. We demonstrate that, although not specifically designed for such a purpose, Lasso performs better than or comparably with five leading supertree algorithms on a challenging biological data set. Finally, we make freely available a software implementation of Lasso so that researchers may, for the first time, perform both rooted tree and supertree reconstruction with branch lengths on their own partial datasets.
On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa
Compatibility of phylogenetic trees is the most important concept underlying
widely-used methods for assessing the agreement of different phylogenetic trees
with overlapping taxa and combining them into common supertrees to reveal the
tree of life. The notion of ancestral compatibility of phylogenetic trees with
nested taxa was introduced by Semple et al. in 2004. In this paper we analyze in
detail the meaning of this compatibility from the points of view of the local
structure of the trees, of the existence of embeddings into a common supertree,
and of the joint properties of their cluster representations. Our analysis
leads to a very simple polynomial-time algorithm for testing this
compatibility; our implementation is freely available for download
from the BioPerl collection of Perl modules for computational biology.
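The cluster view of compatibility mentioned above can be illustrated in the simplest classical setting: two rooted trees on the same leaf set, without nested taxa, are compatible exactly when every pair of their clusters is either nested or disjoint. The sketch below covers only that simpler case. It is not the ancestral-compatibility algorithm of the paper and does not use the BioPerl API; the nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def clusters(tree):
    """Return the set of clusters (leaf sets below each node) of a rooted
    tree encoded as nested tuples, e.g. (("a", "b"), "c")."""
    found = set()

    def visit(node):
        if isinstance(node, tuple):
            # Internal node: its cluster is the union of its children's.
            leaves = frozenset().union(*(visit(child) for child in node))
        else:
            leaves = frozenset([node])  # a single labeled leaf
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def compatible(t1, t2):
    """True if every pair of clusters from the two trees is nested or
    disjoint (classical test for rooted trees on the same leaf set)."""
    all_clusters = clusters(t1) | clusters(t2)
    return all(
        a <= b or b <= a or not (a & b)
        for a in all_clusters
        for b in all_clusters
    )


# Illustrative trees on the leaf set {a, b, c, d}.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
unresolved = (("a", "b"), "c", "d")
```

Here `balanced` and `caterpillar` conflict because the clusters {c, d} and {a, b, c} overlap without being nested, while `unresolved` is compatible with both.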
A stable backbone for the fungi
Fungi are abundant in the biosphere. They have fascinated mankind for as long as written history goes and have considerably influenced our culture. In biotechnology, cell biology, genetics, and the life sciences in general, fungi constitute relevant model organisms. Once the phylogenetic relationships of fungi are stably resolved, individual results from fungal research can be combined into a holistic picture of biology. However, despite recent progress, the backbone of the fungal phylogeny is not yet fully resolved. In particular, the early evolutionary history of fungi and the order or below-order relationships within the ascomycetes remain uncertain. Here we present the first phylogenomic study for a eukaryotic kingdom that merges all publicly available fungal genomes and expressed sequence tags (EST) to build a data set comprising 128 genes and 146 taxa. The resulting tree provides a stable phylogenetic backbone for the fungi. Moreover, we present the first formal supertree based on 161 fungal taxa and 128 gene trees. The combined evidence from the trees supports the deep-level stability of the fungal groups towards a comprehensive natural system of the fungi. It indicates that the classification of the fungi, especially their alliance with the Microsporidia, requires careful revision. Our analysis is also an inventory of present-day sequence information for the fungi. It provides insights into which phylogenetic conclusions can and which cannot be drawn from the current data and may serve as a guide to direct further sequencing initiatives. Together with a comprehensive animal phylogeny, we provide the second of three pillars to understand the evolution of the multicellular eukaryotic kingdoms, fungi, metazoa, and plants, in the past 1.6 billion years.
Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees
In phylogenetics, a central problem is to infer the evolutionary
relationships between a set of species $X$; these relationships are often
depicted via a phylogenetic tree -- a tree having its leaves univocally labeled
by elements of $X$ and without degree-2 nodes -- called the "species tree". One
common approach for reconstructing a species tree consists in first
constructing several phylogenetic trees from primary data (e.g. DNA sequences
originating from some species in $X$), and then constructing a single
phylogenetic tree maximizing the "concordance" with the input trees. The
so-obtained tree is our estimation of the species tree and, when the input
trees are defined on overlapping -- but not identical -- sets of labels, is
called "supertree". In this paper, we focus on two problems that are central
when combining phylogenetic trees into a supertree: the compatibility and the
strict compatibility problems for unrooted phylogenetic trees. These problems
are strongly related, respectively, to the notions of "containing as a minor"
and "containing as a topological minor" in the graph community. Both problems
are known to be fixed-parameter tractable in the number of input trees $k$, by
using their expressibility in Monadic Second Order Logic and a reduction to
graphs of bounded treewidth. Motivated by the fact that the dependency on $k$
of these algorithms is prohibitively large, we give the first explicit dynamic
programming algorithms for solving these problems, both running in time
$2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure
An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees
The relationship between two important problems in tree pattern matching, the
largest common subtree and the smallest common supertree problems, is
established by means of simple constructions, which allow one to obtain a
largest common subtree of two trees from a smallest common supertree of them,
and vice versa. These constructions are the same for isomorphic, homeomorphic,
topological, and minor embeddings, they take only time linear in the size of
the trees, and they turn out to have a clear algebraic meaning. Comment: 32 pages