Generating functions for multi-labeled trees
Multi-labeled trees are a generalization of phylogenetic trees that are used, for example, in the study of gene versus species evolution and as the basis for phylogenetic network construction. Unlike phylogenetic trees, in a leaf-multi-labeled tree it is possible to label more than one leaf by the same element of the underlying label set. In this paper we derive formulae for generating functions of leaf-multi-labeled trees and use these to derive recursions for counting such trees. In particular, we prove results which generalize previous theorems by Harding on so-called tree-shapes, and by Otter on relating the number of rooted and unrooted phylogenetic trees.
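As a point of reference for the kind of counting recursion mentioned above, the following sketch counts rooted binary tree shapes (unlabeled trees) by number of leaves, using the classical Wedderburn-Etherington recursion. It is not the paper's formula for multi-labeled trees; the function name `tree_shapes` is hypothetical and chosen only for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tree_shapes(n: int) -> int:
    """Count rooted binary tree shapes (unlabeled) with n leaves.

    Classical recursion: a shape with n >= 2 leaves is an unordered pair
    of smaller shapes whose leaf counts sum to n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 1
    total = 0
    # Subtrees of different sizes i < n - i: any pair of shapes works.
    for i in range(1, (n - 1) // 2 + 1):
        total += tree_shapes(i) * tree_shapes(n - i)
    # Subtrees of equal size: count unordered pairs, repeats allowed.
    if n % 2 == 0:
        half = tree_shapes(n // 2)
        total += half * (half + 1) // 2
    return total


if __name__ == "__main__":
    print([tree_shapes(n) for n in range(1, 11)])
```

The equal-size case is handled separately because swapping two subtrees of the same size must not be counted twice.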
A parsimony-based metric for phylogenetic trees
In evolutionary biology various metrics have been defined and studied for comparing phylogenetic trees. Such metrics are used, for example, to compare competing evolutionary hypotheses or to help organize algorithms that search for optimal trees. Here we introduce a new metric d_p on the collection of binary phylogenetic trees each labeled by the same set of species. The metric is based on the so-called parsimony score, an important concept in phylogenetics that is commonly used to construct phylogenetic trees. Our main results include a characterization of the unit neighborhood of a tree in the d_p metric, and an explicit formula for its diameter, that is, a formula for the maximum possible value of d_p over all possible pairs of trees labeled by the same set of species. We also show that d_p is closely related to the well-known tree bisection and reconnection (TBR) and subtree prune and regraft (SPR) distances, a connection which will hopefully provide a useful new approach to understanding properties of these and related metrics.
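For readers unfamiliar with the parsimony score on which the metric is based, the sketch below computes it for a single character on a rooted binary tree using the standard Fitch algorithm. It does not implement the metric itself, whose definition is not given in the abstract. The nested-tuple tree encoding and the name `fitch_score` are assumptions made for illustration.

```python
def fitch_score(tree, states):
    """Parsimony score of one character on a rooted binary tree (Fitch).

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character state
    """
    def visit(node):
        # Returns (candidate state set at this node, changes in subtree).
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # Children agree on some state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: one change is forced on an edge below.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


if __name__ == "__main__":
    character = {"a": 0, "b": 1, "c": 0, "d": 1}
    print(fitch_score((("a", "b"), ("c", "d")), character))
    print(fitch_score((("a", "c"), ("b", "d")), character))
```

The same character can score differently on two trees over the same species, which is the kind of difference a parsimony-based comparison of trees can exploit.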
Managing and analyzing phylogenetic databases
The ever growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases.
Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range from four to 20000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval. It also introduces new ideas that are specific to tree databases.
STBase provides a unique opportunity to explore innovative ways to analyze the results from queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees. Specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and propose two new approaches based on agreement subtrees and frequent subtrees, respectively.
The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling rogue taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single-labeled tree. We show that the inherent problem in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.
Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $L$; these relationships are often depicted via a phylogenetic tree -- a tree having its leaves univocally labeled by elements of $L$ and without degree-2 nodes -- called the "species tree". One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $L$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The so-obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping -- but not identical -- sets of labels, is called a "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees $k$, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input.
Comment: 18 pages, 1 figure