The generalized Robinson-Foulds metric
The Robinson-Foulds (RF) metric is arguably the most widely used measure of
phylogenetic tree similarity, despite its well-known shortcomings. For example,
moving a single taxon in a tree can result in a tree that has maximum distance
to the original one, even though the two trees are identical once that single
taxon is removed. To address this, we propose a natural extension of the RF
metric that does not simply count identical clades but also takes similar
clades into consideration. In contrast to previous approaches, our model requires the
matching between clades to respect the structure of the two trees, a property
that the classical RF metric exhibits, too. We show that computing this
generalized RF metric is, unfortunately, NP-hard. We then present a simple
Integer Linear Program for its computation, and evaluate it by an
all-against-all comparison of 100 trees from a benchmark data set. We find that
matchings that respect the tree structure differ significantly from those that
do not, underlining the importance of this natural condition.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
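The shortcoming described in this abstract can be reproduced with the classical RF distance on rooted trees. The following is a minimal sketch, not the paper's generalized metric or its Integer Linear Program: it assumes trees are written as nested Python tuples with string leaf labels, and the helper names `clades`, `rf_distance`, and `prune` are hypothetical names chosen for illustration.

```python
def clades(tree):
    """Return (leaf set, set of non-trivial clades) of a rooted tree.

    Assumption: a tree is a nested tuple, and anything that is not a tuple
    is a leaf label.
    """
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def rf_distance(t1, t2):
    """Classical RF distance: number of clades present in exactly one tree."""
    leaves1, c1 = clades(t1)
    leaves2, c2 = clades(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must be on the same taxon set")
    return len(c1 ^ c2)


def prune(tree, taxon):
    """Remove one taxon and suppress the resulting single-child nodes."""
    if not isinstance(tree, tuple):
        return None if tree == taxon else tree
    kept = [p for p in (prune(child, taxon) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


# A caterpillar tree, and the same tree with taxon "a" moved next to the root.
t1 = ((((("a", "b"), "c"), "d"), "e"), "f")
t2 = ((((("b", "c"), "d"), "e"), "f"), "a")

print(rf_distance(t1, t2))                          # no non-trivial clade is shared
print(rf_distance(prune(t1, "a"), prune(t2, "a")))  # identical after removing "a"
```

Each tree has four non-trivial clades and shares none with the other, so the distance is as large as it can be for six taxa, yet it drops to zero once "a" is removed from both trees.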
Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance
We present a new method for inferring species trees from multi-copy gene
trees. Our method is based on a generalization of the Robinson-Foulds (RF)
distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple
leaves can have the same label. Unlike most previous phylogenetic methods using
gene trees, this method does not assume that gene tree incongruence is caused
by a single, specific biological process, such as gene duplication and loss,
deep coalescence, or lateral gene transfer. We prove that it is NP-hard to
compute the RF distance between two mul-trees, but it is easy to calculate the
generalized RF distance between a mul-tree and a singly-labeled tree. Motivated
by this observation, we formulate the RF supertree problem for mul-trees
(MulRF), which takes a collection of mul-trees and constructs a species tree
that minimizes the total RF distance from the input mul-trees. We present a
fast heuristic algorithm for the MulRF supertree problem. Simulation
experiments demonstrate that the MulRF method produces more accurate species
trees than gene tree parsimony methods when incongruence is caused by gene tree
error, duplications and losses, and/or lateral gene transfer. Furthermore, the
MulRF heuristic runs quickly on data sets containing hundreds of trees with up
to a hundred taxa.
Comment: 16 pages, 11 figures
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
The hybridization number problem requires us to embed a set of binary rooted
phylogenetic trees into a binary rooted phylogenetic network such that the
number of nodes with indegree two is minimized. However, from a biological
point of view, accurately inferring the root location in a phylogenetic tree is
notoriously difficult, and poor root placement can artificially inflate the
hybridization number. To this end we study a number of relaxed variants of this
problem. We start by showing that the fundamental problem of determining
whether an unrooted phylogenetic network displays (i.e., embeds) an
unrooted phylogenetic tree is NP-hard. On the positive side we show
that this problem is FPT in reticulation number. In the rooted case the
corresponding FPT result is trivial, but here we require more subtle
argumentation. Next we show that the hybridization number problem for unrooted
networks (when given two unrooted trees) is equivalent to the problem of
computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted
trees. In the third part of the paper we consider the "root uncertain" variant
of hybridization number. Here we are free to choose the root location in each
of a set of unrooted input trees such that the hybridization number of the
resulting rooted trees is minimized. On the negative side we show that this
problem is APX-hard. On the positive side, we show that the problem is FPT in
the hybridization number, via kernelization, for any number of input trees.
Comment: 28 pages, 8 figures
Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf
Phylogenetic tree comparison metrics are an important tool in the study of
evolution, and hence the definition of such metrics is an interesting problem
in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed
to measure quantitatively the difference between a pair of phylogenetic trees
by first encoding them by means of their half-matrices of cophenetic values,
and then comparing these matrices. This idea has been used several times since
then to define dissimilarity measures between phylogenetic trees but, to our
knowledge, no proper metric on weighted phylogenetic trees with nested taxa
based on this idea has been formally defined and studied yet. Actually, the
cophenetic values of pairs of different taxa alone are not enough to single out
phylogenetic trees with weighted arcs or nested taxa. In this paper we define a
family of cophenetic metrics that compare phylogenetic trees on the same set of
taxa by encoding them by means of their vectors of cophenetic values of pairs
of taxa and depths of single taxa, and then computing the norm of the
difference of the corresponding vectors. Then, we study, either analytically or
numerically, some of their basic properties: neighbors, diameter, distribution,
and their rank correlation with each other and with other metrics.
Comment: The "authors' cut" of a paper published in BMC Bioinformatics 14:3
(2013). 46 pages
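The encoding described in this abstract (cophenetic values of pairs of taxa plus depths of single taxa, compared through a norm) can be sketched as follows. This is an illustration under simplifying assumptions rather than the authors' implementation: trees are nested Python tuples with string leaf labels, every arc has weight 1, there are no nested taxa, and the function names `cophenetic_vector` and `cophenetic_distance` are hypothetical.

```python
from itertools import combinations


def cophenetic_vector(tree):
    """Map each pair (i, j) with i <= j to its cophenetic value.

    For i != j the value is the depth of the lowest common ancestor of i
    and j; for i == j it is the depth of the leaf itself.
    Assumption: unit arc weights, leaves are non-tuple labels.
    """
    values = {}

    def walk(node, depth):
        if not isinstance(node, tuple):
            values[(node, node)] = depth
            return [node]
        groups = [walk(child, depth + 1) for child in node]
        # Leaves from different child subtrees have this node as their LCA.
        for g1, g2 in combinations(groups, 2):
            for x in g1:
                for y in g2:
                    values[tuple(sorted((x, y)))] = depth
        return [leaf for group in groups for leaf in group]

    walk(tree, 0)
    return values


def cophenetic_distance(t1, t2, p=1):
    """L^p norm of the difference of the two cophenetic vectors."""
    v1, v2 = cophenetic_vector(t1), cophenetic_vector(t2)
    if v1.keys() != v2.keys():
        raise ValueError("trees must be on the same taxon set")
    return sum(abs(v1[k] - v2[k]) ** p for k in v1) ** (1 / p)


t1 = (("a", "b"), "c")
t2 = ("a", ("b", "c"))

print(cophenetic_distance(t1, t2, p=1))  # 4.0
print(cophenetic_distance(t1, t2, p=2))  # 2.0
```

In this example the two vectors differ by 1 in four entries (the depths of "a" and "c" and the cophenetic values of the pairs a,b and b,c), which gives 4 under the L1 norm and 2 under the L2 norm.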