2,842 research outputs found
The Bourque Distances for Mutation Trees of Cancers
Mutation trees are rooted trees of arbitrary node degree in which each node is labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson - Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels. Here, we generalize the Robinson - Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. A connection between the Robinson - Foulds distance and the nearest neighbor interchange distance is also presented
Computing the Distribution of a Tree Metric
The Robinson-Foulds (RF) distance is by far the most widely used measure of
dissimilarity between trees. Although the distribution of these distances has
been investigated for twenty years, an algorithm that is explicitly polynomial
time has yet to be described for computing this distribution (which is also the
distribution of trees around a given tree under the popular Robinson-Foulds
metric). In this paper we derive a polynomial-time algorithm for this
distribution. We show how the distribution can be approximated by a Poisson
distribution determined by the proportion of leaves that lie in `cherries' of
the given tree. We also describe how our results can be used to derive
normalization constants that are required in a recently-proposed maximum
likelihood approach to supertree construction.Comment: 16 pages, 3 figure
Computing the distribution of a tree metric
The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the dis- tribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in ‘cherries’ of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently-proposed maximum likelihood approach to supertree construction
A generalized Robinson-Foulds distance for labeled trees.
The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc).
We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees.
We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf
A generalized Robinson-Foulds distance for labeled trees
Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite
a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time.
For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the
type of the branching event (e.g. speciation, duplication, transfer, etc).
Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge
contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In
particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges,
i.e. edges shared between the two trees.
Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead,
computing distances between labeled trees opens up a variety of new algorithmic directions.
Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf
The generalized Robinson-Foulds distance for phylogenetic trees
The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time, but it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, it can also be computed in linear time, and it is not (uniformly) equivalent to the RF distance.This research was partially supported by the Spanish Ministry of Science, Innovation and Universitiesand the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) throughgrant 2017-SGR-786 (ALBCOM).Peer ReviewedPostprint (published version
The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics
A multilabeled tree (or MUL-tree) is a rooted tree in which every leaf is labelled by an element from some set, but in which more than one leaf may be labelled by the same element of that set. In phylogenetics, such trees are used in biogeographical studies, to study the evolution of gene families, and also within approaches to construct phylogenetic networks. A multilabelled tree in which no leaf-labels are repeated is called a phylogenetic tree, and one in which every label is the same is also known as a tree-shape. In this paper, we consider the complexity of computing metrics on MUL-trees that are obtained by extending metrics on phylogenetic trees. In particular, by restricting our attention to tree shapes, we show that computing the metric extension on MUL-trees is NP-complete for two well-known metrics on phylogenetic trees, namely, the path-difference and Robinson Foulds distances. We also show that the extension of the Robinson Foulds distance is fixed parameter tractable with respect to the distance parameter. The path distance complexity result allows us to also answer an open problem concerning the complexity of solving the quadratic assignment problem for two matrices that are a Robinson similarity and a Robinson dissimilarity, which we show to be NP-complete. We conclude by considering the maximum agreement subtree (MAST) distance on phylogenetic trees to MUL-trees. Although its extension to MUL-trees can be computed in polynomial time, we show that computing its natural generalization to more than two MUL-trees is NP-complete, although fixed-parameter tractable in the maximum degree when the number of given trees is bounded
Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance
We present a new method for inferring species trees from multi-copy gene
trees. Our method is based on a generalization of the Robinson-Foulds (RF)
distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple
leaves can have the same label. Unlike most previous phylogenetic methods using
gene trees, this method does not assume that gene tree incongruence is caused
by a single, specific biological process, such as gene duplication and loss,
deep coalescence, or lateral gene transfer. We prove that it is NP-hard to
compute the RF distance between two mul-trees, but it is easy to calculate the
generalized RF distance between a mul-tree and a singly-labeled tree. Motivated
by this observation, we formulate the RF supertree problem for mul-trees
(MulRF), which takes a collection of mul-trees and constructs a species tree
that minimizes the total RF distance from the input mul-trees. We present a
fast heuristic algorithm for the MulRF supertree problem. Simulation
experiments demonstrate that the MulRF method produces more accurate species
trees than gene tree parsimony methods when incongruence is caused by gene tree
error, duplications and losses, and/or lateral gene transfer. Furthermore, the
MulRF heuristic runs quickly on data sets containing hundreds of trees with up
to a hundred taxa.Comment: 16 pages, 11 figure
- …