Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf
Phylogenetic tree comparison metrics are an important tool in the study of
evolution, and hence the definition of such metrics is an interesting problem
in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed
to measure quantitatively the difference between a pair of phylogenetic trees
by first encoding them by means of their half-matrices of cophenetic values,
and then comparing these matrices. This idea has been used several times since
then to define dissimilarity measures between phylogenetic trees but, to our
knowledge, no proper metric on weighted phylogenetic trees with nested taxa
based on this idea has been formally defined and studied yet. Actually, the
cophenetic values of pairs of different taxa alone are not enough to single out
phylogenetic trees with weighted arcs or nested taxa. In this paper we define a
family of cophenetic metrics that compare phylogenetic trees on the same set of
taxa by encoding them by means of their vectors of cophenetic values of pairs
of taxa and depths of single taxa, and then computing the norm of the
difference of the corresponding vectors. Then, we study, either analytically or
numerically, some of their basic properties: neighbors, diameter, distribution,
and their rank correlation with each other and with other metrics.
Comment: The "authors' cut" of a paper published in BMC Bioinformatics 14:3
(2013). 46 pages
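The encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual definition of the cophenetic value of a pair of taxa as the depth of their lowest common ancestor, with the depth of a single taxon used for the pair (i, i). The tree representation (a dict mapping each node to a `(parent, arc_weight)` tuple, with the root mapped to `None`) and the names `cophenetic_vector` and `cophenetic_distance` are hypothetical choices made for this example.

```python
from itertools import combinations_with_replacement


def cophenetic_vector(parent, taxa):
    """Cophenetic vector of a rooted tree with weighted arcs.

    `parent` maps node -> (parent_node, arc_weight); the root maps to None.
    This representation is an assumption made for the sketch.
    """
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node][0]
            path.append(node)
        return path

    def depth(node):
        # Sum of arc weights on the path from the root to `node`.
        total = 0.0
        while parent[node] is not None:
            node_parent, weight = parent[node]
            total += weight
            node = node_parent
        return total

    vector = []
    # Pairs (i, j) with i <= j; the pair (i, i) yields the depth of taxon i.
    for a, b in combinations_with_replacement(sorted(taxa), 2):
        ancestors_of_a = set(path_to_root(a))
        # First node on b's path to the root that is also an ancestor of a.
        lca = next(n for n in path_to_root(b) if n in ancestors_of_a)
        vector.append(depth(lca))
    return vector


def cophenetic_distance(parent1, parent2, taxa, p=2):
    """L^p norm of the difference of the two cophenetic vectors."""
    v1 = cophenetic_vector(parent1, taxa)
    v2 = cophenetic_vector(parent2, taxa)
    return sum(abs(x - y) ** p for x, y in zip(v1, v2)) ** (1.0 / p)


# Two trees on taxa a, b, c with unit arc weights: ((a,b),c) and (a,(b,c)).
tree1 = {"r": None, "x": ("r", 1), "a": ("x", 1), "b": ("x", 1), "c": ("r", 1)}
tree2 = {"r": None, "y": ("r", 1), "a": ("r", 1), "b": ("y", 1), "c": ("y", 1)}
print(cophenetic_distance(tree1, tree2, ["a", "b", "c"], p=1))
```

Because a taxon may label an internal node in this representation, the same sketch also covers nested taxa; the depth entries are what distinguish trees that agree on all pairwise cophenetic values.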
Dynamic and Multi-functional Labeling Schemes
We investigate labeling schemes supporting adjacency, ancestry, sibling, and
connectivity queries in forests. In the course of more than 20 years, the
existence of labeling schemes supporting each of these
functions was proven, with the most recent being ancestry [Fraigniaud and
Korman, STOC '10]. Several multi-functional labeling schemes also enjoy lower
or upper bounds of or
respectively. Notably an upper bound of for
adjacency+siblings and a lower bound of for each of the
functions siblings, ancestry, and connectivity [Alstrup et al., SODA '03]. We
improve the constants hidden in the -notation. In particular we show a lower bound for connectivity+ancestry and
connectivity+siblings, as well as an upper bound of for connectivity+adjacency+siblings by altering existing
methods.
In the context of dynamic labeling schemes it is known that ancestry requires
bits [Cohen et al., PODS '02]. In contrast, we show upper and lower
bounds on the label size for adjacency, siblings, and connectivity of
bits, and to support all three functions. There exist efficient
adjacency labeling schemes for planar, bounded treewidth, bounded arboricity
and interval graphs. In a dynamic setting, we show a lower bound of
for each of those families.
Comment: 17 pages, 5 figures
A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
We give a 2-approximation algorithm for the Maximum Agreement Forest problem
on two rooted binary trees. This NP-hard problem has been studied extensively
in the past two decades, since it can be used to compute the Subtree
Prune-and-Regraft (SPR) distance between two phylogenetic trees. Our result
improves on the very recent 2.5-approximation algorithm due to Shi, Feng, You
and Wang (2015). Our algorithm is the first approximation algorithm for this
problem that uses LP duality in its analysis.