Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time
The phylogenetic Mean Pairwise Distance (MPD) is one of the most popular
measures for computing the phylogenetic distance between a given group of
species. More specifically, for a phylogenetic tree T and for a set of species
R represented by a subset of the leaf nodes of T, the MPD of R is equal to the
average cost of all possible simple paths in T that connect pairs of nodes in
R.
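The definition above can be made concrete with a short sketch. Rather than enumerating all pairs of R, it relies on the standard observation that an edge with s species of R below it lies on exactly s(|R| - s) of the paths between pairs in R, so the sum of all path costs comes out of a single bottom-up pass. The parent-array tree encoding, the example tree, and the function name mpd_edge_sum are illustrative assumptions, not taken from the paper.

```python
def mpd_edge_sum(parent, weight, sample):
    """Mean Pairwise Distance of the leaves in `sample` (a set of node ids).

    Hypothetical tree encoding: node 0 is the root (parent[0] == -1),
    parent[v] < v for every other node, and weight[v] is the length of
    the edge between v and parent[v].
    """
    n = len(parent)
    r = len(sample)
    if r < 2:
        raise ValueError("the MPD needs at least two species")
    # below[v] = number of species of the sample inside the subtree of v.
    below = [1 if v in sample else 0 for v in range(n)]
    for v in range(n - 1, 0, -1):  # children have larger ids, so they finish first
        below[parent[v]] += below[v]
    # The edge above v lies on below[v] * (r - below[v]) of the sampled paths.
    path_cost_sum = sum(weight[v] * below[v] * (r - below[v]) for v in range(1, n))
    # Divide by the number of unordered pairs, C(r, 2).
    return path_cost_sum / (r * (r - 1) / 2)


# Root 0 -> {1, 2}; node 1 -> leaves {3, 4}; node 2 -> leaves {5, 6}.
parent = [-1, 0, 0, 1, 1, 2, 2]
weight = [0, 1, 2, 1, 3, 1, 1]
print(mpd_edge_sum(parent, weight, {3, 4, 5}))  # (4 + 5 + 7) / 3
```

Each node and edge is touched a constant number of times, so the sketch runs in time linear in the size of the tree.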
Among other phylogenetic measures, the MPD is used as a tool for deciding if
the species of a given group R are closely related. To do this, it is important
to compute not only the value of the MPD for this group but also the
expectation, the variance, and the skewness of this metric. Although efficient
algorithms have been developed for computing the expectation and the variance
of the MPD, there has been no approach so far for computing the skewness of
this measure.
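As an illustration of why the expectation is the easy case, the sketch below assumes the common null model in which R is drawn uniformly among all leaf subsets of a fixed size r (the abstract does not state which model the paper uses). Every leaf pair then lies in R with probability C(r, 2) / C(s, 2), where s is the number of leaves, so by linearity of expectation E[MPD] is simply the average path cost over all leaf pairs, independent of r. The function names and the parent-array encoding are hypothetical; an exponential-time brute-force version is included only as a cross-check.

```python
from itertools import combinations
from math import comb


def expected_mpd(parent, weight):
    """E[MPD] when R is drawn uniformly among all leaf subsets of a fixed size r >= 2.

    Hypothetical encoding: parent[0] == -1 is the root, parent[v] < v otherwise,
    weight[v] is the length of the edge between v and parent[v].
    """
    n = len(parent)
    is_leaf = [True] * n
    for v in range(1, n):
        is_leaf[parent[v]] = False
    below = [1 if is_leaf[v] else 0 for v in range(n)]
    for v in range(n - 1, 0, -1):  # children have larger ids than their parents
        below[parent[v]] += below[v]
    s = below[0]  # total number of leaves
    # Sum of the costs of all C(s, 2) leaf-to-leaf paths, accumulated edge by edge.
    total = sum(weight[v] * below[v] * (s - below[v]) for v in range(1, n))
    # Each leaf pair is in R with probability C(r, 2) / C(s, 2), so the
    # C(r, 2) factors cancel and E[MPD] = total / C(s, 2) for every r.
    return total / comb(s, 2)


def leaf_distance(parent, weight, u, v):
    """Cost of the u-v path, found by climbing to their lowest common ancestor."""
    up_from_u = {}
    d = 0
    while u != -1:
        up_from_u[u] = d
        d += weight[u]
        u = parent[u]
    d = 0
    while v not in up_from_u:
        d += weight[v]
        v = parent[v]
    return d + up_from_u[v]


def expected_mpd_brute(parent, weight, r):
    """Exponential-time reference: average the MPD over every r-subset of leaves."""
    leaves = [v for v in range(len(parent)) if v not in parent]
    mpds = []
    for sample in combinations(leaves, r):
        pairs = list(combinations(sample, 2))
        mpds.append(sum(leaf_distance(parent, weight, a, b) for a, b in pairs) / len(pairs))
    return sum(mpds) / len(mpds)


parent = [-1, 0, 0, 1, 1, 2, 2]
weight = [0, 1, 2, 1, 3, 1, 1]
print(expected_mpd(parent, weight), expected_mpd_brute(parent, weight, 3))
```

The closed form needs one linear pass, while the brute-force check grows combinatorially with the number of leaves and is only meant for tiny test trees.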
In the present work we describe how to compute the skewness of the MPD on a
tree T optimally, in Θ(n) time, where n is the size of the tree T. This is the
first result so far that leads to an exact, let alone efficient, computation
of the skewness for any popular phylogenetic distance measure.
Moreover, we show how to compute in Θ(n) time several interesting quantities
in T that could serve as building blocks for efficiently computing the
skewness of other phylogenetic measures.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
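To pin down the quantity being computed, the following sketch evaluates the skewness of the MPD directly from its definition, E[(X - μ)^3] / σ^3, by enumerating every r-subset of leaves. It is emphatically not the paper's Θ(n) algorithm: it takes exponential time and is only a hedged reference for checking a fast implementation on tiny trees. The uniform r-subset null model, the parent-array encoding, and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import nan


def leaf_distance(parent, weight, u, v):
    """Cost of the u-v path (parent[0] == -1 is the root, weight[v] is the edge above v)."""
    up_from_u = {}
    d = 0
    while u != -1:
        up_from_u[u] = d
        d += weight[u]
        u = parent[u]
    d = 0
    while v not in up_from_u:
        d += weight[v]
        v = parent[v]
    return d + up_from_u[v]


def mpd_moments_brute(parent, weight, r):
    """Mean, variance and skewness of the MPD over all r-subsets of the leaves.

    Exponential in the number of leaves: a reference for tiny trees only.
    """
    leaves = [v for v in range(len(parent)) if v not in parent]
    mpds = []
    for sample in combinations(leaves, r):
        pairs = list(combinations(sample, 2))
        mpds.append(sum(leaf_distance(parent, weight, a, b) for a, b in pairs) / len(pairs))
    k = len(mpds)
    mean = sum(mpds) / k
    var = sum((x - mean) ** 2 for x in mpds) / k
    third = sum((x - mean) ** 3 for x in mpds) / k
    # Skewness = third central moment / standard deviation cubed; undefined if var == 0.
    skew = third / var ** 1.5 if var > 0 else nan
    return mean, var, skew


parent = [-1, 0, 0, 1, 1, 2, 2]
weight = [0, 1, 2, 1, 3, 1, 1]
print(mpd_moments_brute(parent, weight, 3))
```

On this four-leaf example, three of the four possible 3-subsets share the same MPD and the fourth is lower, which is why the distribution comes out negatively skewed.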
A format for phylogenetic placements
We have developed a unified format for phylogenetic placements, that is,
mappings of environmental sequence data (e.g. short reads) into a phylogenetic
tree. We are motivated to do so by the growing number of tools for computing
and post-processing phylogenetic placements, and the lack of an established
standard for storing them. The format is lightweight, versatile, extensible,
and is based on the JSON format, which can be parsed by most modern programming
languages. Our format is already implemented in several tools for computing and
post-processing parsimony- and likelihood-based phylogenetic placements, and
has worked well in practice. We believe that establishing a standard format for
analyzing read placements at this early stage will lead to a more efficient
development of powerful and portable post-analysis tools for the growing
applications of phylogenetic placement.
Comment: Documents version 3 of the format
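Because the format is plain JSON, a placement file can be read with a standard JSON parser. The sketch below is a hedged illustration only: the key names ("tree", "placements", "p", "n", "fields", "version", "metadata") and the "{edge}" labels in the Newick string reflect our reading of version 3 of the format and should be checked against its documentation; the file contents and the helper best_edges are hypothetical.

```python
import json

# Hypothetical placement file; key names are assumptions based on version 3.
EXAMPLE = """
{
  "tree": "((A:0.2{0},B:0.09{1}):0.7{2},C:0.5{3}){4};",
  "placements": [
    {"p": [[1, -2578.16, 0.777, 0.004, 0.0],
           [0, -2580.15, 0.223, 0.001, 0.0]],
     "n": ["read_1"]},
    {"p": [[3, -1000.0, 1.0, 0.2, 0.05]],
     "n": ["read_2", "read_3"]}
  ],
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "version": 3,
  "metadata": {"invocation": "hypothetical example"}
}
"""


def best_edges(doc):
    """Map each read name to the edge number of its highest-weight placement."""
    # "fields" names the columns of every row in "p", so look columns up by name.
    col = {name: i for i, name in enumerate(doc["fields"])}
    result = {}
    for placement in doc["placements"]:
        best = max(placement["p"], key=lambda row: row[col["like_weight_ratio"]])
        for name in placement.get("n", []):
            result[name] = best[col["edge_num"]]
    return result


doc = json.loads(EXAMPLE)
print(best_edges(doc))
```

Looking columns up through "fields" rather than by fixed position keeps the sketch working if a tool emits the columns in a different order or adds extra ones.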