Polyhedral computational geometry for averaging metric phylogenetic trees
This paper investigates the computational geometry relevant to calculations
of the Frechet mean and variance for probability distributions on the
phylogenetic tree space of Billera, Holmes and Vogtmann, using the theory of
probability measures on spaces of nonpositive curvature developed by Sturm. We
show that the combinatorics of geodesics with a specified fixed endpoint in
tree space are determined by the location of the varying endpoint in a certain
polyhedral subdivision of tree space. The variance function associated to a
finite subset of tree space has a fixed algebraic formula within
each cell of the corresponding subdivision, and is continuously differentiable
in the interior of each orthant of tree space. We use this subdivision to
establish two iterative methods for producing sequences that converge to the
Frechet mean: one based on Sturm's Law of Large Numbers, and another based on
descent algorithms for finding optima of smooth functions on convex polyhedra.
We present properties and biological applications of Frechet means and extend
our main results to more general globally nonpositively curved spaces composed
of Euclidean orthants.
Comment: 43 pages, 6 figures; v2: fixed typos, shortened Sections 1 and 5,
added counterexample for polyhedrality of vistal subdivision in general
CAT(0) cubical complexes; v1: 43 pages, 5 figures
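The first iterative method mentioned in the abstract above, the one based on Sturm's Law of Large Numbers, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`sturm_mean`, `euclidean_geodesic_point`, `mean_squared_distance`) are hypothetical, and the geodesic used here is the straight segment in flat Euclidean space, which is a trivial nonpositively curved space. In tree space the `geodesic_point` argument would have to be replaced by a routine that computes geodesics between trees, which this sketch does not attempt.

```python
import random


def euclidean_geodesic_point(x, y, t):
    """Return the point a fraction t of the way along the segment from x to y.

    Assumption: flat Euclidean space stands in for tree space here.
    """
    return [a + t * (b - a) for a, b in zip(x, y)]


def sturm_mean(points, steps=20000, geodesic_point=euclidean_geodesic_point, seed=0):
    """Approximate the Frechet mean of the uniform distribution on `points`.

    Start at a random sample; at step k draw a sample uniformly at random
    and move a fraction 1/(k+1) of the way along the geodesic toward it.
    """
    rng = random.Random(seed)
    mu = list(rng.choice(points))
    for k in range(1, steps + 1):
        sample = rng.choice(points)
        mu = geodesic_point(mu, sample, 1.0 / (k + 1))
    return mu


def mean_squared_distance(points, candidate):
    """Average squared Euclidean distance from `candidate` to the points.

    The Frechet mean minimizes this quantity (normalization is a choice
    made for this sketch).
    """
    total = 0.0
    for p in points:
        total += sum((a - b) ** 2 for a, b in zip(p, candidate))
    return total / len(points)


if __name__ == "__main__":
    square = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    estimate = sturm_mean(square)
    print("estimated mean:", estimate)
    print("mean squared distance:", mean_squared_distance(square, estimate))
```

In the Euclidean case each iterate is simply the running average of the samples drawn so far, so the estimate approaches the centroid of the four corners. The only space-specific ingredient is the geodesic routine, which is why it is passed in as a parameter.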
Uncertainty in phylogenetic tree estimates
Estimating phylogenetic trees is an important problem in evolutionary
biology, environmental policy and medicine. Although trees are estimated, their
uncertainties are discarded by mathematicians working in tree space. Here we
explicitly model the multivariate uncertainty of tree estimates. We consider
both the cases where uncertainty information arises extrinsically (through
covariate information) and intrinsically (through the tree estimates
themselves). The importance of accounting for tree uncertainty in tree space is
demonstrated in two case studies. In the first instance, differences between
gene trees are small relative to their uncertainties, while in the second, the
differences are relatively large. Our main goal is visualization of tree
uncertainty, and we demonstrate advantages of our method with respect to
reproducibility, speed and preservation of topological differences compared to
visualization based on multidimensional scaling. The proposal highlights that
phylogenetic trees are estimated in an extremely high-dimensional space,
resulting in uncertainty information that cannot be discarded. Most
importantly, it is a method that allows biologists to diagnose whether
differences between gene trees are biologically meaningful, or due to
uncertainty in estimation.
Comment: Final version accepted to Journal of Computational and Graphical
Statistics