Principal components analysis in the space of phylogenetic trees
Phylogenetic analysis of DNA or other data commonly gives rise to a
collection or sample of inferred evolutionary trees. Principal Components
Analysis (PCA) cannot be applied directly to collections of trees since the
space of evolutionary trees on a fixed set of taxa is not a vector space. This
paper describes a novel geometrical approach to PCA in tree-space that
constructs the first principal path in an analogous way to standard linear
Euclidean PCA. Given a data set of phylogenetic trees, a geodesic principal
path is sought that maximizes the variance of the data under a form of
projection onto the path. Due to the high dimensionality of tree-space and the
nonlinear nature of this problem, the computational complexity is potentially
very high, so approximate optimization algorithms are used to search for the
optimal path. Principal paths identified in this way reveal and quantify the
main sources of variation in the original collection of trees in terms of both
topology and branch lengths. The approach is illustrated by application to
simulated sets of trees and to a set of gene trees from metazoan (animal)
species.
Comment: Published at http://dx.doi.org/10.1214/11-AOS915 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
The space of ultrametric phylogenetic trees
The reliability of a phylogenetic inference method from genomic sequence data
is ensured by its statistical consistency. Bayesian inference methods produce a
sample of phylogenetic trees from the posterior distribution given sequence
data. Hence the question of statistical consistency of such methods is
equivalent to the consistency of the summary of the sample. More generally,
statistical consistency is ensured by the tree space used to analyse the
sample.
In this paper, we consider two standard parameterisations of phylogenetic
time-trees used in evolutionary models: inter-coalescent interval lengths and
absolute times of divergence events. For each of these parameterisations we
introduce a natural metric space on ultrametric phylogenetic trees. We compare
the introduced spaces with existing models of tree space and formulate several
formal requirements that a metric space on phylogenetic trees must possess in
order to be a satisfactory space for statistical analysis, and justify them. We
show that only a few known constructions of the space of phylogenetic trees
satisfy these requirements. However, our results suggest that these basic
requirements are not enough to distinguish between the two metric spaces we
introduce and that the choice between metric spaces requires additional
properties to be considered. In particular, the summary tree minimising the
squared distance to the trees in the sample might differ between
parameterisations. This suggests that further fundamental insight is needed
into the problem of statistical consistency of phylogenetic inference methods.
Comment: Minor changes. This version has been published in JTB. 27 pages, 9
figures
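The two parameterisations named in this abstract can be illustrated with a toy sketch. It assumes a fixed ranked tree topology, so a tree is just a list of increasing divergence times measured from the present, and it uses plain Euclidean distance as a stand-in for the metrics defined in the paper. The function names are hypothetical and do not come from the paper.

```python
import math
from itertools import accumulate


def times_to_intervals(times):
    """Convert increasing absolute divergence times to interval lengths."""
    intervals = []
    previous = 0.0  # assumption: time 0 is the present
    for t in times:
        intervals.append(t - previous)
        previous = t
    return intervals


def intervals_to_times(intervals):
    """Convert interval lengths back to absolute divergence times."""
    return list(accumulate(intervals))


def euclidean(u, v):
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two toy time-trees with the same (assumed) ranked topology.
tree_a_times = [1.0, 2.0, 3.0]
tree_b_times = [2.0, 3.0, 4.0]

# Distance when trees are coordinatised by absolute divergence times.
dist_times = euclidean(tree_a_times, tree_b_times)

# Distance when the same trees are coordinatised by interval lengths.
dist_intervals = euclidean(
    times_to_intervals(tree_a_times), times_to_intervals(tree_b_times)
)

print(dist_times, dist_intervals)
```

In this toy case the same pair of trees is farther apart in divergence-time coordinates than in interval coordinates, which shows only that the choice of parameterisation changes the geometry. It does not reproduce the paper's constructions or its results on summary trees.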
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
Phylogenetic trees are the fundamental mathematical representation of
evolutionary processes in biology. As data objects, they are characterized by
the challenges associated with "big data," as well as the complication that
their discrete geometric structure results in a non-Euclidean phylogenetic tree
space, which poses computational and statistical limitations. We propose and
study a novel framework to study sets of phylogenetic trees based on tropical
geometry. In particular, we focus on characterizing our framework for
statistical analyses of evolutionary biological processes represented by
phylogenetic trees. Our setting exhibits analytic, geometric, and topological
properties that are desirable for theoretical studies in probability and
statistics, as well as increased computational efficiency over the current
state-of-the-art. We demonstrate our approach on seasonal influenza data.
Comment: 28 pages, 5 figures, 1 table