The space of ultrametric phylogenetic trees
The reliability of a phylogenetic inference method from genomic sequence data
is ensured by its statistical consistency. Bayesian inference methods produce a
sample of phylogenetic trees from the posterior distribution given sequence
data. Hence the question of statistical consistency of such methods is
equivalent to the consistency of the summary of the sample. More generally,
statistical consistency is ensured by the tree space used to analyse the
sample.
In this paper, we consider two standard parameterisations of phylogenetic
time-trees used in evolutionary models: inter-coalescent interval lengths and
absolute times of divergence events. For each of these parameterisations we
introduce a natural metric space on ultrametric phylogenetic trees. We compare
the introduced spaces with existing models of tree space and formulate several
formal requirements that a metric space on phylogenetic trees must possess in
order to be a satisfactory space for statistical analysis, and justify them. We
show that only a few known constructions of the space of phylogenetic trees
satisfy these requirements. However, our results suggest that these basic
requirements are not enough to distinguish between the two metric spaces we
introduce and that the choice between metric spaces requires additional
properties to be considered. In particular, the summary tree minimising the
squared distance to the trees from the sample might be different for different
parameterisations. This suggests that further fundamental insight is needed
into the problem of statistical consistency of phylogenetic inference methods.
Comment: Minor changes. This version has been published in JTB. 27 pages, 9 figures
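As a rough illustration of the two parameterisations named in the abstract above, the following sketch converts inter-coalescent interval lengths into absolute divergence times by cumulative summation and compares distances in each coordinate system. The function names, the use of plain Euclidean distance, the example numbers, and the assumption that both trees share one ranked topology are all illustrative choices, not the paper's construction.

```python
from itertools import accumulate
from math import dist  # Python 3.8+

def intervals_to_times(intervals):
    """Absolute divergence times (from the tips upward) as running sums of interval lengths."""
    return list(accumulate(intervals))

def times_to_intervals(times):
    """Inverse map: differences between consecutive divergence times."""
    previous = [0.0] + list(times[:-1])
    return [t - p for p, t in zip(previous, times)]

# Two hypothetical time-trees on the same ranked topology,
# each given by its inter-coalescent interval lengths.
tree_a = [1.0, 2.0, 1.0]
tree_b = [2.0, 1.0, 1.0]

# Distance measured in interval-length coordinates.
d_intervals = dist(tree_a, tree_b)

# Distance measured in absolute-time coordinates.
d_times = dist(intervals_to_times(tree_a), intervals_to_times(tree_b))

print(d_intervals, d_times)
```

The same pair of trees is separated by different amounts in the two coordinate systems, which is one simple way to see why the choice of parameterisation can matter for distance-based summaries.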
Ultrametric embedding: application to data fingerprinting and to fast data clustering
We begin with pervasive ultrametricity due to high dimensionality and/or
spatial sparsity. How the extent or degree of ultrametricity can be quantified
leads us to a discussion of varied practical cases in which ultrametricity can be
partially or locally present in data. We show how the ultrametricity can be
assessed in text or document collections, and in time series signals. An aspect
of importance here is that to draw benefit from this perspective the data may
need to be recoded. Such data recoding can also be powerful in proximity
searching, as we will show, where the data is embedded globally and not locally
in an ultrametric space.
Comment: 14 pages, 1 figure. New content and modified title compared to the 19
May 2006 version
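One simple way to quantify a degree of ultrametricity, offered here only as a minimal sketch, is to count the fraction of point triples that satisfy the strong triangle inequality, that is, triples whose two largest pairwise distances are equal. The function name `ultrametric_fraction`, the tolerance parameter, and the example matrices are assumptions for illustration; this is not necessarily the coefficient used in the paper.

```python
from itertools import combinations

def ultrametric_fraction(d, tol=1e-9):
    """Fraction of triples whose two largest pairwise distances agree within tol.

    d is a symmetric matrix (list of lists) of pairwise distances.
    """
    triples = list(combinations(range(len(d)), 3))
    if not triples:
        return 1.0  # fewer than three points: trivially ultrametric
    satisfied = 0
    for i, j, k in triples:
        # In an ultrametric space every triangle is isosceles with a short base.
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle <= tol:
            satisfied += 1
    return satisfied / len(triples)

# Distances read off a small dendrogram: fully ultrametric.
ultra = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Distances between points 0, 1, 3, 7 on a line: no triple is ultrametric.
points = (0, 1, 3, 7)
line = [[abs(x - y) for y in points] for x in points]

print(ultrametric_fraction(ultra), ultrametric_fraction(line))
```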
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
Phylogenetic trees are the fundamental mathematical representation of
evolutionary processes in biology. As data objects, they are characterized by
the challenges associated with "big data," as well as the complication that
their discrete geometric structure results in a non-Euclidean phylogenetic tree
space, which poses computational and statistical limitations. We propose and
study a novel framework to study sets of phylogenetic trees based on tropical
geometry. In particular, we focus on characterizing our framework for
statistical analyses of evolutionary biological processes represented by
phylogenetic trees. Our setting exhibits analytic, geometric, and topological
properties that are desirable for theoretical studies in probability and
statistics, as well as increased computational efficiency over the current
state-of-the-art. We demonstrate our approach on seasonal influenza data.
Comment: 28 pages, 5 figures, 1 table
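For orientation, a metric often used in tropical geometry is the generalized Hilbert projective metric, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The sketch below applies it to trees encoded as vectors of pairwise leaf distances. The encoding, the function name `tropical_distance`, and the example vectors are assumptions for illustration; whether this matches the paper's exact construction is not confirmed by the abstract.

```python
def tropical_distance(u, v):
    """Generalized Hilbert projective metric between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

# Hypothetical three-leaf trees, each encoded by its pairwise leaf distances
# in the order (d12, d13, d23).
tree_u = [2.0, 4.0, 4.0]
tree_v = [4.0, 4.0, 2.0]

d_uv = tropical_distance(tree_u, tree_v)

# Adding the same constant to every coordinate leaves the distance unchanged,
# so the metric is defined on vectors modulo a common shift.
shifted_u = [x + 10.0 for x in tree_u]
d_shifted = tropical_distance(shifted_u, tree_v)

print(d_uv, d_shifted)
```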
Replica symmetry breaking related to a general ultrametric space III: the case of general measure
A family of replica matrices, related to general ultrametric spaces with
general measures, is introduced. These matrices generalize the known Parisi
matrices. Some functionals of the replica approach are computed. A replica
symmetry breaking solution is found.
Comment: 21 pages