The space of ultrametric phylogenetic trees
The reliability of a phylogenetic inference method from genomic sequence data
is ensured by its statistical consistency. Bayesian inference methods produce a
sample of phylogenetic trees from the posterior distribution given sequence
data. Hence the question of statistical consistency of such methods is
equivalent to the consistency of the summary of the sample. More generally,
statistical consistency is ensured by the tree space used to analyse the
sample.
In this paper, we consider two standard parameterisations of phylogenetic
time-trees used in evolutionary models: inter-coalescent interval lengths and
absolute times of divergence events. For each of these parameterisations we
introduce a natural metric space on ultrametric phylogenetic trees. We compare
the introduced spaces with existing models of tree space and formulate several
formal requirements that a metric space on phylogenetic trees must possess in
order to be a satisfactory space for statistical analysis, and justify them. We
show that only a few known constructions of the space of phylogenetic trees
satisfy these requirements. However, our results suggest that these basic
requirements are not enough to distinguish between the two metric spaces we
introduce and that the choice between metric spaces requires additional
properties to be considered. In particular, the summary tree minimising the
squared distance to the trees from the sample might differ between
parameterisations. This suggests that further fundamental insight is needed
into the problem of statistical consistency of phylogenetic inference methods.
Comment: Minor changes. This version has been published in JTB. 27 pages, 9
figures
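As a rough illustration of the two parameterisations named in the abstract above, the sketch below converts absolute divergence times into inter-coalescent interval lengths and compares two trees under each. It assumes both trees share the same ranked topology and uses a plain Euclidean distance on each parameter vector. The function names and the example times are hypothetical, and this is not the paper's construction of tree space.

```python
import math

def times_to_intervals(times):
    """Convert absolute divergence times (sorted, measured back from the
    present) into inter-coalescent interval lengths."""
    intervals = []
    previous = 0.0
    for t in times:
        intervals.append(t - previous)
        previous = t
    return intervals

def euclidean(u, v):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical divergence times for two trees with the same ranked topology.
tree_a_times = [1.0, 2.0, 3.0]
tree_b_times = [2.0, 3.0, 4.0]

# Distance when trees are parameterised by absolute divergence times.
dist_times = euclidean(tree_a_times, tree_b_times)

# Distance when the same trees are parameterised by interval lengths.
dist_intervals = euclidean(times_to_intervals(tree_a_times),
                           times_to_intervals(tree_b_times))

print(dist_times, dist_intervals)
```

In this toy case the two trees differ only in their most recent interval, so the interval-based distance is smaller than the time-based one. The same pair of trees can therefore sit at different distances depending on the parameterisation.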
BEAST: Bayesian evolutionary analysis by sampling trees
Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented.
Results: BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license.
Conclusion: BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
Bayesian phylogenetic estimation of fossil ages
Recent advances have allowed for both morphological fossil evidence and
molecular sequences to be integrated into a single combined inference of
divergence dates under the rule of Bayesian probability. In particular the
fossilized birth-death tree prior and the Lewis-Mk model of discrete
morphological evolution allow for the estimation of both divergence times and
phylogenetic relationships between fossil and extant taxa. We exploit this
statistical framework to investigate the internal consistency of these models
by producing phylogenetic estimates of the age of each fossil in turn, within
two rich and well-characterized data sets of fossil and extant species
(penguins and canids). We find that the estimation accuracy of fossil ages is
generally high with credible intervals seldom excluding the true age and median
relative error in the two data sets of 5.7% and 13.2% respectively. The median
relative standard error (RSD) was 9.2% and 7.2% respectively, suggesting good
precision, although with some outliers. In fact, in the two data sets we analyze,
the phylogenetic estimate of fossil age is on average < 2 My from the midpoint
age of the geological strata from which it was excavated. The high level of
internal consistency found in our analyses suggests that the Bayesian
statistical model employed is an adequate fit for both the geological and
morphological data, and provides evidence from real data that the framework
used can accurately model the evolution of discrete morphological traits coded
from fossil and extant taxa. We anticipate that this approach will have diverse
applications beyond divergence time dating, including dating fossils that are
temporally unconstrained, testing of the "morphological clock", and for
uncovering potential model misspecification and/or data errors when
controversial phylogenetic hypotheses are obtained based on combined divergence
dating analyses.
Comment: 28 pages, 8 figures
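The accuracy and precision summaries quoted in the abstract above can be sketched as follows. The abstract does not spell out the formulas, so this assumes the common definitions: relative error as the absolute difference between estimate and true age divided by the true age, and relative standard deviation as sample standard deviation over the mean. The posterior sample and the true age are made-up values for illustration.

```python
import statistics

def relative_error(estimate, true_value):
    """Absolute error as a fraction of the true value (assumed definition)."""
    return abs(estimate - true_value) / true_value

def relative_sd(samples):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical posterior sample of a fossil's age, in millions of years (My).
posterior_ages = [9.0, 10.0, 11.0, 10.0, 10.0]
# Hypothetical "true" age, e.g. the midpoint of the fossil's stratigraphic range.
true_age = 10.5

# Use the posterior median as the point estimate of the fossil age.
median_age = statistics.median(posterior_ages)
err = relative_error(median_age, true_age)
rsd = relative_sd(posterior_ages)

print(f"relative error: {err:.3f}, relative SD: {rsd:.3f}")
```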
Calibrated Tree Priors for Relaxed Phylogenetics and Divergence Time Estimation
The use of fossil evidence to calibrate divergence time estimation has a long
history. More recently Bayesian MCMC has become the dominant method of
divergence time estimation and fossil evidence has been re-interpreted as the
specification of prior distributions on the divergence times of calibration
nodes. These so-called "soft calibrations" have become widely used but the
statistical properties of calibrated tree priors in a Bayesian setting have not
been carefully investigated. Here we clarify that calibration densities, such
as those defined in BEAST 1.5, do not represent the marginal prior distribution
of the calibration node. We illustrate this with a number of analytical results
on small trees. We also describe an alternative construction for a calibrated
Yule prior on trees that allows direct specification of the marginal prior
distribution of the calibrated divergence time, with or without the restriction
of monophyly. This method requires the computation of the Yule prior
conditional on the height of the divergence being calibrated. Unfortunately, a
practical solution for multiple calibrations remains elusive. Our results
suggest that direct estimation of the prior induced by specifying multiple
calibration densities should be a prerequisite of any divergence time dating
analysis.
Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration
Phylogenetic analyses which include fossils or molecular sequences that are
sampled through time require models that allow one sample to be a direct
ancestor of another sample. As previously available phylogenetic inference
tools assume that all samples are tips, they do not allow for this possibility.
We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC)
algorithm to infer what we call sampled ancestor trees, that is, trees in which
sampled individuals can be direct ancestors of other sampled individuals. We
use a family of birth-death models where individuals may remain in the tree
process after sampling; in particular, we extend the birth-death skyline
model [Stadler et al., 2013] to sampled ancestor trees. This method allows the
detection of sampled ancestors as well as estimation of the probability that an
individual will be removed from the process when it is sampled. We show that
sampled ancestor birth-death models where all samples come from different time
points are non-identifiable and thus require one parameter to be known in order
to infer other parameters. We apply this method to epidemiological data, where
the possibility of sampled ancestors enables us to identify individuals that
infected other individuals after being sampled and to infer fundamental
epidemiological parameters. We also apply the method to infer divergence times
and diversification rates when fossils are included among the species samples,
so that fossilisation events are modelled as a part of the tree branching
process. Such modelling has many advantages, as argued in the literature. The
sampler is available as an open-source BEAST2 package
(https://github.com/gavryushkina/sampled-ancestors).
Comment: 34 pages (including Supporting Information), 8 figures, 1 table. Part
of the work presented at Epidemics 2013 and The 18th Annual New Zealand
Phylogenomics Meeting, 201
Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model
The evolution of RNA viruses such as HIV, Hepatitis C and Influenza virus
occurs so rapidly that the viruses' genomes contain information on past
ecological dynamics. Hence, we develop a phylodynamic method that enables the
joint estimation of epidemiological parameters and phylogenetic history. Based
on a compartmental susceptible-infected-removed (SIR) model, this method
provides separate information on incidence and prevalence of infections.
Detailed information on the interaction of host population dynamics and
evolutionary history can inform decisions on how to contain or entirely avoid
disease outbreaks.
We apply our Birth-Death SIR method (BDSIR) to two viral data sets. First,
five human immunodeficiency virus type 1 clusters sampled in the United Kingdom
between 1999 and 2003 are analyzed. The estimated basic reproduction ratios
range from 1.9 to 3.2 among the clusters. All clusters show a decline in the
growth rate of the local epidemic in the middle or end of the 1990s.
The analysis of a hepatitis C virus (HCV) genotype 2c data set shows that the
local epidemic in the Córdoban city Cruz del Eje originated around 1906
(median), coinciding with an immigration wave from Europe to central Argentina
that dates from 1880–1920. The estimated time of epidemic peak is around 1970.
Comment: Journal link:
http://rsif.royalsocietypublishing.org/content/11/94/20131106.ful
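For readers unfamiliar with the compartmental model mentioned in the abstract above, here is a minimal deterministic SIR sketch integrated with forward Euler steps. It is not the stochastic birth-death SIR method of the paper. The function name, the population size and the rates are assumptions chosen for illustration, with the basic reproduction ratio taken as R0 = beta * N / gamma.

```python
def simulate_sir(beta, gamma, s0, i0, dt=0.01, t_end=50.0):
    """Integrate a deterministic SIR model with forward Euler steps.

    Returns a list of (time, susceptible, infected, removed) tuples.
    """
    s, i, r = float(s0), float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt   # incidence over this step
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals   # i is the prevalence
        r += new_removals
        trajectory.append((k * dt, s, i, r))
    return trajectory

# Hypothetical parameters: population of 1000, removal rate 0.5, R0 = 2.
n = 1000
gamma = 0.5
r0 = 2.0
beta = r0 * gamma / n

traj = simulate_sir(beta, gamma, s0=n - 1, i0=1)

# Time and size of the epidemic peak (maximum number infected).
peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
print(f"peak of about {peak_infected:.0f} infected at t = {peak_time:.1f}")
```

The sketch keeps incidence (new infections per step) and prevalence (current number infected) as separate quantities, which are the two things the abstract says the method reports separately.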