Accelerating Bayesian inference for evolutionary biology models.
Bayesian inference is widely used nowadays and relies largely on Markov chain Monte Carlo (MCMC) methods. Evolutionary biology has greatly benefited from the developments of MCMC methods, but the design of more complex and realistic models and the ever growing availability of novel data is pushing the limits of the current use of these methods.
We present a parallel Metropolis-Hastings (M-H) framework built with a novel combination of enhancements aimed at parameter-rich and complex models. On a parameter-rich macroevolutionary model, we show increases in sampling speed of up to 35 times with 32 processors compared to a sequential M-H process. More importantly, our framework converges up to twenty times faster when estimating the posterior probability of phylogenetic trees using 32 processors, compared to the well-known software MrBayes for Bayesian inference of phylogenetic trees.
https://bitbucket.org/XavMeyer/hogan.
Supplementary data are available at Bioinformatics online.
Biological invasions in agricultural settings: insights from evolutionary biology and population genetics
Invasion biology and agriculture are intimately related for several reasons, in particular because many agricultural pest species are recent invaders. In this article we suggest that the reconstruction of invasion routes with population genetics-based methods can address fundamental questions in ecology and practical aspects of the management of biological invasions in agricultural settings. We provide a brief description of the methods used to reconstruct invasion routes and describe their main characteristics. In particular, we focus on one scenario, the bridgehead invasion scenario, which had been overlooked until recently. We show that this scenario, in which an invasive population is the source of other invasive populations, is evolutionarily parsimonious and may have played a crucial role in shaping the distribution of many recent agricultural pests.
A Kolmogorov-Smirnov test for the molecular clock on Bayesian ensembles of phylogenies
Divergence date estimates are central to understanding evolutionary processes
and depend, in the case of molecular phylogenies, on tests of molecular clocks.
Here we propose two non-parametric tests of strict and relaxed molecular clocks
built upon a framework that uses the empirical cumulative distribution (ECD) of
branch lengths obtained from an ensemble of Bayesian trees and the well-known
non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS)
goodness-of-fit tests. In the strict clock case, the method uses
the one-sample KS test to directly test whether the phylogeny
is clock-like, in other words, whether it follows a Poisson law. The ECD is computed
from the discretized branch lengths and the parameter of the expected
Poisson distribution is calculated as the average branch length over the
ensemble of trees. To compensate for auto-correlation in the ensemble of
trees and for pseudo-replication, we take advantage of thinning and effective sample
size, two features provided by Bayesian inference MCMC samplers. Finally, we
observe that tree topologies with very long or very short branches lead to
Poisson mixtures; in this case we propose the use of the two-sample KS test
with samples from two continuous branch length distributions, one obtained from
an ensemble of clock-constrained trees and the other from an ensemble of
unconstrained trees. Moreover, in this second form the test can also be applied
to test for relaxed clock models. The use of a statistically equivalent
ensemble of phylogenies to obtain the branch lengths ECD, instead of one
consensus tree, yields considerable reduction of the effects of small sample
size and provides a gain of power.
Comment: 14 pages, 9 figures, 8 tables. Minor revision, addition of a new
example and new title. Software:
https://github.com/FernandoMarcon/PKS_Test.gi
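As a rough illustration of the two-sample form of the test described in this abstract, the sketch below computes the two-sample KS statistic between two sets of branch lengths, together with the usual asymptotic p-value. It uses only the Python standard library. The function name `ks_two_sample` and the toy branch lengths are hypothetical, and this is not the authors' software; in particular it does not apply the thinning or effective-sample-size corrections the abstract mentions.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical cumulative
    distributions (ECDs). p comes from the asymptotic Kolmogorov
    series, so it is only approximate for small samples.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # The supremum of |ECD_a - ECD_b| is attained at an observed value,
    # so it is enough to evaluate both ECDs at every data point.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.1:
        # The series below converges slowly near zero; p is ~1 there.
        return d, 1.0
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Toy branch lengths (hypothetical): clock-constrained vs unconstrained trees.
constrained = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10]
unconstrained = [0.10, 0.30, 0.02, 0.25, 0.01, 0.40]
d_stat, p_value = ks_two_sample(constrained, unconstrained)
print(f"D = {d_stat:.3f}, approximate p = {p_value:.3f}")
```

A small p-value would suggest that the two branch-length distributions differ, that is, that the clock constraint changes the branch lengths appreciably.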
New Insights into History Matching via Sequential Monte Carlo
The aim of the history matching method is to locate non-implausible regions
of the parameter space of complex deterministic or stochastic models by
matching model outputs with data. It does this via a series of waves where at
each wave an emulator is fitted to a small number of training samples. An
implausibility measure is defined which takes into account the closeness of
simulated and observed outputs as well as emulator uncertainty. As the waves
progress, the emulator becomes more accurate so that training samples are more
concentrated on promising regions of the space and poorer parts of the space
are rejected with more confidence. Whilst history matching has proved to be
useful, existing implementations are not fully automated and some ad-hoc
choices are made during the process, which involves user intervention and is
time consuming. This occurs especially when the non-implausible region becomes
small and it is difficult to sample this space uniformly to generate new
training points. In this article we develop a sequential Monte Carlo (SMC)
algorithm for implementation which is semi-automated. Our novel SMC approach
reveals that the history matching method yields a non-implausible distribution
that can be multi-modal, highly irregular and very difficult to sample
uniformly. Our SMC approach offers a much more reliable sampling of the
non-implausible space, although it requires additional computation compared to other
approaches used in the literature.
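The abstract does not state the implausibility measure explicitly. A form commonly used in the history matching literature divides the distance between the observation and the emulator mean by the square root of the summed variances. The sketch below assumes that form; the function names, the toy emulator, and the cutoff of 3 are illustrative assumptions, not details taken from this paper.

```python
import math


def implausibility(observed, emulator_mean, emulator_var,
                   obs_var=0.0, discrepancy_var=0.0):
    """Standardised distance between an observation and an emulator prediction.

    Assumed form: |z - E[f(x)]| / sqrt(V_emulator + V_obs + V_discrepancy).
    """
    total_var = emulator_var + obs_var + discrepancy_var
    if total_var <= 0.0:
        raise ValueError("total variance must be positive")
    return abs(observed - emulator_mean) / math.sqrt(total_var)


def non_implausible(candidates, observed, emulator, cutoff=3.0,
                    obs_var=0.0, discrepancy_var=0.0):
    """Keep candidates whose implausibility does not exceed the cutoff.

    `emulator(x)` must return a (mean, variance) pair. The default
    cutoff of 3 is a conventional choice, assumed here.
    """
    kept = []
    for x in candidates:
        mean, var = emulator(x)
        if implausibility(observed, mean, var, obs_var, discrepancy_var) <= cutoff:
            kept.append(x)
    return kept


def toy_emulator(x):
    """Hypothetical emulator: predicts 2x with a constant variance of 0.01."""
    return 2.0 * x, 0.01


# One wave on a toy problem: which inputs could plausibly produce output 4.0?
print(non_implausible([1.0, 1.8, 2.0, 2.2, 3.0], 4.0, toy_emulator, obs_var=0.03))
```

In a full history match this filtering would be repeated over several waves, refitting the emulator on the surviving region each time.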
Inferring Kangaroo Phylogeny from Incongruent Nuclear and Mitochondrial Genes
The marsupial genus Macropus includes three subgenera, the familiar large grazing kangaroos and wallaroos of M. (Macropus) and M. (Osphranter), as well as the smaller mixed grazing/browsing wallabies of M. (Notamacropus). A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby) into Macropus. To further examine this proposal we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although they place M. irma (black-gloved wallaby) within M. (Osphranter) rather than, as expected, with M. (Notamacropus). Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations, in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally, the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression.
This work has been supported by Australian Research Council grants to MJP (DP07745015) and MB (FT0991741). The website for the funder is www.arc.gov.au.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements
Continuous-time birth-death-shift (BDS) processes are frequently used in
stochastic modeling, with many applications in ecology and epidemiology. In
particular, such processes can model evolutionary dynamics of transposable
elements, which are important genetic markers in molecular epidemiology. Estimation of
the effects of individual covariates on the birth, death, and shift rates of
the process can be accomplished by analyzing patient data, but inferring these
rates in a discretely and unevenly observed setting presents computational
challenges. We propose a multi-type branching process approximation to BDS
processes and develop a corresponding expectation maximization (EM) algorithm,
where we use spectral techniques to reduce calculation of expected sufficient
statistics to low dimensional integration. These techniques yield an efficient
and robust optimization routine for inferring the rates of the BDS process, and
apply more broadly to multi-type branching processes where rates can depend on
many covariates. After rigorously testing our methodology in simulation
studies, we apply our method to study the intrapatient time evolution of the IS6110
transposable element, an element frequently used in estimating
epidemiological clusters of Mycobacterium tuberculosis infections.
Comment: 31 pages, 7 figures, 1 table
Accelerating delayed-acceptance Markov chain Monte Carlo algorithms
Delayed-acceptance Markov chain Monte Carlo (DA-MCMC) samples from a
probability distribution via a two-stage version of the Metropolis-Hastings
algorithm, by combining the target distribution with a "surrogate" (i.e. an
approximate and computationally cheaper version) of said distribution. DA-MCMC
accelerates MCMC sampling in complex applications, while still targeting the
exact distribution. We design a computationally faster, albeit approximate,
DA-MCMC algorithm. We consider parameter inference in a Bayesian setting where
a surrogate likelihood function is introduced in the delayed-acceptance scheme.
When the evaluation of the likelihood function is computationally intensive,
our scheme produces a 2-4 times speed-up, compared to standard DA-MCMC.
However, the acceleration is highly problem dependent. Inference results for
the standard delayed-acceptance algorithm and our approximated version are
similar, indicating that our algorithm can return reliable Bayesian inference.
As a computationally intensive case study, we introduce a novel stochastic
differential equation model for protein folding data.
Comment: 40 pages, 21 figures, 10 tables
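To make the two-stage idea concrete, here is a minimal sketch of the standard (exact) delayed-acceptance scheme with a symmetric random-walk proposal. It is not the accelerated, approximate variant this abstract proposes. The function name `da_mcmc`, the toy target, and the toy surrogate are all hypothetical choices for illustration.

```python
import math
import random


def da_mcmc(log_target, log_surrogate, theta0, n_iter, step=1.0, seed=0):
    """Standard delayed-acceptance M-H with a symmetric Gaussian proposal.

    Stage 1 screens each proposal with the cheap surrogate only.
    Stage 2 evaluates the expensive target just for survivors and
    corrects for the surrogate, so the chain still targets exp(log_target).
    Returns the chain and the number of expensive target evaluations.
    """
    rng = random.Random(seed)
    theta = theta0
    lt, ls = log_target(theta), log_surrogate(theta)
    chain, n_target_evals = [], 0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        ls_prop = log_surrogate(prop)
        # Stage 1: accept with probability min(1, surrogate ratio).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < ls_prop - ls:
            lt_prop = log_target(prop)
            n_target_evals += 1
            # Stage 2: min(1, target ratio divided by surrogate ratio).
            if math.log(1.0 - rng.random()) < (lt_prop - lt) - (ls_prop - ls):
                theta, lt, ls = prop, lt_prop, ls_prop
        chain.append(theta)
    return chain, n_target_evals


def toy_log_target(x):
    """Hypothetical 'expensive' target: standard normal, up to a constant."""
    return -0.5 * x * x


def toy_log_surrogate(x):
    """Hypothetical cheap surrogate: a wider normal (standard deviation 1.5)."""
    return -0.5 * x * x / 2.25


chain, evals = da_mcmc(toy_log_target, toy_log_surrogate, 0.0, 20000)
mean = sum(chain) / len(chain)
print(f"mean = {mean:.3f}, target evaluations = {evals} of {len(chain)}")
```

The saving comes from proposals rejected at stage 1, which never trigger an evaluation of the expensive target.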
Nonadaptive Amino Acid Convergence Rates Decrease over Time.
Convergence is a central concept in evolutionary studies because it provides strong evidence for adaptation. It also provides information about the nature of the fitness landscape and the repeatability of evolution, and can mislead phylogenetic inference. To understand the role of adaptive convergence, we need to understand the patterns of nonadaptive convergence. Here, we consider the relationship between nonadaptive convergence and divergence in mitochondrial and model proteins. Surprisingly, nonadaptive convergence is much more common than expected in closely related organisms, falling off as organisms diverge. The extent of the convergent drop-off in mitochondrial proteins is well predicted by epistatic or coevolutionary effects in our "evolutionary Stokes shift" models and poorly predicted by conventional evolutionary models. Convergence probabilities decrease dramatically if the ancestral amino acids of branches being compared have diverged, but also drop slowly over evolutionary time even if the ancestral amino acids have not substituted. Convergence probabilities drop off rapidly for quickly evolving sites, but much more slowly for slowly evolving sites. Furthermore, once sites have diverged, their convergence probabilities are extremely low and indistinguishable from convergence levels at randomized sites. These results indicate that we cannot assume that excessive convergence early on is necessarily adaptive. This new understanding should help us to better discriminate adaptive from nonadaptive convergence and develop more relevant evolutionary models with improved validity for phylogenetic inference.