Bayesian inference of population size history from multiple loci
Background: Effective population size (N_e) is related to genetic variability and is a basic parameter in many models of population genetics. A number of methods for inferring current and past population sizes from genetic data have been developed since JFC Kingman introduced the n-coalescent in 1982. Here we present the Extended Bayesian Skyline Plot, a non-parametric Bayesian Markov chain Monte Carlo algorithm that extends a previous coalescent-based method in several ways, including the ability to analyze multiple loci. Results: Through extensive simulations we show the accuracy and limitations of inferring population size as a function of the amount of data, including recovering information about evolutionary bottlenecks. We also analyzed two real data sets to demonstrate the behavior of the new method: a single-gene Hepatitis C virus data set sampled from Egypt and a 10-locus Drosophila ananassae data set representing 16 different populations. Conclusion: The results demonstrate the essential role of multiple loci in recovering population size dynamics. Multi-locus data from a small number of individuals can precisely recover past bottlenecks in population size that cannot be characterized by analysis of a single locus. We also demonstrate that sequence data quality is important, because even moderate levels of sequencing error result in a considerable decrease in estimation accuracy for realistic levels of population genetic variability.
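As a rough illustration of the coalescent model that this abstract builds on, the sketch below simulates waiting times between coalescent events under Kingman's n-coalescent with a constant population size. It is not the Extended Bayesian Skyline Plot itself. The function names, the haploid convention (pairwise coalescence rate 1/N per generation) and the parameter values are assumptions made for this example.

```python
import random


def simulate_coalescent_times(n_samples, pop_size, rng):
    """Waiting times (in generations) between successive coalescent events.

    Assumes a constant haploid population of size pop_size, so that while
    k lineages remain the coalescence rate is C(k, 2) / pop_size.
    """
    times = []
    for k in range(n_samples, 1, -1):
        pairs = k * (k - 1) / 2.0          # number of lineage pairs
        rate = pairs / pop_size            # total coalescence rate
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n_samples, pop_size):
    """Expected time to the most recent common ancestor: 2N(1 - 1/n)."""
    return 2.0 * pop_size * (1.0 - 1.0 / n_samples)


# Hypothetical settings chosen only for the demonstration.
rng = random.Random(1)
n, N, reps = 10, 1000.0, 20000
mean_tmrca = sum(
    sum(simulate_coalescent_times(n, N, rng)) for _ in range(reps)
) / reps
print(mean_tmrca, expected_tmrca(n, N))
```

Because the waiting times scale linearly with the population size, observed coalescent intervals carry information about N_e. A skyline-type method relaxes the constant-size assumption by letting the size differ between time intervals.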
Calibrated Tree Priors for Relaxed Phylogenetics and Divergence Time Estimation
The use of fossil evidence to calibrate divergence time estimation has a long
history. More recently Bayesian MCMC has become the dominant method of
divergence time estimation and fossil evidence has been re-interpreted as the
specification of prior distributions on the divergence times of calibration
nodes. These so-called "soft calibrations" have become widely used, but the
statistical properties of calibrated tree priors in a Bayesian setting have not
been carefully investigated. Here we clarify that calibration densities, such
as those defined in BEAST 1.5, do not represent the marginal prior distribution
of the calibration node. We illustrate this with a number of analytical results
on small trees. We also describe an alternative construction for a calibrated
Yule prior on trees that allows direct specification of the marginal prior
distribution of the calibrated divergence time, with or without the restriction
of monophyly. This method requires the computation of the Yule prior
conditional on the height of the divergence being calibrated. Unfortunately, a
practical solution for multiple calibrations remains elusive. Our results
suggest that direct estimation of the prior induced by specifying multiple
calibration densities should be a prerequisite of any divergence time dating
analysis.
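The distinction drawn in this abstract can be sketched in symbols. The notation is assumed here for illustration and is not taken from the paper: T is a tree, h(T) is the height of the calibrated node, f_Y is the Yule prior, f_Y^h is the marginal density of that node's height under the Yule prior, and f_C is the user-specified calibration density.

```latex
% Multiplying the tree prior by a calibration density:
\[
  p(T) \propto f_Y(T)\, f_C\bigl(h(T)\bigr)
  \quad\Longrightarrow\quad
  p(h) \propto f_Y^{h}(h)\, f_C(h),
\]
% so the marginal prior on the node height equals f_C only if f_Y^h is flat.

% Conditioning the Yule prior on the calibrated height instead:
\[
  p(T) = f_Y\bigl(T \mid h(T) = h\bigr)\, f_C(h)
  \quad\Longrightarrow\quad
  p(h) = f_C(h).
\]
```

Under these assumptions, the second construction is the one that needs the Yule prior conditional on the height of the calibrated divergence, which is the computation the abstract refers to.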
Calibrating Divergence Times on Species Trees versus Gene Trees: Implications for Speciation History of Aphelocoma Jays
Estimates of the timing of divergence are central to testing the underlying causes of speciation. Relaxed molecular clocks and fossil calibration have improved these estimates; however, these advances are implemented in the context of gene trees, which can overestimate divergence times. Here we couple recent innovations for dating speciation events with the analytical power of species trees, where multilocus data are considered in a coalescent context. Divergence times are estimated in the bird genus Aphelocoma to test whether speciation in these jays coincided with mountain uplift or glacial cycles. Gene trees and species trees show general agreement that diversification began in the Miocene amid mountain uplift. However, dates from the multilocus species tree are more recent, occurring predominantly in the Pleistocene, consistent with the theory that divergence times can be significantly overestimated by gene-tree-based approaches that do not correct for genetic divergence that predates speciation. In addition to coalescent stochasticity, Haldane's rule could account for some differences in timing estimates between mitochondrial DNA and nuclear genes. By incorporating a fossil calibration applied to the species tree, in addition to the process of gene lineage coalescence, the present approach provides a more biologically realistic framework for dating speciation events, and hence for testing the links between diversification and specific biogeographic and geologic events. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/79292/1/j.1558-5646.2010.01097.x.pd
How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories
Reconstruction of population histories is a central problem in population
genetics. Existing coalescent-based methods, like the seminal work of Li and
Durbin (Nature, 2011), attempt to solve this problem using sequence data but
have no rigorous guarantees. Determining the amount of data needed to correctly
reconstruct population histories is a major challenge. Using a variety of tools
from information theory, the theory of extremal polynomials, and approximation
theory, we prove new sharp information-theoretic lower bounds on the problem of
reconstructing population structure -- the history of multiple subpopulations
that merge, split and change sizes over time. Our lower bounds are exponential
in the number of subpopulations, even when reconstructing recent histories. We
demonstrate the sharpness of our lower bounds by providing algorithms for
distinguishing and learning population histories with matching dependence on
the number of subpopulations. Along the way and of independent interest, we
essentially determine the optimal number of samples needed to learn an
exponential mixture distribution information-theoretically, proving the upper
bound by analyzing natural (and efficient) algorithms for this problem. Comment: 38 pages, Appeared in RECOMB 201
BEAST 2: A Software Platform for Bayesian Evolutionary Analysis
We present a new open-source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third-party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk, allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism, because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example, BEAST and BEAUti use exactly the same XML file format.
The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data are consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies for obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
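For orientation, the simplest case of a gene tree topology probability is the standard three-taxon result under the multispecies coalescent on a species tree, with no hybridization. The sketch below computes it. It is not the network method presented in the abstract, and the function name and taxon labels are assumptions made for this example.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the length of the internal branch in coalescent units. The A and B
    lineages fail to coalesce on that branch with probability exp(-t); in
    that case all three topologies are equally likely in the ancestral
    population.
    """
    p_no_coal = math.exp(-t)
    concordant = 1.0 - (2.0 / 3.0) * p_no_coal
    discordant = p_no_coal / 3.0
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


# Hypothetical branch lengths, chosen only to show the trend.
for t in (0.0, 0.5, 2.0):
    print(t, three_taxon_gene_tree_probs(t))
```

Short internal branches push all three topologies toward equal probability, which is the incomplete lineage sorting signal that a hybridization analysis has to separate from reticulation.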
Evolution in Australasian Mangrove Forests: Multilocus Phylogenetic Analysis of the Gerygone Warblers (Aves: Acanthizidae)
The mangrove forests of Australasia have many endemic bird species, but their
evolution and radiation in those habitats have been little studied. One genus
with several mangrove specialist species is Gerygone
(Passeriformes: Acanthizidae). The phylogeny of the Acanthizidae is reasonably
well understood but limited taxon sampling for Gerygone has
constrained understanding of its evolution and historical biogeography in
mangroves. Here we report on a phylogenetic analysis of
Gerygone based on comprehensive taxon sampling and a
multilocus dataset of thirteen loci spread across the avian genome (eleven
nuclear and two mitochondrial loci). Since Gerygone includes
three species restricted to Australia's coastal mangrove forests, we
particularly sought to understand the biogeography of their evolution in that
ecosystem. Analyses of individual loci, as well as of a concatenated dataset
drawn from previous molecular studies, indicate that the genus as currently
defined is not monophyletic, and that the Grey Gerygone (G.
cinerea) from New Guinea should be transferred to the genus
Acanthiza. The multilocus approach supports a more nuanced view of the
group's evolution: mangrove ecosystems were colonized on multiple
occasions, in three non-overlapping time frames, most likely first by the
G. magnirostris lineage and subsequently by those of G. tenebrosa and
G. levigaster.