A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree
In this paper, we provide a polynomial time algorithm to calculate the
probability of a ranked gene tree topology for a given species tree,
where a ranked tree topology is a tree topology with the internal vertices
being ordered. The probability of a gene tree topology can thus be calculated
in polynomial time if the number of orderings of the internal vertices is a
polynomial number. However, the complexity of calculating the probability of a
gene tree topology with an exponential number of rankings for a given species
tree remains unknown.
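
To make "the number of orderings of the internal vertices" concrete, the sketch below counts the rankings of a rooted binary tree topology. It is not the paper's algorithm. It uses the standard counting identity that a tree with m internal vertices has m! divided by the product, over internal vertices, of the number of internal vertices in that vertex's subtree. The nested-tuple tree encoding and the name `count_rankings` are assumptions made for illustration.

```python
from math import factorial

def count_rankings(tree):
    """Count rankings (orderings of internal vertices in which every
    parent precedes its children) of a rooted binary tree.

    Assumed encoding: a leaf is any non-tuple label, an internal vertex
    is a 2-tuple (left_subtree, right_subtree).
    """
    def walk(node):
        # Returns (internal vertices in subtree, product of subtree sizes).
        if not isinstance(node, tuple):
            return 0, 1
        left_n, left_prod = walk(node[0])
        right_n, right_prod = walk(node[1])
        n = 1 + left_n + right_n
        return n, n * left_prod * right_prod

    n_internal, size_product = walk(tree)
    return factorial(n_internal) // size_product

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(count_rankings(balanced), count_rankings(caterpillar))
```

A caterpillar topology has a single ranking, while balanced topologies have many, which is the case the abstract leaves open.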
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
further develop Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008) and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta- and Kingman coalescents.
Comment: 38 pages, 40 figures
Two-Locus Likelihoods under Variable Population Size and Fine-Scale Recombination Rate Estimation
Two-locus sampling probabilities have played a central role in devising an
efficient composite likelihood method for estimating fine-scale recombination
rates. Due to mathematical and computational challenges, these sampling
probabilities are typically computed under the unrealistic assumption of a
constant population size, and simulation studies have shown that resulting
recombination rate estimates can be severely biased in certain cases of
historical population size changes. To alleviate this problem, we develop here
new methods to compute the sampling probability for variable population size
functions that are piecewise constant. Our main theoretical result, implemented
in a new software package called LDpop, is a novel formula for the sampling
probability that can be evaluated by numerically exponentiating a large but
sparse matrix. This formula can handle moderate sample sizes and
demographic size histories with a large number of epochs. In addition, LDpop implements an approximate formula for the sampling
probability that is reasonably accurate and scales to hundreds in sample size.
Finally, LDpop includes an importance sampler for the posterior
distribution of two-locus genealogies, based on a new result for the optimal
proposal distribution in the variable-size setting. Using our methods, we study
how a sharp population bottleneck followed by rapid growth affects the
correlation between partially linked sites. Then, through an extensive
simulation study, we show that accounting for population size changes under
such a demographic model leads to substantial improvements in fine-scale
recombination rate estimation. LDpop is freely available for download at
https://github.com/popgenmethods/ldpop
Comment: 32 pages, 13 figures
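
As a rough illustration of evaluating a probability vector by exponentiating a sparse rate matrix, the sketch below applies exp(Qt) to a row vector without ever forming the dense matrix. It is not LDpop's code or its formula: it swaps in uniformization, a standard technique for continuous-time Markov chains, and it assumes a small toy chain. The function name `expm_action` and the dictionary encoding of the off-diagonal rates are hypothetical.

```python
import math

def expm_action(rates, p0, t, tol=1e-12, max_terms=100000):
    """Return the row vector p0 @ exp(Q t) by uniformization.

    rates: sparse off-diagonal entries of Q as {i: {j: rate}}; the
           diagonal is implied by rows summing to zero.
    Assumes lam * t is moderate, so exp(-lam * t) does not underflow.
    """
    n = len(p0)
    exit_rate = [sum(rates.get(i, {}).values()) for i in range(n)]
    lam = max(exit_rate) or 1.0  # uniformization rate

    def step(v):
        # One application of the stochastic matrix P = I + Q / lam.
        out = [v[i] * (1.0 - exit_rate[i] / lam) for i in range(n)]
        for i, row in rates.items():
            for j, r in row.items():
                out[j] += v[i] * r / lam
        return out

    weight = math.exp(-lam * t)  # Poisson(k = 0) weight
    term = list(p0)
    result = [weight * x for x in term]
    total = weight
    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        weight *= lam * t / k  # Poisson(k) weight from Poisson(k - 1)
        term = step(term)
        result = [r + weight * x for r, x in zip(result, term)]
        total += weight
    return result

# Toy two-state chain with rates 0 -> 1 of 1.0 and 1 -> 0 of 2.0.
print(expm_action({0: {1: 1.0}, 1: {0: 2.0}}, [1.0, 0.0], 0.5))
```

For a piecewise-constant size history, one such step per epoch could be chained, feeding each epoch's output vector into the next with that epoch's rates.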
Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis
The multi-species coalescent provides an elegant theoretical framework for
estimating species trees and species demographics from genetic markers.
Practical applications of the multi-species coalescent model are, however,
limited by the need to integrate or sample over all gene trees possible for
each genetic marker. Here we describe a polynomial-time algorithm that computes
the likelihood of a species tree directly from the markers under a finite-sites
model of mutation, effectively integrating over all possible gene trees. The
method applies to independent (unlinked) biallelic markers such as well-spaced
single nucleotide polymorphisms (SNPs), and we have implemented it in SNAPP, a
Markov chain Monte-Carlo sampler for inferring species trees, divergence dates,
and population sizes. We report results from simulation experiments and from an
analysis of 1997 amplified fragment length polymorphism (AFLP) loci in 69
individuals sampled from six species of Ourisia (New Zealand native
foxglove).
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories
Phylodynamics is an area of population genetics that uses genetic sequence
data to estimate past population dynamics. Modern state-of-the-art Bayesian
nonparametric methods for recovering population size trajectories of unknown
form use either change-point models or Gaussian process priors. Change-point
models suffer from computational issues when the number of change-points is
unknown and needs to be estimated. Gaussian process-based methods lack local
adaptivity and cannot accurately recover trajectories that exhibit features
such as abrupt changes in trend or varying levels of smoothness. We propose a
novel, locally-adaptive approach to Bayesian nonparametric phylodynamic
inference that has the flexibility to accommodate a large class of functional
behaviors. Local adaptivity results from modeling the log-transformed effective
population size a priori as a horseshoe Markov random field, a recently
proposed statistical model that blends together the best properties of the
change-point and Gaussian process modeling paradigms. We use simulated data to
assess model performance, and find that our proposed method results in reduced
bias and increased precision when compared to contemporary methods. We also use
our models to reconstruct past changes in genetic diversity of human hepatitis
C virus in Egypt and to estimate population size changes of ancient and modern
steppe bison. These analyses show that our new method captures features of the
population size trajectories that were missed by the state-of-the-art methods.
Comment: 36 pages, including supplementary information
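
A minimal sketch of what a draw from a horseshoe Markov random field prior on the log effective population size could look like. It assumes a first-order field, in which each increment between neighbouring grid cells is normal with a standard deviation equal to a fixed global scale times a half-Cauchy local scale. The function name, the fixed `global_scale`, and the grid are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

def sample_horseshoe_mrf(n_cells, global_scale=0.1, start=0.0, seed=None):
    """Draw one log-population-size trajectory from a first-order
    horseshoe Markov random field prior (illustrative assumptions)."""
    rng = random.Random(seed)
    theta = [start]
    for _ in range(n_cells - 1):
        # Half-Cauchy(0, 1) local scale by inverse CDF: tan(pi * u / 2).
        local_scale = math.tan(math.pi * rng.random() / 2.0)
        # Heavy tails allow rare large jumps; the mass near zero keeps
        # most increments small, which yields locally adaptive smoothing.
        increment = rng.gauss(0.0, global_scale * local_scale)
        theta.append(theta[-1] + increment)
    return theta

log_sizes = sample_horseshoe_mrf(50, seed=1)
sizes = [math.exp(x) for x in log_sizes]  # back to the population-size scale
```

Most draws are nearly flat with a few abrupt shifts, which is the behaviour the abstract contrasts with Gaussian process priors.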
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201