Analysis of DNA sequence variation within marine species using Beta-coalescents
We apply recently developed inference methods based on general coalescent
processes to DNA sequence data obtained from various marine species. Several of
these species are believed to exhibit so-called shallow gene genealogies,
potentially due to extreme reproductive behaviour, e.g. via Hedgecock's
"reproduction sweepstakes". Besides the data analysis, in particular the
inference of mutation rates and the estimation of the (real) time to the most
recent common ancestor, we briefly address the question whether the genealogies
might be adequately described by so-called Beta coalescents (as opposed to
Kingman's coalescent), allowing multiple mergers of genealogies.
The choice of the underlying coalescent model for the genealogy has drastic
implications for the estimation of the above quantities, in particular the
real-time embedding of the genealogy. Comment: 15 pages, 16 figures
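To make the contrast between Beta coalescents and Kingman's coalescent concrete, the following is a minimal sketch of the merger rates of a Beta(2 - alpha, alpha)-coalescent, assuming the standard parametrisation in which any given group of k out of b lineages merges at rate B(k - alpha, b - k + alpha) / B(2 - alpha, alpha) for 0 < alpha < 2. The function names are illustrative and are not taken from the paper.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one given group of k out of b lineages merges.

    Assumes the Beta(2 - alpha, alpha) parametrisation with 0 < alpha < 2.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0 < alpha < 2:
        raise ValueError("need 0 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


def total_merger_rates(b, alpha):
    """Total rate of k-mergers among b lineages, for each k = 2..b."""
    return {k: comb(b, k) * beta_coalescent_rate(b, k, alpha)
            for k in range(2, b + 1)}
```

As alpha approaches 2, the rates for k >= 3 vanish and the pair-merger rate tends to 1, which recovers Kingman's coalescent. Smaller alpha puts more weight on multiple mergers.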
The Effects of Population Size Histories on Estimates of Selection Coefficients from Time-Series Genetic Data.
Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright-Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes.
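The idea of ignoring drift can be illustrated with a small sketch. It assumes a haploid model in which the selected allele has relative fitness 1 + s, so that in an infinite population the logit of the allele frequency grows by log(1 + s) per generation. The estimator below is a simple least-squares fit on the logit scale, not the maximum likelihood procedure of the paper, and all function names are hypothetical.

```python
import math
import random


def deterministic_trajectory(x0, s, generations):
    """Allele frequencies in an infinite population (no drift)."""
    traj = [x0]
    for _ in range(generations):
        x = traj[-1]
        traj.append(x * (1 + s) / (1 + s * x))
    return traj


def wright_fisher_trajectory(x0, s, sizes, rng):
    """Discrete Wright-Fisher trajectory with one population size per generation."""
    traj = [x0]
    for n in sizes:
        x = traj[-1]
        p = x * (1 + s) / (1 + s * x)  # expected frequency after selection
        count = sum(rng.random() < p for _ in range(n))  # binomial sampling
        traj.append(count / n)
    return traj


def estimate_s_ignoring_drift(traj):
    """Least-squares slope of logit frequency against time, mapped back to s."""
    pts = [(t, math.log(x / (1 - x))) for t, x in enumerate(traj) if 0 < x < 1]
    if len(pts) < 2:
        raise ValueError("need at least two polymorphic time points")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return math.exp(num / den) - 1
```

On a deterministic trajectory the fit recovers s up to floating-point error. Applying it to output of `wright_fisher_trajectory` with small or changing `sizes` gives a rough feel for how much drift perturbs such an estimate.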
A novel spectral method for inferring general diploid selection from time series genetic data
The increased availability of time series genetic variation data from
experimental evolution studies and ancient DNA samples has created new
opportunities to identify genomic regions under selective pressure and to
estimate their associated fitness parameters. However, it is a challenging
problem to compute the likelihood of nonneutral models for the population
allele frequency dynamics, given the observed temporal DNA data. Here, we
develop a novel spectral algorithm to analytically and efficiently integrate
over all possible frequency trajectories between consecutive time points. This
advance circumvents the limitations of existing methods which require
fine-tuning the discretization of the population allele frequency space when
numerically approximating requisite integrals. Furthermore, our method is
flexible enough to handle general diploid models of selection where the
heterozygote and homozygote fitness parameters can take any values, while
previous methods focused on only a few restricted models of selection. We
demonstrate the utility of our method on simulated data and also apply it to
analyze ancient DNA data from genetic loci associated with coat coloration in
horses. In contrast to previous studies, our exploration of the full fitness
parameter space reveals that a heterozygote advantage form of balancing
selection may have been acting on these loci. Comment: Published at
http://dx.doi.org/10.1214/14-AOAS764 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
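For readers unfamiliar with general diploid selection, the sketch below shows the deterministic one-generation frequency update under arbitrary genotype fitnesses, together with the interior equilibrium that arises under heterozygote advantage. It assumes random mating and ignores drift entirely, so it is not the spectral likelihood method of the paper. The fitness values in the usage note are made up for illustration.

```python
def next_frequency(x, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of diploid selection."""
    marginal_A = x * w_AA + (1 - x) * w_Aa  # mean fitness of A-bearing gametes
    mean_w = x * x * w_AA + 2 * x * (1 - x) * w_Aa + (1 - x) ** 2 * w_aa
    return x * marginal_A / mean_w


def heterozygote_advantage_equilibrium(w_AA, w_Aa, w_aa):
    """Stable interior equilibrium; requires w_Aa > max(w_AA, w_aa)."""
    if not (w_Aa > w_AA and w_Aa > w_aa):
        raise ValueError("no heterozygote advantage")
    return (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)


def iterate(x0, generations, w_AA, w_Aa, w_aa):
    """Apply the selection update repeatedly, starting from frequency x0."""
    x = x0
    for _ in range(generations):
        x = next_frequency(x, w_AA, w_Aa, w_aa)
    return x
```

With the illustrative fitnesses `w_AA=0.9, w_Aa=1.0, w_aa=0.8`, iterating from any interior starting frequency approaches the value returned by `heterozygote_advantage_equilibrium`, so both alleles are maintained in the population.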
Inference of Population History using Coalescent HMMs: Review and Outlook
Studying how diverse human populations are related is of historical and
anthropological interest, in addition to providing a realistic null model for
testing for signatures of natural selection or disease associations.
Furthermore, understanding the demographic histories of other species is
playing an increasingly important role in conservation genetics. A number of
statistical methods have been developed to infer population demographic
histories using whole-genome sequence data, with recent advances focusing on
allowing for more flexible modeling choices, scaling to larger data sets, and
increasing statistical power. Here we review coalescent hidden Markov models, a
powerful class of population genetic inference methods that can effectively
utilize linkage disequilibrium information. We highlight recent advances, give
advice for practitioners, point out potential pitfalls, and present possible
future research directions. Comment: 12 pages, 2 figures
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
further develop Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008) and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta- and Kingman coalescents. Comment: 38 pages, 40 figures
Genomic evidence for the Pleistocene and recent population history of Native Americans
This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution. The definitive version was published in Science on 2015 August 21; 349(6250), DOI: 10.1126/science.aab3884. How and when the Americas were populated remains contentious. Using ancient and modern genome-wide data, we find that the ancestors of all present-day Native Americans, including Athabascans and Amerindians, entered the Americas as a single migration wave from Siberia no earlier than 23 thousand years ago (KYA), and after no more than an 8,000-year isolation period in Beringia. Following their arrival in the Americas, ancestral Native Americans diversified into two basal genetic branches around 13 KYA, one that is now dispersed across North and South America and the other restricted to North America. Subsequent gene flow resulted in some Native Americans sharing ancestry with present-day East Asians (including Siberians) and, more distantly, Australo-Melanesians. Putative "Paleoamerican" relict populations, including the historical Mexican Pericúes and South American Fuego-Patagonians, are not directly related to modern Australo-Melanesians as suggested by the Paleoamerican Model.
Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders.
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions [...] retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 1
Koaleszenten mit Multiplen Verschmelzungen und Populationsgenetische Inferenz (Coalescents with Multiple Mergers and Population Genetic Inference)
A fundamental goal in mathematical population genetics is the construction of population models to describe and analyse phenomena that are of interest for biological applications. The classical Fleming-Viot process describes the evolution of a population forward in time. It is dual to Kingman's coalescent, which describes the genealogy of a sample of individuals backwards in time. Pitman (1999) and Sagitov (1999) introduced a more general class of coalescent processes that allow more than two ancestral lines to merge at once, whereas only binary merging is possible in Kingman's coalescent. These Lambda-coalescents are dual to the so-called Lambda-Fleming-Viot processes. Donnelly and Kurtz (1996 & 1999) introduced the lookdown construction to show that these dualities also hold in a pathwise sense. In this work we extend the lookdown construction to the Xi-Fleming-Viot process and establish its pathwise duality to the Xi-coalescent, a coalescent process which allows for simultaneous multiple mergers of ancestral lines. The second part of this work deals with statistical inference of evolutionary parameters under the Lambda-coalescent and the infinitely-many-sites mutation model. We present recursive formulae that can be used to compute the probability of obtaining a given sample from the population at stationarity (these recursions were introduced by Birkner and Blath (2008)). Based on these recursions we derive a Monte Carlo method that can be used to estimate the sampling probability for large sample sizes. Extending the work of Stephens and Donnelly (2000) and Hobolth et al. (2008), we interpret this method as an importance sampling scheme and develop additional schemes to improve the accuracy of the estimates. We apply these methods to infer evolutionary parameters for mitochondrial DNA datasets taken from Atlantic cod. We argue that Lambda-coalescents are more suitable than Kingman's coalescent for describing the genealogies underlying the data.