182 research outputs found
How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories
Reconstruction of population histories is a central problem in population
genetics. Existing coalescent-based methods, like the seminal work of Li and
Durbin (Nature, 2011), attempt to solve this problem using sequence data but
have no rigorous guarantees. Determining the amount of data needed to correctly
reconstruct population histories is a major challenge. Using a variety of tools
from information theory, the theory of extremal polynomials, and approximation
theory, we prove new sharp information-theoretic lower bounds on the problem of
reconstructing population structure -- the history of multiple subpopulations
that merge, split and change sizes over time. Our lower bounds are exponential
in the number of subpopulations, even when reconstructing recent histories. We
demonstrate the sharpness of our lower bounds by providing algorithms for
distinguishing and learning population histories with matching dependence on
the number of subpopulations. Along the way and of independent interest, we
essentially determine the optimal number of samples needed to learn an
exponential mixture distribution information-theoretically, proving the upper
bound by analyzing natural (and efficient) algorithms for this problem.
Comment: 38 pages, Appeared in RECOMB 201
Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake
The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here, we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 m diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet and trophic morphology. With whole genome sequences of 146 fish, we identify 98 clearly demarcated genomic ‘islands’ of high differentiation and demonstrate association of genotypes across these islands to divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight vision associated genes), hormone signaling and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.
The work was funded by Royal Society-Leverhulme Trust Africa Awards AA100023 and AA130107 (M.J.G., B.P.N. and G.F.T.), a Wellcome Trust PhD studentship grant 097677/Z/11/Z (M.M.), Wellcome Trust grant WT098051 (S.S. and R.D.), Wellcome Trust and Cancer Research UK core support and a Wellcome Trust Senior Investigator Award (E.A.M.), a Leverhulme Trust Research Fellowship RF-2014-686 (M.J.G.), a University of Bristol Research Committee award (M.G.), a Bangor University Anniversary PhD studentship (to A.M.T.) and a Fisheries Society of the British Isles award (G.F.T.). Raw sequencing reads are in the SRA nucleotide archive: RAD sequencing (BioProject: PRJNA286304; accessions SAMN03768857 to SAMN03768912) and whole genome sequencing (BioProject PRJEB1254: sample accessions listed in Table S16). The RAD based phylogeny and alignments have been deposited in TreeBase (TB2:S18241). Whole genome variant calls in the VCF format, phylogenetic trees, and primer sequences for Sequenom genotyping are available from the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.770mc). RD declares his interests as a founder and non-executive director of Congenica Ltd., that he owns stock in Illumina from previous consulting, and is a scientific advisory board member of Dovetail Inc. We thank R. Schley for generating pharyngeal jaw data; S. Mzighani, J. Kihedu and staff of the Tanzanian Fisheries Research Institute for logistical support; A. Smith, H. Sungani, A. Shechonge, P. Parsons, J. Swanstrom, G. Cooke and J. Bridle for contributions to sampling and aquarium maintenance, the Sanger Institute sequencing core for DNA sequencing and Dr. H. Imai (Kyoto University) for the use of spectrometer in his laboratory.
This is the author accepted manuscript. The final version is available from AAAS via http://dx.doi.org/10.1126/science.aac992
Iron Age and Anglo-Saxon genomes from East England reveal British migration history
British population history has been shaped by a series of immigrations, including the early Anglo-Saxon migrations after 400 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences from 10 individuals excavated close to Cambridge in the East of England, ranging from the late Iron Age to the middle Anglo-Saxon period. By analysing shared rare variants with hundreds of modern samples from Britain and Europe, we estimate that on average the contemporary East English population derives 38% of its ancestry from Anglo-Saxon migrations. We gain further insight with a new method, rarecoal, which infers population history and identifies fine-scale genetic ancestry from rare variants. Using rarecoal we find that the Anglo-Saxon samples are closely related to modern Dutch and Danish populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain
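The idea of comparing a sample to reference populations through shared rare variants can be illustrated with a toy count. This is a minimal sketch under stated assumptions, not the rarecoal method or the authors' pipeline: the function name `rare_allele_sharing`, the input layout, and the `max_count` threshold are all hypothetical choices made for illustration.

```python
def rare_allele_sharing(query, panel, max_count=2):
    """Count rare derived alleles that a query sample shares with each population.

    query: list of 0/1 flags, one per site (1 = query carries the derived allele).
    panel: dict mapping population name -> list of derived allele counts per site.
    max_count: a site counts as "rare" if the derived allele appears between
               1 and max_count times across the whole panel (hypothetical cutoff).
    """
    n_sites = len(query)
    # Total derived allele count per site, summed over all panel populations.
    totals = [sum(counts[i] for counts in panel.values()) for i in range(n_sites)]

    sharing = {pop: 0 for pop in panel}
    for i in range(n_sites):
        is_rare = 1 <= totals[i] <= max_count
        if not (is_rare and query[i] == 1):
            continue
        for pop, counts in panel.items():
            if counts[i] > 0:
                sharing[pop] += 1
    return sharing


# Toy data: five sites, two reference populations (all values are made up).
query = [1, 1, 0, 1, 1]
panel = {
    "A": [1, 0, 0, 5, 1],
    "B": [0, 2, 1, 4, 0],
}
result = rare_allele_sharing(query, panel)
print(result)
```

In this toy input the fourth site is common (nine copies in the panel) and is ignored, while the third site is rare but absent from the query. Because rare alleles tend to be young, such sharing counts are informative about recent common ancestry; a real analysis would also need to normalise for sample size and sequencing coverage.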
MSMC and MSMC2: the multiple sequentially markovian coalescent
The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and github repositories for further advice
Genomic signatures of population decline in the malaria mosquito Anopheles gambiae
Population genomic features such as nucleotide diversity and linkage disequilibrium are expected to be strongly shaped by changes in population size, and might therefore be useful for monitoring the success of a control campaign. In the Kilifi district of Kenya, there has been a marked decline in the abundance of the malaria vector Anopheles gambiae subsequent to the rollout of insecticide-treated bed nets. To investigate whether this decline left a detectable population genomic signature, simulations were performed to compare the effect of population crashes on nucleotide diversity, Tajima's D, and linkage disequilibrium (as measured by the population recombination parameter ρ). Linkage disequilibrium and ρ were estimated for An. gambiae from Kilifi and compared with values for Anopheles arabiensis and Anopheles merus at the same location, and for An. gambiae in a location 200 km from Kilifi. In the first simulations, ρ changed more rapidly after a population crash than the other statistics, and is therefore a more sensitive indicator of recent population decline. In the empirical data, linkage disequilibrium extends 100-1000 times further, and ρ is 100-1000 times smaller, for the Kilifi population of An. gambiae than for any of the other populations. There were also significant runs of homozygosity in many of the individual An. gambiae mosquitoes from Kilifi. These results support the hypothesis that the recent decline in An. gambiae was driven by the rollout of bed nets. Measuring population genomic parameters in a small sample of individuals before, during and after vector or pest control may be a valuable method of tracking the effectiveness of interventions
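Two of the summary statistics named in this abstract, nucleotide diversity and Tajima's D, can be computed directly from aligned haplotypes. The following is a minimal sketch using the standard Tajima (1989) formulas on a toy alignment. It is not the study's code, the function names are illustrative, and it assumes biallelic sites with no missing data.

```python
import math
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences between sequences (pi, per locus)."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def segregating_sites(haplotypes):
    """Number of alignment columns with more than one allele (S)."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


def tajimas_d(haplotypes):
    """Tajima's D: normalised difference between pi and Watterson's S / a1."""
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if s == 0:
        return 0.0  # D is undefined without variation; 0.0 is a convention here
    pi = nucleotide_diversity(haplotypes)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment: four haplotypes, three singleton variants (made-up data).
haps = ["0000", "1000", "0100", "0010"]
pi = nucleotide_diversity(haps)
s = segregating_sites(haps)
d = tajimas_d(haps)
print(pi, s, d)
```

In the toy alignment every variant is a singleton, so pi falls below Watterson's estimate and D is negative. The population recombination parameter ρ discussed in the abstract needs a dedicated estimator and is not covered by this sketch.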
Dynamic changes in genomic and social structures in third millennium BCE central Europe
Europe’s prehistory oversaw dynamic and complex interactions of diverse societies, hitherto unexplored at detailed regional scales. Studying 271 human genomes dated ~4900 to 1600 BCE from the European heartland, Bohemia, we reveal unprecedented genetic changes and social processes. Major migrations preceded the arrival of “steppe” ancestry, and at ~2800 BCE, three genetically and culturally differentiated groups coexisted. Corded Ware appeared by 2900 BCE, were initially genetically diverse, did not derive all steppe ancestry from known Yamnaya, and assimilated females of diverse backgrounds. Both Corded Ware and Bell Beaker groups underwent dynamic changes, involving sharp reductions and complete replacements of Y-chromosomal diversity at ~2600 and ~2400 BCE, respectively, the latter accompanied by increased Neolithic-like ancestry. The Bronze Age saw new social organization emerge amid a ≥40% population turnover.
Processing and analyzing multiple genomes alignments with MafFilter
As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits to assemble de novo genome sequences. Here I describe MafFilter, a command-line-driven program allowing to process genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes
The genetic history of admixture across inner Eurasia
This is the author accepted manuscript. The final version is available from Nature Research via the DOI in this record.
Data Availability. Genome-wide sequence data of two Botai individuals (BAM format) are available at the European Nucleotide Archive under the accession number PRJEB31152 (ERP113669). Eigenstrat format array genotype data of 763 present-day individuals and 1240K pulldown genotype data of two ancient Botai individuals are available at the Edmond data repository of the Max Planck Society (https://edmond.mpdl.mpg.de/imeji/collection/Aoh9c69DscnxSNjm?q=).
The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbor tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan. We furthermore report additional damage-reduced genome-wide data of two previously published individuals from the Eneolithic Botai culture in Kazakhstan (~5,400 BP). We find that present-day inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries, mirroring geography. The Botai and more recent ancient genomes from Siberia show a decrease in contribution from so-called “ancient North Eurasian” ancestry over time, detectable only in the northern-most “forest-tundra” cline. The intermediate “steppe-forest” cline descends from the Late Bronze Age steppe ancestries, while the “southern steppe” cline further to the South shows a strong West/South Asian influence. Ancient genomes suggest a northward spread of the southern steppe cline in Central Asia during the first millennium BC. Finally, the genetic structure of Caucasus populations highlights a role of the Caucasus Mountains as a barrier to gene flow and suggests a post-Neolithic gene flow into North Caucasus populations from the steppe.
Max Planck Society; European Research Council (ERC); Russian Foundation for Basic Research (RFBR); Russian Scientific Fund; National Science Foundation; U.S. National Institutes of Health; Allen Discovery Center; University of Ostrava; Czech Ministry of Education; Xiamen University; Fundamental Research Funds for the Central Universities; MES R
- …