182 research outputs found

    How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

    Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure -- the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem.

    Comment: 38 pages, Appeared in RECOMB 201
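To make the idea of distinguishing exponential mixtures concrete, the sketch below numerically estimates the total variation distance between two mixtures of exponential densities. It is an illustration only, not the construction or algorithm from the paper: the function names, the integration cutoff `t_max`, and the example mixtures are assumptions chosen for this sketch. A small distance means that many samples are needed before the two mixtures can be told apart.

```python
import math


def mixture_pdf(t, weights, rates):
    """Density at time t of a mixture of exponentials with the given weights and rates."""
    return sum(w * r * math.exp(-r * t) for w, r in zip(weights, rates))


def total_variation(mix_a, mix_b, t_max=50.0, steps=100_000):
    """Trapezoid-rule estimate of 0.5 * integral of |f_a - f_b| over [0, t_max].

    Each mixture is a (weights, rates) pair. t_max is an assumed cutoff that
    should be large enough for both tails to be negligible.
    """
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        diff = abs(mixture_pdf(t, *mix_a) - mixture_pdf(t, *mix_b))
        # Endpoints get half weight under the trapezoid rule.
        total += diff * (0.5 if i in (0, steps) else 1.0)
    return 0.5 * total * h


if __name__ == "__main__":
    # Hypothetical example: a two-component mixture against a single exponential.
    two_component = ([0.5, 0.5], [1.0, 3.0])
    single = ([1.0], [2.0])
    print(total_variation(two_component, single))
```

For two single exponentials with rates 1 and 2 the distance can be checked by hand: the densities cross at t = ln 2, where the two distribution functions differ by 0.25.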

    Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake

    The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here, we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 m diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet and trophic morphology. With whole genome sequences of 146 fish, we identify 98 clearly demarcated genomic ‘islands’ of high differentiation and demonstrate association of genotypes across these islands to divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight vision associated genes), hormone signaling and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.

    The work was funded by Royal Society-Leverhulme Trust Africa Awards AA100023 and AA130107 (M.J.G., B.P.N. and G.F.T.), a Wellcome Trust PhD studentship grant 097677/Z/11/Z (M.M.), Wellcome Trust grant WT098051 (S.S. and R.D.), Wellcome Trust and Cancer Research UK core support and a Wellcome Trust Senior Investigator Award (E.A.M.), a Leverhulme Trust Research Fellowship RF-2014-686 (M.J.G.), a University of Bristol Research Committee award (M.G.), a Bangor University Anniversary PhD studentship (to A.M.T.) and a Fisheries Society of the British Isles award (G.F.T.).

    Raw sequencing reads are in the SRA nucleotide archive: RAD sequencing (BioProject: PRJNA286304; accessions SAMN03768857 to SAMN03768912) and whole genome sequencing (BioProject PRJEB1254: sample accessions listed in Table S16). The RAD based phylogeny and alignments have been deposited in TreeBase (TB2:S18241). Whole genome variant calls in the VCF format, phylogenetic trees, and primer sequences for Sequenom genotyping are available from the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.770mc).

    RD declares his interests as a founder and non-executive director of Congenica Ltd., that he owns stock in Illumina from previous consulting, and is a scientific advisory board member of Dovetail Inc.

    We thank R. Schley for generating pharyngeal jaw data; S. Mzighani, J. Kihedu and staff of the Tanzanian Fisheries Research Institute for logistical support; A. Smith, H. Sungani, A. Shechonge, P. Parsons, J. Swanstrom, G. Cooke and J. Bridle for contributions to sampling and aquarium maintenance; the Sanger Institute sequencing core for DNA sequencing; and Dr. H. Imai (Kyoto University) for the use of the spectrometer in his laboratory.

    This is the author accepted manuscript. The final version is available from AAAS via http://dx.doi.org/10.1126/science.aac992

    Iron Age and Anglo-Saxon genomes from East England reveal British migration history

    British population history has been shaped by a series of immigrations, including the early Anglo-Saxon migrations after 400 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences from 10 individuals excavated close to Cambridge in the East of England, ranging from the late Iron Age to the middle Anglo-Saxon period. By analysing shared rare variants with hundreds of modern samples from Britain and Europe, we estimate that on average the contemporary East English population derives 38% of its ancestry from Anglo-Saxon migrations. We gain further insight with a new method, rarecoal, which infers population history and identifies fine-scale genetic ancestry from rare variants. Using rarecoal we find that the Anglo-Saxon samples are closely related to modern Dutch and Danish populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

    MSMC and MSMC2: the multiple sequentially markovian coalescent

    The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data, from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and Python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and GitHub repositories for further advice.
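The final plotting step mentioned in the abstract needs results in real units. The sketch below is not taken from the chapter. It assumes the output is a tab-separated table with a scaled time column named `left_time_boundary` and a scaled coalescence-rate column named `lambda`, and it applies the standard coalescent scaling: time in generations is scaled time divided by the per-generation mutation rate, and effective population size is 1 / (2 × mutation rate × scaled rate). The column names, the example table, the mutation rate and the generation time are all assumptions for illustration.

```python
import csv
import io


def rescale(rows, mu, gen_time, rate_column="lambda"):
    """Convert scaled times and coalescence rates to years and effective sizes.

    rows: dictionaries with 'left_time_boundary' and rate_column (assumed names).
    mu: assumed mutation rate per site per generation.
    gen_time: assumed generation time in years.
    Returns a list of (years_ago, effective_population_size) tuples.
    """
    converted = []
    for row in rows:
        scaled_time = float(row["left_time_boundary"])
        scaled_rate = float(row[rate_column])
        years = scaled_time / mu * gen_time
        effective_size = 1.0 / (2.0 * mu * scaled_rate)
        converted.append((years, effective_size))
    return converted


# Hypothetical two-interval result table, invented for this sketch.
EXAMPLE_TABLE = (
    "time_index\tleft_time_boundary\tright_time_boundary\tlambda\n"
    "0\t0\t1.25e-05\t5000\n"
    "1\t1.25e-05\t2.5e-05\t4000\n"
)


def load_example():
    """Parse the hypothetical table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(EXAMPLE_TABLE), delimiter="\t"))


if __name__ == "__main__":
    for years, size in rescale(load_example(), mu=1.25e-8, gen_time=30):
        print(f"{years:.0f} years ago: Ne = {size:.0f}")
```

The resulting (years, size) pairs can be passed to any step-plot routine. The scaling itself does not depend on the plotting library.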

    Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake.

    The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 meters in diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet, and trophic morphology. With whole-genome sequences of 146 fish, we identified 98 clearly demarcated genomic "islands" of high differentiation and demonstrated the association of genotypes across these islands with divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight-vision-associated genes), hormone signaling, and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.

    Genomic signatures of population decline in the malaria mosquito Anopheles gambiae

    Population genomic features such as nucleotide diversity and linkage disequilibrium are expected to be strongly shaped by changes in population size, and might therefore be useful for monitoring the success of a control campaign. In the Kilifi district of Kenya, there has been a marked decline in the abundance of the malaria vector Anopheles gambiae subsequent to the rollout of insecticide-treated bed nets. To investigate whether this decline left a detectable population genomic signature, simulations were performed to compare the effect of population crashes on nucleotide diversity, Tajima's D, and linkage disequilibrium (as measured by the population recombination parameter ρ). Linkage disequilibrium and ρ were estimated for An. gambiae from Kilifi, and compared with values for Anopheles arabiensis and Anopheles merus at the same location, and for An. gambiae in a location 200 km from Kilifi. In the first simulations ρ changed more rapidly after a population crash than the other statistics, and therefore is a more sensitive indicator of recent population decline. In the empirical data, linkage disequilibrium extends 100-1000 times further, and ρ is 100-1000 times smaller, for the Kilifi population of An. gambiae than for any of the other populations. There were also significant runs of homozygosity in many of the individual An. gambiae mosquitoes from Kilifi. These results support the hypothesis that the recent decline in An. gambiae was driven by the rollout of bed nets. Measuring population genomic parameters in a small sample of individuals before, during and after vector or pest control may be a valuable method of tracking the effectiveness of interventions.
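Two of the summary statistics named in this abstract, nucleotide diversity and Tajima's D, can be computed directly from a small set of aligned sequences. The sketch below uses the standard textbook definitions (mean pairwise differences per sequence, number of segregating sites, and Tajima's 1989 normalising constants). It is not the simulation or estimation code from the study, and the function names and toy alignment are assumptions for illustration.

```python
import math
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences between sequences (per sequence, not per site)."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns that contain more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences; NaN when no site segregates."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return math.nan
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: diversity minus Watterson's estimator S / a1.
    numerator = nucleotide_diversity(seqs) - s / a1
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy alignment of four haplotypes, invented for this sketch.
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]
    print(nucleotide_diversity(toy), segregating_sites(toy), tajimas_d(toy))
```

Real analyses would read variants from a VCF and account for missing data. This version assumes complete, equal-length sequences.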

    Dynamic changes in genomic and social structures in third millennium BCE central Europe

    Europe’s prehistory oversaw dynamic and complex interactions of diverse societies, hitherto unexplored at detailed regional scales. Studying 271 human genomes dated ~4900 to 1600 BCE from the European heartland, Bohemia, we reveal unprecedented genetic changes and social processes. Major migrations preceded the arrival of “steppe” ancestry, and at ~2800 BCE, three genetically and culturally differentiated groups coexisted. Corded Ware appeared by 2900 BCE, were initially genetically diverse, did not derive all steppe ancestry from known Yamnaya, and assimilated females of diverse backgrounds. Both Corded Ware and Bell Beaker groups underwent dynamic changes, involving sharp reductions and complete replacements of Y-chromosomal diversity at ~2600 and ~2400 BCE, respectively, the latter accompanied by increased Neolithic-like ancestry. The Bronze Age saw new social organization emerge amid a ≥40% population turnover.

    Processing and analyzing multiple genomes alignments with MafFilter

    As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits de novo assembly of genome sequences. Here I describe MafFilter, a command-line-driven program for processing genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes.

    The genetic history of admixture across inner Eurasia

    This is the author accepted manuscript. The final version is available from Nature Research via the DOI in this record.

    Data Availability. Genome-wide sequence data of two Botai individuals (BAM format) are available at the European Nucleotide Archive under the accession number PRJEB31152 (ERP113669). Eigenstrat format array genotype data of 763 present-day individuals and 1240K pulldown genotype data of two ancient Botai individuals are available at the Edmond data repository of the Max Planck Society (https://edmond.mpdl.mpg.de/imeji/collection/Aoh9c69DscnxSNjm?q=).

    The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbor tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan. We furthermore report additional damage-reduced genome-wide data of two previously published individuals from the Eneolithic Botai culture in Kazakhstan (~5,400 BP). We find that present-day inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries, mirroring geography. The Botai and more recent ancient genomes from Siberia show a decrease in contribution from so-called “ancient North Eurasian” ancestry over time, detectable only in the northern-most “forest-tundra” cline. The intermediate “steppe-forest” cline descends from the Late Bronze Age steppe ancestries, while the “southern steppe” cline further to the South shows a strong West/South Asian influence. Ancient genomes suggest a northward spread of the southern steppe cline in Central Asia during the first millennium BC. Finally, the genetic structure of Caucasus populations highlights a role of the Caucasus Mountains as a barrier to gene flow and suggests a post-Neolithic gene flow into North Caucasus populations from the steppe.

    Funding: Max Planck Society; European Research Council (ERC); Russian Foundation for Basic Research (RFBR); Russian Scientific Fund; National Science Foundation; U.S. National Institutes of Health; Allen Discovery Center; University of Ostrava; Czech Ministry of Education; Xiamen University; Fundamental Research Funds for the Central Universities; MES R