Population genetics of identity by descent
Recent improvements in high-throughput genotyping and sequencing technologies
have afforded the collection of massive, genome-wide datasets of DNA
information from hundreds of thousands of individuals. These datasets, in turn,
provide unprecedented opportunities to reconstruct the history of human
populations and to detect genotype-phenotype associations. Recently developed
computational methods can identify long-range chromosomal segments that are
identical across samples, and have been transmitted from common ancestors that
lived tens to hundreds of generations in the past. These segments reveal
genealogical relationships that are typically unknown to the carrying
individuals. In this work, we demonstrate that such identical-by-descent (IBD)
segments are informative about a number of relevant population genetics
features: they enable inference of past population size fluctuations and
migration events, and they carry the genomic signature of natural
selection. We derive a mathematical model, based on coalescent theory, that
allows for a quantitative description of IBD sharing across purportedly
unrelated individuals, and develop inference procedures for the reconstruction
of recent demographic events, where classical methodologies are statistically
underpowered. We analyze IBD sharing in several contemporary human populations,
including representative communities of the Jewish Diaspora, Kenyan Maasai
samples, and individuals from several Dutch provinces, in all cases retrieving
evidence of fine-scale demographic events from recent history. Finally, we
expand the presented model to describe distributions for those sites in IBD
shared segments that harbor mutation events, showing how these may be used for
the inference of mutation rates in humans and other species.
Comment: Ph.D. thesis
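To make the link between IBD sharing and population size concrete, here is a minimal sketch of the simplest case. It assumes a constant haploid population size N, a continuous-time coalescent in which two haplotypes coalesce at rate 1/N per generation, and segment lengths measured in Morgans. Given a time to the most recent common ancestor of t generations, the segment covering a site extends to each side until the first recombination among roughly 2t meioses, so its length is Gamma(2, 2t) distributed. Integrating over t gives the expected fraction of the genome shared in segments longer than u, (1 + 4Nu)/(1 + 2Nu)^2. The code checks this closed form against direct numerical integration. The function names are illustrative, and this constant-size case stands in for, but is not, the variable-demography model described above.

```python
import math


def ibd_fraction_closed_form(N, u):
    """Expected fraction of the genome shared IBD in segments longer than
    u Morgans by two haplotypes, assuming a constant haploid size N."""
    x = N * u
    return (1 + 4 * x) / (1 + 2 * x) ** 2


def ibd_fraction_numeric(N, u, steps=200_000):
    """Same quantity by midpoint-rule integration over the TMRCA t."""
    rate = 1 / N + 2 * u   # overall exponential decay rate of the integrand
    t_max = 50 / rate      # beyond this the integrand is negligible
    dt = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        density = math.exp(-t / N) / N                     # TMRCA ~ Exp(1/N)
        p_longer = (1 + 2 * t * u) * math.exp(-2 * t * u)  # P(Gamma(2, 2t) > u)
        total += density * p_longer * dt
    return total


if __name__ == "__main__":
    N, u = 10_000, 0.01  # haploid size, 1 cM threshold
    print(ibd_fraction_closed_form(N, u))
    print(ibd_fraction_numeric(N, u))
```

With N = 10,000 and a 1 cM threshold, both functions return roughly 0.0099; smaller populations share more of the genome in long segments, which is what makes the length spectrum informative about recent demography.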
The variance of identity-by-descent sharing in the Wright-Fisher model
Widespread sharing of long, identical-by-descent (IBD) genetic segments is a
hallmark of populations that have experienced recent genetic drift. Detection
of these IBD segments has recently become feasible, enabling a wide range of
applications from phasing and imputation to demographic inference. Here, we
study the distribution of IBD sharing in the Wright-Fisher model. Specifically,
using coalescent theory, we calculate the variance of the total sharing between
random pairs of individuals. We then investigate the cohort-averaged sharing:
the average total sharing between one individual and the rest of the cohort. We
find that for large cohorts, the cohort-averaged sharing is distributed
approximately normally. Surprisingly, the variance of this distribution does
not vanish even for large cohorts, implying the existence of "hyper-sharing"
individuals. The presence of such individuals has consequences for the design
of sequencing studies, since, if they are selected for whole-genome sequencing,
a larger fraction of the cohort can be subsequently imputed. We calculate the
expected gain in power of imputation by IBD, and subsequently, in power to
detect an association, when individuals are either randomly selected or
specifically chosen to be the hyper-sharing individuals. Using our framework,
we also compute the variance of an estimator of the population size that is
based on the mean IBD sharing and the variance in the sharing between inbred
siblings. Finally, we study IBD sharing in an admixture pulse model, and show
that in the Ashkenazi Jewish population the admixture fraction is correlated
with the cohort-averaged sharing.
Comment: Includes Supplementary Material
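The cohort-averaged sharing defined above is simple to compute once total pairwise sharing is available. The following is a minimal sketch. It assumes a precomputed symmetric matrix of total pairwise IBD sharing (for instance in cM, from an IBD detection tool). The helper names are hypothetical, and the ranking step only illustrates the idea of prioritizing hyper-sharing individuals for sequencing; it does not reproduce the paper's imputation-power calculation.

```python
def cohort_averaged_sharing(pairwise):
    """For each individual i, the mean total IBD sharing with every other
    cohort member. `pairwise` is a symmetric matrix (list of lists) of total
    pairwise sharing, e.g. in cM; the diagonal is ignored."""
    n = len(pairwise)
    return [
        sum(pairwise[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def top_sharers(pairwise, k):
    """Indices of the k individuals with the highest cohort-averaged sharing,
    i.e. candidate 'hyper-sharing' individuals to prioritize for sequencing."""
    avg = cohort_averaged_sharing(pairwise)
    return sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy matrix of total pairwise sharing (cM) for four individuals
    S = [[0, 10, 12, 30],
         [10, 0, 8, 25],
         [12, 8, 0, 20],
         [30, 25, 20, 0]]
    print(cohort_averaged_sharing(S))
    print(top_sharers(S, 2))
```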
Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History
Data-driven studies of identity by descent (IBD) were recently enabled by high-resolution genomic data from large cohorts and scalable algorithms for IBD detection. Yet, haplotype sharing currently represents an underutilized source of information for population-genetics research. We present analytical results on the relationship between haplotype sharing across purportedly unrelated individuals and a population's demographic history. We express the distribution of IBD sharing across pairs of individuals for segments of arbitrary length as a function of the population's demography, and we derive an inference procedure to reconstruct such demographic history. The accuracy of the proposed reconstruction methodology was extensively tested on simulated data. We applied this methodology to two densely typed data sets: 500 Ashkenazi Jewish (AJ) individuals and 56 Kenyan Maasai (MKK) individuals (HapMap 3 data set). Reconstructing the demographic history of the AJ cohort, we recovered two subsequent population expansions, separated by a severe founder event, consistent with previous analysis of lower-throughput genetic data and historical accounts of AJ history. In the MKK cohort, high levels of cryptic relatedness were detected. The spectrum of IBD sharing is consistent with a demographic model in which several small-sized demes intermix through high migration rates and result in enrichment of shared long-range haplotypes. This scenario of historically structured demographies might explain the unexpected abundance of runs of homozygosity within several populations.
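As a toy version of the inference step, consider the constant-size special case. Here the expected fraction of the genome shared in segments longer than u Morgans is f = (1 + 4Nu)/(1 + 2Nu)^2 for a haploid population size N. This is a standard coalescent result; the procedure described above handles arbitrary demographic histories and length bins. Because f decreases monotonically in x = Nu, solving the quadratic 4f x^2 + 4(f - 1)x + (f - 1) = 0 gives x = ((1 - f) + sqrt(1 - f))/(2f), so an observed sharing fraction yields a point estimate of N. The function names are illustrative, and the sketch ignores detection error and sampling noise.

```python
import math


def expected_fraction(N, u):
    """Expected fraction of the genome shared IBD in segments longer than
    u Morgans, assuming a constant haploid population size N."""
    x = N * u
    return (1 + 4 * x) / (1 + 2 * x) ** 2


def estimate_constant_size(f, u):
    """Invert expected_fraction: the N whose expected sharing equals f.
    Requires 0 < f < 1 and u > 0."""
    if not (0 < f < 1) or u <= 0:
        raise ValueError("need 0 < f < 1 and u > 0")
    # Positive root of 4f x^2 + 4(f - 1) x + (f - 1) = 0, with x = N * u
    x = ((1 - f) + math.sqrt(1 - f)) / (2 * f)
    return x / u


if __name__ == "__main__":
    u = 0.02                         # 2 cM threshold
    f = expected_fraction(5_000, u)  # pretend this fraction was observed
    print(estimate_constant_size(f, u))
```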
Fast and accurate long-range phasing in a UK Biobank cohort
Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here, we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4 cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N≈150,000 samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1–2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈0.3%, corresponding to perfect phase in a majority of 10 Mb segments). We also observed that when used within an imputation pipeline, Eagle pre-phasing improved downstream imputation accuracy compared to pre-phasing in batches using existing methods (as necessary to achieve comparable computational cost).
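Phasing accuracy above is reported as a switch error rate. The following is a minimal sketch of one common definition. It assumes the true phase is known (for example from trios or simulation) and that only heterozygous sites are considered. The function name is hypothetical, and tools differ in details such as how missing or homozygous sites are handled.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Switch error rate between a true and an inferred haplotype.

    Both arguments list the allele (0/1) carried on one chosen haplotype at
    each heterozygous site, in chromosomal order. A switch error is counted
    whenever the inferred phase flips relative to the truth between two
    consecutive heterozygous sites."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # True where the inferred haplotype matches the chosen true haplotype
    agrees = [t == s for t, s in zip(true_hap, inferred_hap)]
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (len(agrees) - 1)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1]
    inferred = [0, 1, 0, 1, 0]  # phase flips once, between sites 2 and 3
    print(switch_error_rate(truth, inferred))
```

A global flip of the whole haplotype is not penalized, because only changes in agreement between neighbouring sites count as errors.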
Linkage disequilibrium dependent architecture of human complex traits reveals action of negative selection
Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with lower LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10^-104); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.
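Extending stratified LD score regression to continuous annotations rests on the model E[χ²_j] = N Σ_c τ_c ℓ(j, c) + N a + 1. Here ℓ(j, c) = Σ_k a_c(k) r²_jk is the LD score of SNP j with respect to annotation c, the annotation value a_c(k) may be continuous, τ_c is the per-annotation contribution to per-SNP heritability, N is the sample size and a captures confounding. The following is a minimal sketch of this forward model on a toy LD matrix. It assumes in-sample r² values are available and uses made-up τ values. The fitting step used in practice (regression weights, block-jackknife standard errors) is omitted, and the function names are hypothetical.

```python
def annotation_ld_scores(r2, annot):
    """l(j, c) = sum_k a_c(k) * r2[j][k]: LD score of SNP j with respect to
    annotation c. `r2` is an M x M matrix of squared correlations
    (r2[j][j] = 1); `annot` is an M x C matrix of annotation values,
    which may be binary or continuous."""
    M, C = len(r2), len(annot[0])
    return [[sum(annot[k][c] * r2[j][k] for k in range(M)) for c in range(C)]
            for j in range(M)]


def expected_chi2(ld, tau, n, a=0.0):
    """Stratified LD score regression model:
    E[chi2_j] = n * sum_c tau_c * l(j, c) + n * a + 1."""
    return [n * sum(t * l for t, l in zip(tau, row)) + n * a + 1 for row in ld]


if __name__ == "__main__":
    r2 = [[1.0, 0.5, 0.0],
          [0.5, 1.0, 0.2],
          [0.0, 0.2, 1.0]]
    # Column 0: base annotation (all ones); column 1: a continuous annotation
    annot = [[1.0, -1.0], [1.0, 0.5], [1.0, 0.5]]
    ld = annotation_ld_scores(r2, annot)
    print(ld)
    print(expected_chi2(ld, tau=[1e-5, -2e-6], n=100_000))
```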
A minimal descriptor of an ancestral recombinations graph
Abstract. Background: An Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, and genetic exchange events, such as recombinations; it thereby captures the (genetic) dynamics of a population evolving over generations. Results: In this paper, we identify a structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving property ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to those of G, and its samples-preserving property ensures that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as those of G. We also prove that even an unbounded G has a finite minimal descriptor that continues to preserve certain (graph-theoretic) properties of G. For an appropriate class of ARGs, both our estimate (Eqn 8) and empirical observation indicate that the expected reduction in the number of vertices is exponential. Conclusions: Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend themselves naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).
Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputation genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05–0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r2, increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r2 improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r2 increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
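The imputation-quality figures above are mean squared Pearson correlations between imputed and directly typed genotypes. The following is a minimal sketch of that metric. It assumes imputed dosages in [0, 2] and 'true' genotypes coded as alternate-allele counts. The function names are hypothetical, and grouping variants into MAF bins (such as the 0.05–0.5% range) would be done before averaging.

```python
def dosage_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages (0..2) and true
    genotypes (0/1/2) for one variant. Returns None if either is constant."""
    n = len(dosages)
    md = sum(dosages) / n
    mg = sum(genotypes) / n
    sxy = sum((d - md) * (g - mg) for d, g in zip(dosages, genotypes))
    sxx = sum((d - md) ** 2 for d in dosages)
    syy = sum((g - mg) ** 2 for g in genotypes)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined for a monomorphic vector
    return sxy * sxy / (sxx * syy)


def mean_r2(variants):
    """Mean r2 over (dosages, genotypes) pairs, skipping undefined values."""
    values = [dosage_r2(d, g) for d, g in variants]
    values = [r for r in values if r is not None]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    variant = ([0.1, 0.9, 1.2, 0.2], [0, 1, 1, 0])
    print(dosage_r2(*variant))
    print(mean_r2([variant, ([0, 1, 2], [0, 1, 2])]))
```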
WGS-based telomere length analysis in Dutch family trios implicates stronger maternal inheritance and a role for RRM1 gene
Telomere length (TL) regulation is an important factor in ageing, reproduction and cancer development. Genetic, hereditary and environmental factors regulating TL are currently widely investigated; however, their relative contribution to TL variability is still understudied. We have used whole-genome sequencing data of 250 family trios from the Genome of the Netherlands project to perform computational measurement of TL, together with a series of regression and genome-wide association analyses, to reveal TL inheritance patterns and associated genetic factors. Our results confirm that TL is a largely heritable trait, with the mother's TL and, to a lesser extent, the father's TL having the strongest influence on the offspring's TL. In this cohort, the mother's, but not the father's, age at conception was positively linked to offspring TL. Age-related TL attrition of 40 bp/year had a relatively small influence on TL variability. Finally, we have identified TL-associated variants in ribonucleotide reductase catalytic subunit M1 (RRM1 gene), which is known to regulate telomere maintenance in yeast. We also highlight the importance of a multivariate approach and the limitations of existing tools for the analysis of TL as a polygenic heritable quantitative trait.
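The inheritance analysis above relies on regressing offspring TL on parental TL. The following is a minimal sketch of such a regression, with an intercept and two predictors, solved in plain Python through the normal equations. The toy values are synthetic and not taken from the Genome of the Netherlands data. The function name is hypothetical, and the study's full set of regression and association models is not reproduced here.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    X is a list of rows (include a column of 1s for an intercept)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


if __name__ == "__main__":
    # Synthetic TL values (kb); offspring TL built as 1.0 + 0.5*mother + 0.3*father
    mother = [6.0, 7.0, 5.5, 6.5, 7.5]
    father = [6.2, 5.8, 7.0, 6.0, 6.6]
    X = [[1.0, m, f] for m, f in zip(mother, father)]
    y = [1.0 + 0.5 * m + 0.3 * f for m, f in zip(mother, father)]
    print(ols(X, y))  # intercept, maternal and paternal coefficients
```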
A framework for the detection of de novo mutations in family-based sequencing data
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low-coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between the father's age at conception and the number of DNMs on the X chromosome of female offspring, consistent with previous literature reports.
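The core computation described above combines genotype likelihoods, Mendelian transmission and a mutation-rate prior into a joint probability over the trio. It can be sketched for a single biallelic site. This is a simplified illustration, not PBT's implementation. It assumes Hardy-Weinberg genotype priors for the parents, independent genotype likelihoods for each family member, and that each transmitted allele flips with probability mu. The function names and parameters are hypothetical.

```python
from itertools import product


def hwe_prior(p):
    """Genotype prior for 0, 1, 2 alternate alleles under Hardy-Weinberg."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def dnm_posterior(lik_mother, lik_father, lik_child, alt_freq, mu):
    """Posterior probability that the child carries a de novo mutation.

    lik_*: genotype likelihoods P(reads | g) for g = 0, 1, 2 alt alleles.
    alt_freq: population alternate-allele frequency (parental prior).
    mu: probability that a transmitted allele mutates (flips)."""
    prior = hwe_prior(alt_freq)
    total = 0.0
    de_novo = 0.0
    for gm, gf in product(range(3), repeat=2):
        w_parents = prior[gm] * prior[gf] * lik_mother[gm] * lik_father[gf]
        for am, af in product((0, 1), repeat=2):
            # Mendelian transmission: a parent with g alt alleles passes alt w.p. g/2
            p_trans = (gm / 2 if am else 1 - gm / 2) * (gf / 2 if af else 1 - gf / 2)
            for bm, bf in product((0, 1), repeat=2):
                # Each transmitted allele flips with probability mu
                p_mut = (mu if bm != am else 1 - mu) * (mu if bf != af else 1 - mu)
                w = w_parents * p_trans * p_mut * lik_child[bm + bf]
                total += w
                if bm != am or bf != af:
                    de_novo += w
    return de_novo / total


if __name__ == "__main__":
    hom_ref = [1.0, 1e-8, 1e-16]  # reads strongly support 0/0
    het = [1e-12, 1.0, 1e-12]     # reads strongly support 0/1
    print(dnm_posterior(hom_ref, hom_ref, het, alt_freq=0.001, mu=1e-8))
```

With both parents confidently homozygous reference and a confidently heterozygous child, the posterior is close to 1; the competing explanation, an undetected heterozygous parent, is penalized by that parent's low likelihood.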
Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. Acknowledgements: We especially thank all volunteers who participated in our study. This study made use of data generated by the 'Genome of the Netherlands' project, which is funded by the Netherlands Organization for Scientific Research (grant no. 184021007). The data were made available as a Rainbow Project of BBMRI-NL. Samples were contributed by LifeLines (http://lifelines.nl/lifelines-research/general), the Leiden Longevity Study (http://www.healthy-ageing.nl; http://www.langleven.net), the Netherlands Twin Registry (NTR: http://www.tweelingenregister.org), the Rotterdam studies (http://www.erasmus-epidemiology.nl/rotterdamstudy) and the Genetic Research in Isolated Populations programme (http://www.epib.nl/research/geneticepi/research.html#gip). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). Cardiovascular Health Study: This CHS research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, HHSN268200960009C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants HL080295, HL087652, HL105756 and HL103612 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through AG023629 from the National Institute on Aging (NIA). A full list of CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm.
The CROATIA cohorts would like to acknowledge the invaluable contributions of the recruitment teams in Vis, Korcula and Split (including those from the Institute of Anthropological Research in Zagreb and the Croatian Centre for Global Health at the University of Split), the administrative teams in Croatia and Edinburgh and the people of Vis, Korcula and Split. SNP genotyping was performed at the Wellcome Trust Clinical Research Facility in Edinburgh for CROATIA-Vis, by Helmholtz Zentrum München, GmbH, Neuherberg, Germany for CROATIA-Korcula and by AROS Applied Biotechnology, Aarhus, Denmark for CROATIA-Split. They would also like to thank Jared O'Connell for performing the pre-phasing for all cohorts before imputation. The ERF study as a part of EuroSPAN (European Special Populations Research Network) was supported by European Commission FP-6 STRP grant number 018947 (LSHG-CT-2006-01947) and also received funding from the European Community's Seventh Framework Programme (FP7/2007-2013)/grant agreement HEALTH-F4-2007-201413 by the European Commission under the programme 'Quality of Life and Management of the Living Resources' of 5th Framework Programme (no. QLG2-CT-2002-01254). High-throughput analysis of the ERF data was supported by a joint grant from the Netherlands Organisation for Scientific Research and the Russian Foundation for Basic Research (NWO-RFBR 047.017.043). This research was financially supported by BBMRI-NL, a Research Infrastructure financed by the Dutch government (NWO 184.021.007). Statistical analyses for the ERF study were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org), which is financially supported by the Netherlands Scientific Organization (NWO 480-05-003 PI: Posthuma) along with a supplement from the Dutch Brain Foundation and the VU University Amsterdam. We are grateful to all study participants and their relatives, general practitioners and neurologists for their contributions and to P. Veraart for her help in genealogy, J. Vergeer for the supervision of the laboratory work and P. Snijders for his help in data collection. The FamHS is funded by an NHLBI grant 5R01HL08770003, and NIDDK grants 5R01DK06833603 and 5R01DK07568102. The Framingham Heart Study SHARe Project for GWAS scan was supported by the NHLBI Framingham Heart Study (Contract No. N01-HC-25195) and its contract with Affymetrix Inc for genotyping services (Contract No. N02-HL-6-4278). DNA isolation and biochemistry were partly supported by NHLBI HL-54776. A portion of this research utilized the Linux Cluster for Genetic Analysis (LinGA-II) funded by the Robert Dawson Evans Endowment of the Department of Medicine at the Boston University School of Medicine and Boston Medical Center. We are grateful to Han Chen for conducting the 1000G imputation. The Family Heart Study was supported by grants R01-HL-087700 and R01-HL-088215 from the National Heart, Lung, and Blood Institute (NHLBI). We would like to acknowledge the invaluable contributions of the families who took part in the Generation Scotland: Scottish Family Health Study, the general practitioners and Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team, which includes academic researchers, IT staff, laboratory technicians, statisticians and research managers. SNP genotyping was performed at the Wellcome Trust Clinical Research Facility in Edinburgh. GS:SFHS is funded by the Scottish Executive Health Department, Chief Scientist Office, grant number CZD/16/6. SNP genotyping was funded by the Medical Research Council, United Kingdom. We wish to acknowledge the services of the LifeLines Cohort Study, the contributing research centres delivering data to LifeLines and all the study participants. MESA Whites and the MESA SHARe project are conducted and supported by contracts N01-HC-95159 through N01-HC-95169 and RR-024156 from the NHLBI.
Funding for MESA SHARe genotyping was provided by NHLBI Contract N02.HL.6.4278. MESA Family is conducted and supported in collaboration with MESA investigators; support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258 and R01HL071259. We thank the participants of the MESA study, the Coordinating Center, MESA investigators and study staff for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org. Netherlands Twin Register (NTR) and Netherlands Study of Depression and Anxiety (NESDA): Funding was obtained from the Netherlands Organization for Scientific Research (NWO) and MagW/ZonMW grants Middelgroot-911-09-032, Spinozapremie 56-464-14192, Geestkracht programme of the Netherlands Organization for Health Research and Development (Zon-MW, grant number 10-000-1002), Center for Medical Systems Biology (CSMB, NWO Genomics), NBIC/BioAssist/RK(2008.024), Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL, 184.021.007), VU University's Institute for Health and Care Research (EMGO+) and Neuroscience Campus Amsterdam (NCA); the European Science Foundation (ESF, EU/QLRT-2001-01254), the European Community's Seventh Framework Program (FP7/2007-2013), ENGAGE (HEALTH-F4-2007-201413); the European Science Council (ERC Advanced, 230374); and the European Research Council (ERC-284167). Part of the genotyping and analyses were funded by the Genetic Association Information Network (GAIN) of the Foundation for the National Institutes of Health, Rutgers University Cell and DNA Repository (NIMH U24 MH068457-06), the Avera Institute, Sioux Falls, South Dakota (USA) and the National Institutes of Health (NIH R01 HD042157-01A1, MH081802, Grand Opportunity grants 1RC2 MH089951 and 1RC2 MH089995).
PREVEND genetics is supported by the Dutch Kidney Foundation (Grant E033), the EU project grant GENECURE (FP-6 LSHM CT 2006 037697), the National Institutes of Health (grant 2R01LM010098), The Netherlands Organisation for Health Research and Development (NWO-Groot grant 175.010.2007.006, NWO VENI grant 916.761.70, ZonMw grant 90.700.441) and the Dutch Inter University Cardiology Institute Netherlands (ICIN). The PROSPER study was supported by an investigator-initiated grant obtained from Bristol-Myers Squibb. J.W.J is an Established Clinical Investigator of the Netherlands Heart Foundation (grant 2001 D 032). Genotyping was supported by the seventh framework programme of the European commission (grant 223004) and by the Netherlands Genomics Initiative (Netherlands Consortium for Healthy Aging grant 050-060-810). The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII) and the Municipality of Rotterdam. We are grateful to the study participants, the staff from the Rotterdam Study and the participating general practitioners and pharmacists. The generation and management of GWAS genotype data for the Rotterdam Study is supported by the Netherlands Organisation of Scientific Research NWO Investments (nr. 175.010.2005.011, 911-03-012). This study is funded by the Research Institute for Diseases in the Elderly (014-93-015; RIDE2), the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) project no. 050-060-810. We thank Pascal Arp, Mila Jhamai, Marijn Verkerk, Lizbeth Herrera and Marjolein Peters for their help in creating the GWAS database.