40 research outputs found

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.

    Patterns and rates of exonic de novo mutations in autism spectrum disorders

    Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified [1,2]. To identify further genetic risk factors, we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant and the overall rate of mutation is only modestly higher than the expected rate. In contrast, there is significantly enriched connectivity among the proteins encoded by genes harboring de novo missense or nonsense mutations, and excess connectivity to prior ASD genes of major effect, suggesting a subset of observed events are relevant to ASD risk. The small increase in rate of de novo events, when taken together with the connections among the proteins themselves and to ASD, is consistent with an important but limited role for de novo point mutations, similar to that documented for de novo copy number variants. Genetic models incorporating these data suggest that the majority of observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (i.e., not necessarily causal). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5 to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favor of CHD8 and KATNAL2 as genuine autism risk factors.

    Analysis of Rare, Exonic Variation amongst Subjects with Autism Spectrum Disorders and Population Controls

    We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD. © 2013 Liu et al

    Analysis of protein-coding genetic variation in 60,706 humans

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. We describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of truncating variants with 72% having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes

    The genetic architecture of type 2 diabetes

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes

    Analysis of protein-coding genetic variation in 60,706 humans

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

    A global reference for human genetic variation

    The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

    Funding: Wellcome Trust (London, England) (Core Award 090532/Z/09/Z); Wellcome Trust (London, England) (Senior Investigator Award 095552/Z/11/Z); Wellcome Trust (London, England) (WT095908); Wellcome Trust (London, England) (WT109497); Wellcome Trust (London, England) (WT098051); Wellcome Trust (London, England) (WT086084/Z/08/Z); Wellcome Trust (London, England) (WT100956/Z/13/Z); Wellcome Trust (London, England) (WT097307); Wellcome Trust (London, England) (WT0855322/Z/08/Z); Wellcome Trust (London, England) (WT090770/Z/09/Z); Wellcome Trust (London, England) (Major Overseas program in Vietnam grant 089276/Z.09/Z); Medical Research Council (Great Britain) (grant G0801823); Biotechnology and Biological Sciences Research Council (Great Britain) (grant BB/I02593X/1); Biotechnology and Biological Sciences Research Council (Great Britain) (grant BB/I021213/1); Zhongguo ke xue ji shu qing bao yan jiu suo. Office of 863 Programme of China (2012AA02A201); National Basic Research Program of China (2011CB809201); National Basic Research Program of China (2011CB809202); National Basic Research Program of China (2011CB809203); National Natural Science Foundation of China (31161130357); Shenzhen Municipal Government of China (grant ZYC201105170397A); Canadian Institutes of Health Research (grant 136855); Quebec Ministry of Economic Development, Innovation, and Exports (PSR-SIIRI-195); Germany. Bundesministerium für Bildung und Forschung (0315428A); Germany. Bundesministerium für Bildung und Forschung (01GS08201); Germany. Bundesministerium für Bildung und Forschung (BMBF-EPITREAT grant 0316190A); Deutsche Forschungsgemeinschaft (Emmy Noether Grant KO4037/1-1); Beatriu de Pinos Program (2006 BP-A 10144); Beatriu de Pinos Program (2009 BP-B 00274); Spanish National Institute for Health (grant PRB2 IPT13/0001-ISCIII-SGEFI/FEDER); Japan Society for the Promotion of Science (fellowship number PE13075); Marie Curie Actions Career Integration (grant 303772); Fonds National Suisse del la Recherche, SNSF, Scientifique (31003A_130342); National Center for Biotechnology Information (U.S.) (U54HG3067); National Center for Biotechnology Information (U.S.) (U54HG3273); National Center for Biotechnology Information (U.S.) (U01HG5211); National Center for Biotechnology Information (U.S.) (U54HG3079); National Center for Biotechnology Information (U.S.) (R01HG2898); National Center for Biotechnology Information (U.S.) (R01HG2385); National Center for Biotechnology Information (U.S.) (RC2HG5552); National Center for Biotechnology Information (U.S.) (U01HG6513); National Center for Biotechnology Information (U.S.) (U01HG5214); National Center for Biotechnology Information (U.S.) (U01HG5715); National Center for Biotechnology Information (U.S.) (U01HG5718); National Center for Biotechnology Information (U.S.) (U01HG5728); National Center for Biotechnology Information (U.S.) (U41HG7635); National Center for Biotechnology Information (U.S.) (U41HG7497); National Center for Biotechnology Information (U.S.) (R01HG4960); National Center for Biotechnology Information (U.S.) (R01HG5701); National Center for Biotechnology Information (U.S.) (R01HG5214); National Center for Biotechnology Information (U.S.) (R01HG6855); National Center for Biotechnology Information (U.S.) (R01HG7068); National Center for Biotechnology Information (U.S.) (R01HG7644); National Center for Biotechnology Information (U.S.) (DP2OD6514); National Center for Biotechnology Information (U.S.) (DP5OD9154); National Center for Biotechnology Information (U.S.) (R01CA166661); National Center for Biotechnology Information (U.S.) (R01CA172652); National Center for Biotechnology Information (U.S.) (P01GM99568); National Center for Biotechnology Information (U.S.) (R01GM59290); National Center for Biotechnology Information (U.S.) (R01GM104390); National Center for Biotechnology Information (U.S.) (T32GM7790); National Center for Biotechnology Information (U.S.) (P01GM99568); National Center for Biotechnology Information (U.S.) (R01HL87699); National Center for Biotechnology Information (U.S.) (R01HL104608); National Center for Biotechnology Information (U.S.) (T32HL94284); National Center for Biotechnology Information (U.S.) (HHSN268201100040C); National Center for Biotechnology Information (U.S.) (HHSN272201000025C); Lundbeck Foundation (grant R170-2014-1039); Simons Foundation (SFARI award SF51); National Science Foundation (U.S.) (Research Fellowship DGE-1147470

    Identification and Functional Characterization of G6PC2 Coding Variants Influencing Glycemic Traits Define an Effector Transcript at the G6PC2-ABCB11 Locus
