58 research outputs found

    A structural variation reference for medical and population genetics

    Structural variants (SVs) rearrange large segments of DNA(1) and can have profound consequences in evolution and human disease(2,3). As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)(4) have become integral in the interpretation of single-nucleotide variants (SNVs)(5). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage(6). We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings(7). This SV resource is freely distributed via the gnomAD browser(8) and will have broad utility in population genetics, disease-association studies, and diagnostic screening. Peer reviewed.

    The mutational constraint spectrum quantified from variation in 141,456 humans

    Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes(1). Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. Peer reviewed.

    Evaluating drug targets through human loss-of-function genetic variation

    Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development. Peer reviewed.

    Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes

    Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms (CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions) on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs. Multi-nucleotide variants (MNVs) are genetic variants in close proximity of each other on the same haplotype whose functional impact is difficult to predict if they reside in the same codon. Here, Wang et al. use the gnomAD dataset to assemble a catalogue of MNVs and estimate their global mutation rate. Peer reviewed.

    Transcript expression-aware annotation improves rare variant interpretation

    The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies. Peer reviewed.

    Characterising the loss-of-function impact of 5' untranslated region variants in 15,708 individuals

    Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes. Upstream open reading frames (uORFs), located in 5' untranslated regions, are regulators of downstream protein translation. Here, Whiffin et al. use the genomes of 15,708 individuals in the Genome Aggregation Database (gnomAD) to systematically assess the deleteriousness of variants creating or disrupting uORFs. Peer reviewed.

    Cross-trait analyses with migraine reveal widespread pleiotropy and suggest a vascular component to migraine headache

    Background: Nearly a fifth of the world's population suffer from migraine headache, yet risk factors for this disease are poorly characterized. Methods: To further elucidate these factors, we conducted a genetic correlation analysis using cross-trait linkage disequilibrium (LD) score regression between migraine headache and 47 traits from the UK Biobank. We then tested for possible causality between these phenotypes and migraine, using Mendelian randomization. In addition, we attempted replication of our findings in an independent genome-wide association study (GWAS) when available. Results: We report multiple phenotypes with genetic correlation (P < 1.06 × 10⁻³) with migraine, including heart disease, type 2 diabetes, lipid levels, blood pressure, autoimmune and psychiatric phenotypes. In particular, we find evidence that blood pressure directly contributes to migraine and explains a previously suggested causal relationship between calcium and migraine. Conclusions: This is the largest genetic correlation analysis of migraine headache to date, both in terms of migraine GWAS sample size and the number of phenotypes tested. We find that migraine has a shared genetic basis with a large number of traits, indicating pervasive pleiotropy at migraine-associated loci. Peer reviewed.
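The abstract does not say how its significance threshold was derived. One plausible reading, assumed here and not confirmed by the source, is a Bonferroni correction of a conventional alpha of 0.05 across the 47 traits tested. A minimal sketch of that arithmetic follows; the function name and the alpha value are illustrative assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: alpha = 0.05 (not stated in the abstract); 47 traits is from the abstract.
threshold = bonferroni_threshold(0.05, 47)
print(f"{threshold:.3g}")
```

Under these assumptions the result rounds to the threshold quoted in the abstract, which is consistent with, but does not prove, a Bonferroni correction.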

    Common Variant Burden Contributes to the Familial Aggregation of Migraine in 1,589 Families

    Complex traits, including migraine, often aggregate in families, but the underlying genetic architecture behind this is not well understood. The aggregation could be explained by rare, penetrant variants that segregate according to Mendelian inheritance or by the sufficient polygenic accumulation of common variants, each with an individually small effect, or a combination of the two hypotheses. In 8,319 individuals across 1,589 migraine families, we calculated migraine polygenic risk scores (PRS) and found a significantly higher common variant burden in familial cases (n = 5,317, OR = 1.76, 95% CI = 1.71-1.81, p = 1.7 × 10⁻¹⁰⁹) compared to population cases from the FINRISK cohort (n = 1,101, OR = 1.32, 95% CI = 1.25-1.38, p = 7.2 × 10⁻¹⁷). The PRS explained 1.6% of the phenotypic variance in the population cases and 3.5% in the familial cases (including 2.9% for migraine without aura, 5.5% for migraine with typical aura, and 8.2% for hemiplegic migraine). The results demonstrate a significant contribution of common polygenic variation to the familial aggregation of migraine.

    HMG-CoA reductase is a potential therapeutic target for migraine: a Mendelian randomization study

    Statins are thought to have positive effects on migraine but existing data are inconclusive. We aimed to evaluate the causal effect of such drugs on migraine using Mendelian randomization. We used four types of genetic instruments as proxies for HMG-CoA reductase inhibition. We included the expression quantitative trait loci of the HMG-CoA reductase gene and genetic variation within or near the HMG-CoA reductase gene region. Variants were associated with low-density lipoprotein cholesterol, apolipoprotein B, and total cholesterol. Genome-wide association study summary data for the three lipids were obtained from the UK Biobank. Comparable data for migraine were obtained from the International Headache Genetic Consortium and the FinnGen Consortium. The inverse variance weighting method was used for the primary analysis. Additional analyses included pleiotropy-robust methods, colocalization, and meta-analysis. Genetically determined high expression of HMG-CoA reductase was associated with an increased risk of migraine (OR = 1.55, 95% CI 1.30–1.84, P = 6.87 × 10⁻⁷). Similarly, three genetically determined HMG-CoA reductase-mediated lipids were associated with an increased risk of migraine. These conclusions were consistent across meta-analyses. We found no evidence of bias caused by pleiotropy or genetic confounding factors. These findings support the hypothesis that statins can be used to treat migraine.
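The abstract names inverse variance weighting as its primary method but gives no formula or data. As a rough illustration only, the sketch below implements the textbook fixed-effect inverse-variance-weighted estimator, which combines per-variant ratio estimates with weights equal to the squared exposure effect divided by the squared outcome standard error. The function name and all input values are hypothetical and are not taken from the study, whose exact model may differ.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted Mendelian randomization estimate.

    Each variant contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    Returns (causal effect estimate, its standard error).
    """
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


# Hypothetical summary statistics for three variants (illustrative only).
bx = [0.10, 0.20, 0.15]    # variant effects on the exposure
by = [0.05, 0.10, 0.075]   # variant effects on the outcome (log odds)
se_y = [0.02, 0.02, 0.02]  # standard errors of the outcome effects

beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
odds_ratio = math.exp(beta_ivw)  # log-odds estimate converted to an odds ratio
print(beta_ivw, se_ivw, odds_ratio)
```

Exponentiating the estimate gives an odds ratio on the same scale as the one the abstract reports, although the numbers here are invented for the example.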