25 research outputs found

Author Correction: Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes.

Author: Alföldi J
Cummings BB
Francioli LC
Gauthier LD
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Hill AJ
Karczewski KJ
MacArthur DG
O'Donnell-Luria AH
Pierce-Hoffman E
Wang Q
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2021

Spiral - Imperial College Digital Repository

Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes.

Author: Alföldi J
Cummings BB
Francioli LC
Gauthier LD
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Hill AJ
Karczewski KJ
MacArthur DG
O'Donnell-Luria AH
Pierce-Hoffman E
Wang Q
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/09/2019

Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.

Spiral - Imperial College Digital Repository

The effect of LRRK2 loss-of-function variants in humans

Author: 23andMe Research Team
Alföldi Jessica
Alipanahi Babak
Armean Irina M.
Banks Eric
Baptista Marco A.S.
Bergelson Louis
Cibulskis Kristian
Cole Joanne B.
Collins Ryan L.
Connolly Kristen M.
Covarrubias Miguel
Cummings Beryl
Daly Mark J.
Donnelly Stacey
Farjoun Yossi
Ferriera Steven
Francioli Laurent
Gabriel Stacey
Gauthier Laura D.
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Gentry Jeff
Goodrich Julia K.
Guan Anna
Gupta Namrata
Jeandet Thibault
Kaplan Diane
Karczewski Konrad J.
Kleinman Aaron
Laricchia Kristen M.
Lehtimäki Terho
Llanwarne Christopher
Marshall Jamie L.
Mattila Kari M.
Merchant Kalpana M.
Minikel Eric V.
Morrison Peter
Munshi Ruchi
Neale Benjamin M.
Novod Sam
O’Donnell-Luria Anne H.
Petrillo Nikelle
Quaife Nicholas M.
Suvisaari Jaana
Wang Qingbo
Whiffin Nicola
Publication date: 01/01/2020

Analysis of large genomic datasets, including gnomAD, reveals that partial LRRK2 loss of function is not strongly associated with diseases, serving as an example of how human genetics can be leveraged for target validation in drug discovery. Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes(1,2). Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease(3,4), suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns(5-8), the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)(9), 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work(10), confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery. Peer reviewed

Lund University Publications

Julkari

Spiral - Imperial College Digital Repository

Helsingin yliopiston digitaalinen arkisto

University of Dundee Online Publications

Trepo - Institutional Repository of Tampere University

A structural variation reference for medical and population genetics

Author: Collins Ryan
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Talkowski Michael E
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/05/2020

Structural variants (SVs) rearrange large segments of DNA(1) and can have profound consequences in evolution and human disease(2,3). As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)(4) have become integral in the interpretation of single-nucleotide variants (SNVs)(5). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage(6). We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings(7). This SV resource is freely distributed via the gnomAD browser(8) and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

Lund University Publications

Transcript expression-aware annotation improves rare variant interpretation

Author: Cummings Beryl B.
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
MacArthur Daniel G
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/05/2020

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’, which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

Lund University Publications

The mutational constraint spectrum quantified from variation in 141,456 humans

Author: Genome Aggregation Database Consortium
Karczewski Konrad J.
MacArthur Daniel G
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/05/2020

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes(1). Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

Lund University Publications

Transcript expression-aware annotation improves rare variant interpretation.

Author: Alföldi J
Cummings BB
Daly MJ
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Karczewski KJ
Karjalainen J
Kosmicki JA
MacArthur DG
Mudge JM
O'Donnell-Luria AH
Poterba T
Satterstrom FK
Seaby EG
Seed C
Singer-Berk M
Solomonson M
Watts NA
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/04/2020

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

Spiral - Imperial College Digital Repository

Transcript expression-aware annotation improves rare variant interpretation

Author: Cummings Beryl B.
Genome Aggregation Database Consortium
Karczewski Konrad J.
Kosmicki Jack A.
MacArthur Daniel G.
Seaby Eleanor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/05/2020

Southampton (e-Prints Soton)

Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes

Author: Alföldi Jessica
Cummings Beryl B.
Francioli Laurent C.
Gauthier Laura D.
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Hill Andrew J.
Karczewski Konrad J.
Lehtimäki Terho
MacArthur Daniel G.
Mattila Kari M.
O’Donnell-Luria Anne H.
Pierce-Hoffman Emma
Wang Qingbo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Trepo - Institutional Repository of Tampere University

Characterising the loss-of-function impact of 5' untranslated region variants in 15,708 individuals

Author: Alföldi J
Barton PJR
Chothani S
Cook SA
Evans DG
Francioli LC
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Karczewski KJ
MacArthur DG
O'Donnell-Luria AH
Quaife NM
Rackham O
Roberts AM
Schafer S
Smith MJ
Ware JS
Whiffin N
Zhang X
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/05/2019

Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes.

Spiral - Imperial College Digital Repository