Autism Spectrum Disorder Genetics: Diverse Genes with Diverse Clinical Outcomes
The last several years have seen unprecedented advances in deciphering the genetic etiology of autism spectrum disorders (ASDs). Heritability studies have repeatedly affirmed a contribution of genetic factors to the overall disease risk. Technical breakthroughs have enabled the search for these genetic factors via genome-wide surveys of a spectrum of potential sequence variations, from common single-nucleotide polymorphisms to essentially private chromosomal abnormalities. Studies of copy-number variation have identified significant roles for both recurrent and nonrecurrent large dosage imbalances, although they have rarely revealed the individual genes responsible. More recently, discoveries of rare point mutations and characterization of balanced chromosomal abnormalities have pinpointed individual ASD genes of relatively strong effect, including both loci with strong a priori biological relevance and those that would otherwise not have been suspected as high-priority biological targets. Evidence has also emerged for association with many common variants, each adding a small individual contribution to ASD risk. These findings collectively provide compelling empirical data that the genetic basis of ASD is highly heterogeneous, with hundreds of genes capable of conferring varying degrees of risk, depending on their nature and the predisposing genetic alteration. Moreover, many genes that have been implicated in ASD also appear to be risk factors for related neurodevelopmental disorders, as well as for a spectrum of psychiatric phenotypes. While some ASD genes have evident functional significance, such as synaptic proteins (the SHANKs, neuroligins, and neurexins) and fragile X mental retardation–associated proteins, other ASD genes have been discovered that do not present a clear mechanism of specific neurodevelopmental dysfunction, such as regulators of chromatin modification and global gene expression.
In sum, the progress from genetic studies to date has been remarkable and increasingly rapid, but the interactive impact of strong-effect genetic lesions coupled with weak-effect common polymorphisms has not yet led to a unified understanding of ASD pathogenesis or explained its highly variable clinical expression. With an increasingly firm genetic foundation, the coming years will hopefully see equally rapid advances in elucidating the functional consequences of ASD genes and their interactions with environmental/experiential factors, supporting the development of rational interventions.
Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish (AJ) exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the AJ population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. These enriched alleles, which arose via a well-described founder effect approximately 30 generations ago, can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation, we document 148 AJ-enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10- to 100-fold differences in prevalence between AJ and non-AJ populations for some rare diseases, especially recessive conditions such as Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two- to four-fold excess prevalence in AJ.
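The enrichment figures above compare an allele's frequency in the AJ cohort with its highest frequency in any other reference population. The minimal Python sketch below computes that ratio; the function names are hypothetical. The second function adds simplifying assumptions not stated in the abstract (Hardy-Weinberg equilibrium, a single causal allele, no compound heterozygotes) to illustrate why a modest allele enrichment can translate into a much larger prevalence difference for a recessive condition.

```python
from typing import Sequence


def fold_enrichment(af_target: float, af_others: Sequence[float]) -> float:
    """Ratio of an allele's frequency in the target population to its
    maximum frequency across the other reference populations.
    (Hypothetical helper, for illustration only.)"""
    max_other = max(af_others)
    if max_other == 0:
        return float("inf")  # allele not observed in any other population
    return af_target / max_other


def recessive_prevalence_ratio(enrichment: float) -> float:
    """Assumption: Hardy-Weinberg equilibrium, so affected homozygotes occur
    at frequency q**2. An allele-frequency ratio r then implies a
    prevalence ratio of about r**2 (ignoring compound heterozygotes)."""
    return enrichment ** 2


ratio = fold_enrichment(0.03, [0.002, 0.003, 0.001])
print(round(ratio, 6))
print(recessive_prevalence_ratio(8))
```

Under these assumptions, an 8-fold allele enrichment corresponds to roughly a 64-fold difference in homozygote frequency.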
We specifically evaluate whether strong-acting rare alleles, particularly protein-truncating or otherwise large-effect alleles enriched by the same founder effect, contribute excess genetic risk to Crohn's disease (CD) in AJ. We find that ten rare genetic risk factors in NOD2 and LRRK2 that are enriched in AJ (p < 0.005), including several novel contributing alleles, show evidence of association to CD. Independently, we find that genome-wide common-variant risk defined by GWAS differs strongly between AJ and non-AJ European control samples (0.97 s.d. higher in AJ, p < 10^-16). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies, along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/), to pinpoint genetic variation that contributes to variable disease predisposition across populations.
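The 0.97 s.d. figure is a shift in mean common-variant (polygenic) risk expressed in standard-deviation units. The sketch below shows one common way to compute such a score and shift. Scaling by the reference group's standard deviation is an assumption (the study may have standardized differently), and the function names and toy numbers are hypothetical.

```python
from statistics import mean, stdev


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2), using per-variant
    weights such as GWAS log odds ratios."""
    return sum(d * w for d, w in zip(dosages, weights))


def sd_shift(group_scores, reference_scores):
    """Difference in mean score, in units of the reference group's
    sample standard deviation (an illustrative choice of scaling)."""
    return (mean(group_scores) - mean(reference_scores)) / stdev(reference_scores)


# Toy example: three individuals per group, scores already computed.
print(polygenic_score([0, 1, 2], [0.1, 0.2, 0.3]))
print(sd_shift([2.0, 3.0, 4.0], [0.0, 1.0, 2.0]))
```

Expressing the difference in s.d. units makes it comparable across scores built from different numbers of variants or different weight scales.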
Correction: Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population
The data in the S2 Data File does not display correctly. Please view the correct S2 Data File below.
Analysis of protein-coding genetic variation in 60,706 humans
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
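Frequency-based filtering, as described above, discards candidate variants that are too common in a reference panel to plausibly cause a rare, highly penetrant disorder. A minimal sketch, assuming each variant record carries its maximum allele frequency across reference populations; the `Variant` class, the `filter_candidates` function, and the 0.0001 threshold are illustrative choices, not values taken from the ExAC paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Variant:
    variant_id: str    # hypothetical identifier
    max_pop_af: float  # highest allele frequency across reference populations


def filter_candidates(variants: Iterable[Variant],
                      max_af_threshold: float = 1e-4) -> List[Variant]:
    """Keep only variants at or below the (illustrative) frequency threshold."""
    return [v for v in variants if v.max_pop_af <= max_af_threshold]


candidates = [
    Variant("var1", 0.0),      # absent from the reference panel
    Variant("var2", 0.00005),  # rare enough to keep
    Variant("var3", 0.01),     # too common for a rare dominant disorder
]
print([v.variant_id for v in filter_candidates(candidates)])
```

In practice the appropriate threshold depends on the disease's prevalence, inheritance mode, and penetrance, so it should be set per analysis rather than fixed.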
Analysis of protein-coding genetic variation in 60,706 humans
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. We describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
Correction: Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population
Correction of 2018 article: PLoS Genet 14(5): e1007329. https://doi.org/10.1371/journal.pgen.1007329 PMID: 2979557
The mutational constraint spectrum quantified from variation in 141,456 humans
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes¹. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
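Tolerance to inactivation is typically summarized by comparing the observed pLoF count in a gene with the count expected from a mutation rate model. The sketch below computes an upper confidence bound on that observed/expected (o/e) ratio by evaluating a Poisson likelihood over a grid of candidate ratios, similar in spirit to the LOEUF metric reported by gnomAD. The function names, grid range, and step size are assumptions, and the expected count is taken as given rather than derived from a mutation model.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k events at rate lam (computed in log space)."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def oe_upper_bound(observed: int, expected: float, quantile: float = 0.95,
                   step: float = 0.001, max_ratio: float = 2.0) -> float:
    """Smallest grid ratio at which the normalized cumulative likelihood of
    the observed count reaches `quantile` (0.95 gives the upper end of a
    90% interval). Grid bounds and step are illustrative assumptions."""
    n = int(round(max_ratio / step))
    grid = [i * step for i in range(n + 1)]
    likelihoods = [poisson_pmf(observed, expected * r) for r in grid]
    total = sum(likelihoods)
    cumulative = 0.0
    for r, lik in zip(grid, likelihoods):
        cumulative += lik / total
        if cumulative >= quantile:
            return r
    return max_ratio


# A gene with no observed pLoF variants where 10 were expected looks
# intolerant (low upper bound); one with observed == expected does not.
print(oe_upper_bound(0, 10.0))
print(oe_upper_bound(20, 20.0))
```

Using the upper bound rather than the point estimate keeps genes with small expected counts from being labeled intolerant on the basis of sparse data.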