109 research outputs found
Recommended from our members
Modeling Rare Protein-Coding Variation to Identify Mutation-Intolerant Genes With Application to Disease
Sequencing exomesâthe 1% of the genome that codes for proteinsâhas increased the rate at which the genetic basis of a patientâs disease is determined. Unfortunately, when a patient does not carry a well-established pathogenic variant, it is extremely challenging to establish which of the tens of thousands of variants identified in that individual is contributing to their disease. In these situations, variants must be prioritized to make further investigation more manageable. In this thesis, we have focused on creating statistical frameworks and models to aid in the interpretation of rare variants and towards establishing gene-level metrics for variant prioritization.
We developed a sensitive and specific workflow to detect newly arising (de novo) variants from exome sequencing data of parent-child trios, and created a sequence-context based mutational. This mutational model was the basis of a rigorous statistical framework to evaluate the significance of de novo variant burden not only globally, but also per gene. When we applied this framework to de novo variants identified in patients with an autism spectrum disorder, we found a global excess of de novo loss-of-function variants as well as two genes that harbored significantly more de novo loss-of-function variants than expected.
We also used the mutational model to predict the expected number of rare (minor allele frequency < 0.1%) variants in exome sequencing datasets of reference individuals. We found a significant depletion of missense and loss-of-function variants in a subset of genes, indicating that these genes are under strong evolutionary constraint. Specifically, we identified 3,230 genes that are intolerant of loss-of-function variation and that set of genes is enriched for established dominant and haploinsufficient disease genes. Similarly, we searched for regions within genes that were intolerant of missense variation. The most missense depleted 15% of the exome contains 83% of reported pathogenic variants found in haploinsufficient disease genes that cause severe disease. Additionally, both gene-level and region-level constraint metrics highlight a set of de novo variants from patients with a neurodevelopmental disorder that are more likely to be pathogenic, supporting the utility of these metrics when interpreting rare variants within the context of disease.Medical Science
Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery
: Background: One of the major hurdles in clinical genetics is interpreting the clinical consequences associated with germline missense variants in humans. Recent significant advances have leveraged natural variation observed in large-scale human populations to uncover genes or genomic regions that show a depletion of natural variation, indicative of selection pressure. We refer to this as âgenetic constraintâ. Although existing genetic constraint metrics have been demonstrated to be successful in prioritising genes or genomic regions associated with diseases, their spatial resolution is limited in distinguishing pathogenic variants from benign variants within genes. Methods: We aim to identify missense variants that are significantly depleted in the general human population. Given the size of currently available human populations with exome or genome sequencing data, it is not possible to directly detect depletion of individual missense variants, since the average expected number of observations of a variant at most positions is less than one. We instead focus on protein domains, grouping homologous variants with similar functional impacts to examine the depletion of natural variations within these comparable sets. To accomplish this, we develop the Homologous Missense Constraint (HMC) score. We utilise the Genome Aggregation Database (gnomAD) 125 K exome sequencing data and evaluate genetic constraint at quasi amino-acid resolution by combining signals across protein homologues. Results: We identify one million possible missense variants under strong negative selection within protein domains. Though our approach annotates only protein domains, it nonetheless allows us to assess 22% of the exome confidently. It precisely distinguishes pathogenic variants from benign variants for both early-onset and adult-onset disorders. It outperforms existing constraint metrics and pathogenicity meta-predictors in prioritising de novo mutations from probands with developmental disorders (DD). It is also methodologically independent of these, adding power to predict variant pathogenicity when used in combination. We demonstrate utility for gene discovery by identifying seven genes newly significantly associated with DD that could act through an altered-function mechanism. Conclusions: Grouping variants of comparable functional impacts is effective in evaluating their genetic constraint. HMC is a novel and accurate predictor of missense consequence for improved variant interpretation
Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides
Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and reference allele-alternative allele combinations. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript, compared to individuals without them, and confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies.Peer reviewe
Genetic risk factors have a substantial impact on healthy life years
The impact of genetic variation on overall disease burden has not been comprehensively evaluated. We introduce an approach to estimate the effect of genetic risk factors on disability-adjusted life years (DALYs; 'lost healthy life years'). We use genetic information from 735,748 individuals and consider 80 diseases. Rare variants had the highest effect on DALYs at the individual level. Among common variants, rs3798220 (LPA) had the strongest individual-level effect, with 1.18 DALYs from carrying 1 versus 0 copies. Being in the top 10% versus the bottom 90% of a polygenic score for multisite chronic pain had an effect of 3.63 DALYs. Some common variants had a population-level effect comparable to modifiable risk factors such as high sodium intake and low physical activity. Attributable DALYs vary between males and females for some genetic exposures. Genetic risk factors can explain a sizable number of healthy life years lost both at the individual and population level.Peer reviewe
Reduced reproductive success is associated with selective constraint on human genes
Genome-wide sequencing of human populations has revealed substantial variation among genes in the intensity of purifying selection acting on damaging genetic variants1. Although genes under the strongest selective constraint are highly enriched for associations with Mendelian disorders, most of these genes are not associated with disease and therefore the nature of the selection acting on them is not known2. Here we show that genetic variants that damage these genes are associated with markedly reduced reproductive success, primarily owing to increased childlessness, with a stronger effect in males than in females. We present evidence that increased childlessness is probably mediated by genetically associated cognitive and behavioural traits, which may mean that male carriers are less likely to find reproductive partners. This reduction in reproductive success may account for 20% of purifying selection against heterozygous variants that ablate protein-coding genes. Although this genetic association may only account for a very minor fraction of the overall likelihood of being childless (less than 1%), especially when compared to more influential sociodemographic factors, it may influence how genes evolve over time
Contribution of retrotransposition to developmental disorders.
Mobile genetic Elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein coding genes and a framework for future evolutionary and disease studies
The contribution of X-linked coding variation to severe developmental disorders
Over 130 X-linked genes have been robustly associated with developmental disorders, and X-linked causes have been hypothesised to underlie the higher developmental disorder rates in males. Here, we evaluate the burden of X-linked coding variation in 11,044 developmental disorder patients, and find a similar rate of X-linked causes in males and females (6.0% and 6.9%, respectively), indicating that such variants do not account for the 1.4-fold male bias. We develop an improved strategy to detect X-linked developmental disorders and identify 23 significant genes, all of which were previously known, consistent with our inference that the vast majority of the X-linked burden is in known developmental disorder-associated genes. Importantly, we estimate that, in male probands, only 13% of inherited rare missense variants in known developmental disorder-associated genes are likely to be pathogenic. Our results demonstrate that statistical analysis of large datasets can refine our understanding of modes of inheritance for individual X-linked disorders. Developmental disorders (DDs) are more prevalent in males, thought to be due to X-linked genetic variation. Here, the authors investigate the burden of X-linked coding variants in 11,044 DD patients, showing that this contributes to similar to 6% of both male and female cases and therefore does not solely explain male bias in DDs.Peer reviewe
A framework for the detection of de novo mutations in family-based sequencing data
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous literature reports
- âŠ