
    Utility of gene-specific algorithms for predicting pathogenicity of uncertain gene variants

    The rapid advance of gene sequencing technologies has produced an unprecedented rate of discovery of genome variation in humans. A growing number of authoritative clinical repositories archive gene variants and disease phenotypes, yet there are currently many more gene variants that lack clear annotation or disease association. To date, there has been very limited coverage of gene-specific predictors in the literature. Here we present the evaluation of “gene-specific” predictor models based on a Naïve Bayesian classifier for 20 gene-disease data sets, containing 3,986 variants with clinically characterized patient conditions. The utility of gene-specific prediction is then compared with “all-gene” generalized prediction and with existing popular predictors. Gene-specific computational prediction models derived from clinically curated gene variant disease data sets often outperform established generalized algorithms for novel and uncertain gene variants.
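    To make the Naïve Bayesian idea concrete, here is a minimal Gaussian Naïve Bayes sketch in plain Python. It assumes two hypothetical numeric features per missense variant (absolute hydrophobicity change and absolute side-chain volume change) and uses made-up toy data; it is not the published feature set or model, only an illustration of how a per-gene classifier turns feature likelihoods and class priors into a posterior probability.

```python
import math
from statistics import mean, pvariance

# Hypothetical features per missense variant: absolute change on a hydrophobicity
# scale and absolute change in side-chain volume. Illustrative stand-ins only,
# not the published feature set.

def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature Gaussian mean/variance for each class."""
    model = {}
    n = len(rows)
    for cls in set(labels):
        members = [row for row, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        model[cls] = {
            "log_prior": math.log(len(members) / n),
            # A small variance floor avoids division by zero for constant features.
            "params": [(mean(col), max(pvariance(col), 1e-6)) for col in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log density of a normal distribution N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict_proba(model, row):
    """Posterior class probabilities, treating features as conditionally independent."""
    log_scores = {
        cls: params["log_prior"]
        + sum(log_gaussian(x, mu, var) for x, (mu, var) in zip(row, params["params"]))
        for cls, params in model.items()
    }
    top = max(log_scores.values())  # subtract the max before exp() for numerical stability
    unnormalized = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {cls: v / total for cls, v in unnormalized.items()}

# Toy, made-up training data for a single hypothetical gene.
train = [[3.1, 60], [2.8, 55], [3.5, 70], [0.4, 10], [0.6, 5], [0.2, 12]]
labels = ["pathogenic"] * 3 + ["benign"] * 3
model = fit_gaussian_nb(train, labels)
print(predict_proba(model, [3.0, 58]))
```

    Training one such model per gene, rather than one pooled model, is what makes the predictor “gene-specific” in this sketch.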

    Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity

    BACKGROUND: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation. METHODS: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual's disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes. RESULTS: Our models report a probability of pathogenicity, rather than a categorisation into pathogenic or benign, which captures the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Though the probability scale is continuous and innately interpretable, performance summaries based on thresholds are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and sensitivity of 0.76 for the classification of variants known to cause long QT syndrome over the three most important genes, which represents sufficient accuracy to inform clinical decision-making. A web tool APPRAISE [http://www.cardiodb.org/APPRAISE] provides access to these models and predictions.
CONCLUSIONS: Our Bayesian framework provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches, and can be sufficiently accurate to inform clinical decision-making.
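    The abstract summarizes a continuous probability of pathogenicity with threshold-based metrics and mentions combining predictions with segregation data. The sketch below shows those two calculations under stated assumptions: segregation evidence is supplied as a hypothetical likelihood ratio and applied with Bayes' rule on the odds scale, which is one standard way to do this and not necessarily how APPRAISE does it. The Bayesian logistic regression models themselves are not reproduced here.

```python
def to_odds(p):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)

def combine_with_segregation(p_model, likelihood_ratio):
    """Bayes' rule on the odds scale: posterior odds = prior odds * likelihood ratio.

    `likelihood_ratio` is a hypothetical input summarizing family segregation evidence.
    """
    posterior_odds = to_odds(p_model) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

def threshold_metrics(probs, is_pathogenic, threshold=0.9):
    """PPV and sensitivity when variants with probability >= threshold are called pathogenic."""
    calls = [p >= threshold for p in probs]
    tp = sum(c and t for c, t in zip(calls, is_pathogenic))
    fp = sum(c and not t for c, t in zip(calls, is_pathogenic))
    fn = sum((not c) and t for c, t in zip(calls, is_pathogenic))
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

# Made-up example: a model probability of 0.5 and segregation evidence 4x in favour
# of pathogenicity give posterior odds of 4, i.e. a probability of 0.8.
print(combine_with_segregation(0.5, 4.0))
print(threshold_metrics([0.95, 0.92, 0.6, 0.3, 0.91], [True, True, True, False, False]))
```

    Working on the odds scale keeps each piece of evidence as a separate multiplicative factor, which is what makes the combination transparent.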

    Doctor of Philosophy

    Rapidly evolving technologies such as chip arrays and next-generation sequencing are uncovering human genetic variants at an unprecedented pace. Unfortunately, this ever-growing collection of gene sequence variation has limited clinical utility without a clear association to disease outcomes. As electronic medical records begin to incorporate genetic information, gene variant classification and accurate interpretation of gene test results play a critical role in customizing patient therapy. To verify the functional impact of a given gene variant, laboratories rely on confirming evidence such as previous literature reports, patient history and disease segregation in a family. By definition, variants of uncertain significance (VUS) lack this supporting evidence, and in such cases computational tools are often used to evaluate the predicted functional impact of a gene mutation. This study evaluates the use of high-quality genotype-phenotype disease variant data from 20 genes and 3,986 variants to develop gene-specific predictors that combine changes in primary amino acid sequence, amino acid properties as descriptors of mutation severity, and Naïve Bayes classification. A Primary Sequence Amino Acid Properties (PSAAP) prediction algorithm was then combined with well-established predictors in a weighted Consensus sum, interpreted in the context of gene-specific reference intervals for known phenotypes. PSAAP and Consensus were also used to evaluate known variants of uncertain significance in the RET proto-oncogene as a model gene. The PSAAP algorithm was successfully extended to many genes and diseases. Gene-specific algorithms typically outperform generalized prediction tools. Characteristic mutation properties of a given gene and disease may be lost when diluted into genome-wide data sets.
A reliable computational phenotype classification framework with quantitative metrics and disease-specific reference ranges allows objective evaluation of novel or uncertain gene variants and augments decision-making when confirming clinical information is limited.

    Statistical methods for clinical genome interpretation with specific application to inherited cardiac conditions

    Background: While next-generation sequencing has enabled us to rapidly identify sequence variants, clinical application is limited by our ability to determine which rare variants impact disease risk. Aim: To develop computational methods to identify clinically important variants. Methods and Results: (1) I built a disease-specific variant classifier for inherited cardiac conditions (ICCs), which outperforms genome-wide tools in a wide range of benchmarks. It discriminates pathogenic variants from benign variants with global accuracy improved by 4-24% over existing tools. Variants classified with >90% confidence are significantly associated with both disease status and clinical outcomes. (2) To better interpret missense variants, I examined evolutionarily equivalent residues across protein domain families to identify positions intolerant of variation. Homologous residue constraint is a strong predictor of variant pathogenicity. It can identify a subset of de novo missense variants whose impact on developmental disorders is comparable to that of protein-truncating variants. Independently of existing approaches, it can also improve the prioritisation of disease-relevant genes for both developmental disorders and inherited hypertrophic cardiomyopathy. (3) TTN-truncating variants are known to cause dilated cardiomyopathy (DCM), but the effect of missense variants is poorly understood. Using the approach in (2), I studied the role of TTN missense variants in DCM. Our prioritised residues are enriched with known pathogenic variants, including the two known to cause DCM and others involved in skeletal myopathies. I also found a significant association between constrained variants of TTN I-set domains and DCM in a case-control burden test of Caucasian samples (OR=3.2, 95% CI=1.3-9.4). Within subsets of DCM, the association is replicated in alcoholic cardiomyopathy. (4) Finally, I also developed a tool to annotate 5’UTR variants creating or disrupting upstream open reading frames (uORFs).
Its utility is demonstrated in detecting high-impact uORF-disturbing variants from ClinVar, gnomAD and Genomics England. Conclusion: These studies established broadly applicable methods and improved understanding of ICCs.
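    The uORF annotation step can be illustrated with a minimal sketch: find every upstream ATG in a 5’UTR sequence, follow it in frame to the first stop codon, and compare the reference and alternate sequences for a single-nucleotide variant. This is a simplification assumed for illustration (canonical ATG starts only, no Kozak context or near-cognate start codons, SNVs only) and is not the tool described in the abstract; `upstream_orfs`, `apply_snv` and `uorf_effect` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_orfs(utr5):
    """Return (start, stop_end) for each ATG in the 5'UTR.

    stop_end is the index just past the in-frame stop codon, or None when no stop
    occurs before the UTR ends (the uORF would then run into the coding sequence).
    """
    orfs = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "ATG":
            continue
        stop_end = None
        for i in range(start + 3, len(utr5) - 2, 3):
            if utr5[i:i + 3] in STOP_CODONS:
                stop_end = i + 3
                break
        orfs.append((start, stop_end))
    return orfs

def apply_snv(seq, pos, ref, alt):
    """Substitute one base (0-based position) after checking the reference allele."""
    if seq[pos] != ref:
        raise ValueError("reference base does not match sequence")
    return seq[:pos] + alt + seq[pos + 1:]

def uorf_effect(utr5, pos, ref, alt):
    """uORFs present only after the variant (created) or only before it (disrupted)."""
    before = set(upstream_orfs(utr5))
    after = set(upstream_orfs(apply_snv(utr5, pos, ref, alt)))
    return {"created": sorted(after - before), "disrupted": sorted(before - after)}

# Toy sequence: an ATG at index 3 with an in-frame TAA ending at index 12.
print(uorf_effect("GCCATGAAATAAGCC", 4, "T", "C"))  # destroys the upstream ATG
```

    In this sketch a variant that only changes the stop codon of an existing uORF shows up as one disrupted and one created entry with the same start position.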

    Consensus: a framework for evaluation of uncertain gene variants in laboratory test reporting

    Accurate interpretation of gene testing is a key component in customizing patient therapy. Where confirming evidence for a gene variant is lacking, computational prediction may be employed. A standardized framework, however, does not yet exist for the objective, quantitative evaluation of disease association for uncertain or novel gene variants. Here, complementary predictors for missense gene variants were incorporated into a weighted Consensus framework that includes calculated reference intervals from known disease outcomes. Data visualization for clinical reporting is also discussed.
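    A weighted consensus with reference intervals can be sketched as follows. The predictor names, weights and scores are hypothetical, scores are assumed to be pre-scaled to [0, 1], and the reference interval is taken as mean ± 2 sample standard deviations of scores for variants with a known outcome. The published framework may scale scores and derive its reference intervals differently.

```python
from statistics import mean, stdev

def consensus_score(scores, weights):
    """Weighted sum of predictor scores, divided by the total weight.

    Scores are assumed to be pre-scaled to [0, 1], with 1 meaning most damaging.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same predictors")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in scores) / total_weight

def reference_interval(known_scores, k=2.0):
    """Mean +/- k sample standard deviations of consensus scores for one known phenotype."""
    m, s = mean(known_scores), stdev(known_scores)
    return m - k * s, m + k * s

def within(interval, value):
    """True when value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high

# Hypothetical predictor names, weights and scores, for illustration only.
weights = {"gene_specific": 2.0, "predictor_b": 1.0, "predictor_c": 1.0}
variant = {"gene_specific": 0.9, "predictor_b": 0.7, "predictor_c": 0.8}
score = consensus_score(variant, weights)  # (1.8 + 0.7 + 0.8) / 4.0
pathogenic_ref = reference_interval([0.7, 0.8, 0.9])  # made-up known-pathogenic scores
print(score, pathogenic_ref, within(pathogenic_ref, score))
```

    Comparing a new variant's score against intervals built from known pathogenic and known benign variants of the same gene gives a gene-specific frame of reference, rather than a single genome-wide cutoff.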

    Computational prediction of protein subdomain stability in MYBPC3 enables clinical risk stratification in hypertrophic cardiomyopathy and enhances variant interpretation

    PURPOSE: Variants in MYBPC3 causing loss of function are the most common cause of hypertrophic cardiomyopathy (HCM). However, a substantial number of patients carry missense variants of uncertain significance (VUS) in MYBPC3. We hypothesize that a structure-based algorithm, STRUM, which estimates the effect of missense variants on protein folding, will identify a subgroup of HCM patients with a MYBPC3 VUS associated with increased clinical risk. METHODS: Among 7,963 patients in the multicenter Sarcomeric Human Cardiomyopathy Registry (SHaRe), 120 unique missense VUS in MYBPC3 were identified. Variants were evaluated for their effect on subdomain folding, and a stratified time-to-event analysis for an overall composite endpoint (first occurrence of ventricular arrhythmia, heart failure, all-cause mortality, atrial fibrillation, and stroke) was performed for patients with HCM and a MYBPC3 missense VUS. RESULTS: We demonstrated that patients carrying a MYBPC3 VUS predicted to cause subdomain misfolding (STRUM+, ΔΔG ≤ −1.2 kcal/mol) exhibited a higher rate of adverse events compared with those with a STRUM- VUS (hazard ratio = 2.29, P = 0.0282). In silico saturation mutagenesis of MYBPC3 identified 4,943/23,427 (21%) missense variants that were predicted to cause subdomain misfolding. CONCLUSION: STRUM identifies patients with HCM and a MYBPC3 VUS who may be at higher clinical risk and provides supportive evidence for pathogenicity.
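    The STRUM+ / STRUM- split is a simple threshold on the predicted folding free-energy change. The sketch below applies the cutoff quoted in the abstract to a list of ΔΔG values and checks the reported saturation-mutagenesis fraction; the ΔΔG inputs are assumed to come from STRUM already, and the sign convention (more negative means more destabilizing) is inferred from the abstract's cutoff. The function names are hypothetical.

```python
# Cutoff quoted in the abstract: a variant is "STRUM+" when the predicted change in
# folding free energy is at or below -1.2 kcal/mol.
STRUM_POSITIVE_CUTOFF = -1.2

def is_strum_positive(ddg_kcal_per_mol):
    """True when the predicted ddG meets the misfolding cutoff."""
    return ddg_kcal_per_mol <= STRUM_POSITIVE_CUTOFF

def summarize(ddg_values):
    """Count and fraction of variants predicted to cause subdomain misfolding."""
    flagged = sum(1 for ddg in ddg_values if is_strum_positive(ddg))
    return flagged, flagged / len(ddg_values)

# Arithmetic check on the saturation-mutagenesis figure reported above.
print(f"{4943 / 23427:.1%}")
```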

    Challenges for the implementation of next generation sequencing-based expanded carrier screening: Lessons learned from the ciliopathies

    Next generation sequencing (NGS) can detect carrier status for rare recessive disorders, informing couples about their reproductive risk. The recent ACMG recommendations support offering NGS-based carrier screening (NGS-CS) in an ethnic- and population-neutral manner for all genes that have a carrier frequency >1/200 (based on gnomAD). To evaluate current challenges for NGS-CS, we focused on the ciliopathies, a well-studied group of rare recessive disorders. We analyzed 118 ciliopathy genes by whole exome sequencing in ~400 healthy local individuals and ~1000 individuals from the UK1958-birth cohort. We found 20% of healthy individuals (1% of couples) to be carriers of reportable variants in a ciliopathy gene, while 50% (4% of couples) carry variants of uncertain significance (VUS). This large proportion of VUS is partly explained by the limited utility of the ACMG/AMP variant-interpretation criteria in healthy individuals, where phenotypic match or segregation criteria cannot be used. Most missense variants are thus classified as VUS and not reported, which reduces the negative predictive value of the screening test. We show how gene-specific variation patterns and structural protein information can help prioritize the variants most likely to be disease-causing for (future) functional assays. Even when considering only strictly pathogenic variants, the observed carrier frequency is substantially higher than expected based on estimated disease prevalence, challenging the 1/200 carrier frequency cut-off proposed for choosing which genes to screen. Given the challenges linked to variant interpretation in healthy individuals and the uncertainties about true carrier frequencies, genetic counseling must clearly disclose these limitations of NGS-CS.
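    The comparison between observed carrier frequency and “expected based on estimated disease prevalence” rests on Hardy-Weinberg arithmetic. A minimal sketch, assuming a single fully penetrant autosomal recessive gene, pooled pathogenic allele frequency q, random mating and equilibrium: prevalence is q², carrier frequency is 2q(1 − q), and the chance that both partners carry a variant in the same gene is the carrier frequency squared. These helper functions are illustrative and do not reproduce the study's per-gene calculations.

```python
import math

def expected_carrier_frequency(prevalence):
    """Carrier frequency 2q(1 - q) under Hardy-Weinberg equilibrium, with q = sqrt(prevalence)."""
    q = math.sqrt(prevalence)
    return 2 * q * (1 - q)

def prevalence_from_carrier_frequency(carrier_frequency):
    """Approximate inverse using 2q(1 - q) ~ 2q for rare alleles: prevalence ~ (cf / 2) ** 2."""
    return (carrier_frequency / 2) ** 2

def carrier_couple_probability(carrier_frequency):
    """Chance that both partners carry a variant in the same gene, assuming random mating."""
    return carrier_frequency ** 2

print(expected_carrier_frequency(1 / 40000))           # about 1 in 100
print(1 / prevalence_from_carrier_frequency(1 / 200))  # about 160,000
```

    Under these assumptions the 1/200 carrier-frequency cut-off corresponds to a disease prevalence of roughly 1 in 160,000, so an observed carrier frequency well above what the known prevalence implies is a signal that some “pathogenic” calls may be misclassified or not fully penetrant.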

    Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases

    BACKGROUND: Whole genome sequencing (WGS) is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25–30%. This is in part because, although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that allows comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.