1,448 research outputs found

    Prioritization of candidate cancer genes—an aid to oncogenomic studies

    Techniques for oncogenomic analyses, such as array comparative genomic hybridization, messenger RNA expression arrays and mutational screens, have come to the fore in modern cancer research. Studies utilizing these techniques are able to highlight panels of genes that are altered in cancer. However, these candidate cancer genes must then be scrutinized to reveal whether they contribute to oncogenesis or are coincidental and non-causative. We present a computational method for the prioritization of candidate (i) proto-oncogenes and (ii) tumour suppressor genes from oncogenomic experiments. We constructed computational classifiers using different combinations of sequence and functional data including sequence conservation, protein domains and interactions, and regulatory data. We found that these classifiers are able to distinguish between known cancer genes and other human genes. Furthermore, the classifiers also discriminate candidate cancer genes from a recent mutational screen from other human genes. We provide a web-based facility through which cancer biologists may access our results and we propose computational cancer gene classification as a useful method of prioritizing candidate cancer genes identified in oncogenomic studies.

    Updated benchmarking of variant effect predictors using deep mutational scanning

    The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and ESM‐1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
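
    As a rough illustration of what "agreement with DMS data" can mean in practice, the sketch below ranks predictors by the absolute Spearman rank correlation between their scores and DMS measurements for the same variants. The abstract does not state which agreement metric the study used, so the choice of Spearman correlation is an assumption here, and the function names (average_ranks, spearman) and all numbers are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: DMS fitness scores and two predictors' scores
# for the same four variants (illustrative values only).
dms_scores = [0.1, 0.5, 0.9, 1.0]
predictor_scores = {
    "predictor_a": [0.2, 0.4, 0.6, 0.8],
    "predictor_b": [0.9, 0.1, 0.5, 0.3],
}

# Predictors differ in score direction, so compare absolute correlations.
ranking = sorted(
    predictor_scores,
    key=lambda name: abs(spearman(predictor_scores[name], dms_scores)),
    reverse=True,
)
print(ranking)
```

    Taking the absolute value is one simple way to handle predictors whose scores run in opposite directions (higher meaning more damaging versus higher meaning more tolerated); a real benchmark would also need to decide how to aggregate across proteins.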

    Characterising and Predicting Haploinsufficiency in the Human Genome

    Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
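
    The abstract does not say how gene-level probabilities are combined into a score for a deletion. One plausible approach, shown purely as a sketch, treats the deleted genes as independent and computes the probability that at least one of them is haploinsufficient. The independence assumption, the function name deletion_hi_probability and the example probabilities are all hypothetical and are not taken from the paper.

```python
import math


def deletion_hi_probability(gene_probs):
    """Probability that at least one deleted gene is haploinsufficient.

    Assumes (hypothetically) that per-gene probabilities are independent:
    P(at least one HI) = 1 - product over genes of (1 - p_gene).
    """
    prob_none = math.prod(1.0 - p for p in gene_probs)
    return 1.0 - prob_none


# Illustrative per-gene probabilities for two hypothetical deletions.
small_but_risky = [0.9, 0.1]          # two genes, one likely HI
large_but_benign = [0.05, 0.05, 0.05, 0.05]  # four genes, none likely HI

print(deletion_hi_probability(small_but_risky))
print(deletion_hi_probability(large_but_benign))
```

    Under this assumption a two-gene deletion containing one likely HI gene scores higher than a four-gene deletion of unlikely HI genes, which matches the abstract's point that gene content can be more informative than the number of genes deleted.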

    Mendelian gene identification through mouse embryo viability screening.

    BACKGROUND: The diagnostic rate of Mendelian disorders in sequencing studies continues to increase, along with the pace of novel disease gene discovery. However, variant interpretation in novel genes not currently associated with disease is particularly challenging, and strategies combining gene functional evidence with approaches that evaluate the phenotypic similarities between patients and model organisms have proven successful. A full spectrum of intolerance to loss-of-function variation has been previously described, providing evidence that gene essentiality should not be considered as a simple and fixed binary property. METHODS: Here we further dissected this spectrum by assessing the embryonic stage at which homozygous loss-of-function results in lethality in mice from the International Mouse Phenotyping Consortium, classifying the set of lethal genes into one of three windows of lethality: early, mid, or late gestation lethal. We studied the correlation between these windows of lethality and various gene features including expression across development, paralogy and constraint metrics together with human disease phenotypes. We explored a gene similarity approach for novel gene discovery and investigated unsolved cases from the 100,000 Genomes Project. RESULTS: We found that genes in the early gestation lethal category have distinct characteristics and are enriched for genes linked with recessive forms of inherited metabolic disease. We identified several genes sharing multiple features with known biallelic forms of inborn errors of metabolism and found signs of enrichment of biallelic predicted pathogenic variants among early gestation lethal genes in patients recruited under this disease category. We highlight two novel gene candidates with phenotypic overlap between the patients and the mouse knockouts. CONCLUSIONS: Information on the developmental period at which embryonic lethality occurs in the knockout mouse may be used for novel disease gene discovery that helps to prioritise variants in unsolved rare disease cases.

    Investigating Genetic Causes of Mendelian Congenital Myopathies

    This thesis investigates the genetic aetiology of congenital myopathy in families with an unresolved genetic diagnosis. In two families, massively parallel sequencing and functional analyses identified two genetic candidates: a regulatory variant (c.*152G>T) and multi-exon deletion in a known disease gene (KLHL40), and a homozygous missense variant (c.1339T>C) in HMGCS1, a novel disease gene. This work supports the further investigation of regulatory variants for congenital myopathy screening and highlights the mevalonate pathway in muscle function.

    Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders.

    Genetic syndromes frequently present with overlapping clinical features and inconclusive or ambiguous genetic findings, which can confound accurate diagnosis and clinical management. An expanding number of genetic syndromes have been shown to have unique genomic DNA methylation patterns (called episignatures). Peripheral blood episignatures can be used for diagnostic testing as well as for the interpretation of ambiguous genetic test results. We present here an approach to episignature mapping in 42 genetic syndromes, which has allowed the identification of 34 robust disease-specific episignatures. We examine emerging patterns of overlap, as well as similarities and hierarchical relationships across these episignatures, to highlight their key features as they are related to genetic heterogeneity, dosage effect, unaffected carrier status, and incomplete penetrance. We demonstrate the necessity of multiclass modeling for accurate genetic variant classification and show how disease classification using a single episignature at a time can sometimes lead to classification errors in closely related episignatures. We demonstrate the utility of this tool in resolving ambiguous clinical cases and identification of previously undiagnosed cases through mass screening of a large cohort of subjects with developmental delays and congenital anomalies. This study more than doubles the number of published syndromes with DNA methylation episignatures and, most significantly, opens new avenues for accurate diagnosis and clinical assessment in individuals affected by these disorders.

    Contribution of a novel "B3GLCT" variant to Peters plus syndrome discovered by a combination of next-generation sequencing and automated text mining

    Anterior segment dysgenesis (ASD) encompasses a spectrum of ocular disorders affecting the structures of the anterior eye chamber. Mutations in several genes involved in eye development are implicated in this disorder. ASD is often accompanied by diverse multisystemic symptoms and another genetic cause, such as variants in genes encoding collagen type IV. Thus, a wide spectrum of phenotypes and underlying genetic diversity make fast and proper diagnosis challenging. Here, we used AMELIE, an automatic text mining tool that enriches data with the most up-to-date information from literature, and wANNOVAR, which is based on well-documented databases and incorporates a variant filtering strategy, to identify genetic variants responsible for severely manifested ASD in a newborn child. This strategy, applied to trio sequencing data in compliance with ACMG 2015 guidelines, helped us find two compound heterozygous variants of the B3GLCT gene, of which c.660+1G>A (rs80338851) was previously associated with the phenotype of Peters plus syndrome (PPS), while the second, NM_194318.3:c.755delC (p.T252fs), in exon 9 of the same gene, was noted for the first time. PPS, a very rare subtype of ASD, is a glycosylation disorder, where the dysfunctional B3GLCT gene product, O-fucose-specific β-1,3-glucosyltransferase, is ineffective in providing a noncanonical quality control system for proper protein folding in cells. Our study expands the mutation spectrum of the B3GLCT gene related to PPS. We suggest that the implementation of automatic text mining tools in combination with careful variant filtering could help translate sequencing results into diagnosis, thus considerably accelerating the diagnostic process and thereby improving patient management.