
    Improving the prediction of disease-related variants using protein three-dimensional structure

    Background: Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures. Results: In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains, using input features derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. Compared with a similar sequence-based predictor, SVM-3D increases overall accuracy and AUC by 3% and the correlation coefficient by 0.06. The robustness of this improvement was tested on different datasets, and in all cases SVM-3D performs better than previously developed methods, even PolyPhen2, which explicitly takes protein structure information as input. Conclusion: This work demonstrates that structural information can increase the accuracy of identifying disease-related SAPs. Our results also quantify the magnitude of this improvement on a large dataset. The improvement agrees with previously observed results, in which structural information enhanced the prediction of protein stability changes upon mutation.
Although the limited structural information in the Protein Data Bank constrains the application and performance of our structure-based method, we expect SVM-3D to achieve higher accuracy as more structural data become available. © 2011 Capriotti; licensee BioMed Central Ltd
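The AUC reported for SVM-3D can be read as the probability that a randomly chosen disease-related SAP receives a higher score than a randomly chosen neutral one. A minimal sketch of that pairwise (Mann-Whitney) computation follows. It assumes the classifier emits one real-valued score per variant, such as an SVM decision value, and the scores below are made-up toy values, not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic:
    the fraction of (positive, negative) pairs ranked correctly, ties counting 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical decision values for disease-related and neutral variants
disease = [2.1, 1.4, 0.3, 0.9]
neutral = [-1.2, 0.3, -0.4]
print(auc(disease, neutral))  # 11.5 of 12 pairs ranked correctly, about 0.958
```

The double loop is O(n·m) and is only meant to make the definition explicit. For thousands of variants, a sort-based rank formula gives the same value much faster.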

    Splice variants in apoptotic pathway

    Elimination of superfluous or mutated somatic cells is achieved by various mechanisms, including apoptosis, and deregulation of apoptotic signaling pathways contributes to oncogenesis. Forty years have passed since Kerr et al. introduced the term "apoptosis" in 1972, and a variety of therapeutic strategies targeting programmed cell death, particularly apoptotic pathways, have since been investigated. Alternative precursor messenger RNA (pre-mRNA) splicing, the process by which the exons of a pre-mRNA are joined in different arrangements to produce structurally and functionally distinct mRNAs and proteins, is another active field and is recognized as one of the most important mechanisms maintaining genomic and functional diversity. A variety of apoptotic genes are also regulated through alternative pre-mRNA splicing, and some of them have important pro-apoptotic or anti-apoptotic functions. In this article, we summarize the splice variants of several apoptotic genes, including BCL2L1, BIRC5, CFLAR, and MADD, as well as the mechanisms regulating their alternative splicing. If information on apoptosis and aberrant splicing in each malignancy is integrated, it will become possible to target the appropriate splice variants to promote apoptosis, and the trans-acting elements themselves could also become specific targets of cancer therapy. This article is part of a Special Issue entitled "Apoptosis: Four Decades Later"

    Variant predictions in congenital adrenal hyperplasia caused by mutations in CYP21A2

    CYP21A2 deficiency accounts for 95% of congenital adrenal hyperplasia (CAH) cases, a group of genetic disorders that affect steroid biosynthesis. Genetic and functional analyses provide critical tools for elucidating complex CAH cases. One of the most accessible tools for inferring the pathogenicity of new variants is in silico prediction. Here, we analyzed the performance of in silico prediction tools in categorizing missense single nucleotide variants (SNVs) of CYP21A2. SNVs of CYP21A2 characterized in vitro by functional assays were selected to assess the performance of online single predictors and meta-predictors. SNVs were tested separately or in combination with the related phenotype (severe or mild CAH form). In total, 103 SNVs of CYP21A2 (90 pathogenic and 13 neutral) were used to test the performance of 13 single predictors and four meta-predictors. All SNVs associated with the severe phenotypes were well categorized by all tools, with accuracy ranging from 0.69 (PredictSNP2) to 0.97 (CADD) and Matthews' correlation coefficient (MCC) ranging from 0.49 (PredictSNP2) to 0.90 (CADD). However, performance varied more for SNVs related to the mild phenotype, with accuracy ranging from 0.47 (S3Ds&GO and MAPP) to 0.88 (CADD) and MCC ranging from 0.18 (MAPP) to 0.71 (CADD). From our analysis, we identified four predictors that perform well for CYP21A2 variant pathogenicity: CADD, ConSurf, DANN, and PolyPhen2. These results can be used in future analyses to infer the impact of uncharacterized SNVs in CYP21A2.
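Reporting MCC alongside accuracy matters here because the test set is heavily imbalanced (90 pathogenic vs. 13 neutral SNVs), and on such a set accuracy alone can mislead. The sketch below uses the standard confusion-matrix formulas. The "predictor" in it is deliberately trivial and labels every variant pathogenic. It is not one of the evaluated tools.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient. Returns 0.0 when any of the four
    marginal sums is zero (a common convention, since the formula is then 0/0)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Class sizes from the abstract: 90 pathogenic (positive) and 13 neutral SNVs.
# A trivial predictor that calls every variant pathogenic:
tp, tn, fp, fn = 90, 0, 13, 0
print(accuracy(tp, tn, fp, fn))  # 90/103, about 0.874
print(mcc(tp, tn, fp, fn))       # 0.0
```

The trivial predictor reaches roughly 0.87 accuracy while carrying no information, and MCC exposes this.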

    UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches

    Motivation: UniRef databases provide full-scale clustering of UniProtKB sequences and are used for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. We hypothesized that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. Results: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity search results, we compared BLASTP searches against UniRef50, followed by expansion of the hit lists with cluster members, with searches against UniProtKB sequences. The UniRef50-based searches are more concise (∼7 times shorter hit list before expansion), faster (∼6 times), and more sensitive in detecting remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation. Availability and implementation: Web access and file download from the UniProt website at http://www.uniprot.org/uniref and ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. BLAST searches against UniRef are available at http://www.uniprot.org/blast/
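The search strategy described above has two steps. The query is first searched against cluster representatives only, and each representative hit is then expanded into the members of its cluster. A minimal sketch of the expansion step follows. The cluster and sequence identifiers are made up, the helper is hypothetical rather than a UniProt API, and the default e-value cutoff of 1e-4 only mirrors the threshold quoted in the abstract.

```python
from typing import Dict, List, Tuple

def expand_hits(rep_hits: List[Tuple[str, float]],
                clusters: Dict[str, List[str]],
                max_evalue: float = 1e-4) -> List[Tuple[str, float, str]]:
    """Expand (cluster_id, e-value) hits into (member_id, e-value, cluster_id) rows.

    Members inherit the e-value of their cluster's representative hit. They were
    not aligned to the query individually, so this is an approximation. A cluster
    missing from `clusters` is kept as a single row under its own identifier."""
    expanded = []
    for cluster_id, evalue in sorted(rep_hits, key=lambda hit: hit[1]):
        if evalue >= max_evalue:  # drop hits at or above the cutoff
            continue
        for member in clusters.get(cluster_id, [cluster_id]):
            expanded.append((member, evalue, cluster_id))
    return expanded

# Hypothetical clusters and BLASTP hits against their representatives
clusters = {"clusterA": ["seq1", "seq2"], "clusterB": ["seq3"]}
hits = [("clusterB", 1e-10), ("clusterA", 1e-20), ("clusterC", 1e-2)]
for row in expand_hits(hits, clusters):
    print(row)
```

Searching only the representatives keeps the initial hit list short. The expansion then restores coverage of the full database without re-aligning every member.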

    Novel mutation predicted to disrupt SGOL1 protein function

    Cell cycle alterations are a major cause of cancer in humans. Proper segregation of sister chromatids during cell division determines the fate of the daughter cells and is maintained by various protein complexes and signaling cascades. Shugoshin (SGOL1) is one of the proteins required for the localization of protein phosphatase 2A (PP2A) to centromeres during division. This localization maintains the cohesion of sister chromatids at the centromeric region until checkpoint signals are received. SGOL1 genomic variants have been widely studied for their correlation with chromosomal instability and chromatid segregation errors. Here, we used computational methods to prioritize single nucleotide polymorphisms (SNPs) capable of disrupting the normal function of the SGOL1 protein. L54Q, a mutation predicted to be deleterious in this study, was found to lie in the N-terminal coiled-coil domain, which is involved in the proper localization of PP2A to the centromere. We further examined the effect of this mutation on the translational efficiency of the SGOL1 coding gene. Our analysis revealed major structural consequences of the mutation for the folding conformation of the 3rd exon. We then carried out molecular dynamics simulations to characterize the structural variations induced by this mutation in the SGOL1 N-terminal coiled-coil domain. Root mean square deviation (RMSD), root mean square fluctuation (RMSF), and hydrogen-bond analyses further supported our results. These results provide a basis for future research on the genotype-phenotype association of damaging non-synonymous SNPs (nsSNPs) in other centromere proteins, as done here for SGOL1, and will help forecast their role in chromosomal instability and solid tumor formation.
    Keywords: SGOL1; Molecular Dynamics Simulation; Gromacs; PhD-SNP; SIFT; Polyphen; MutPred
    The Egyptian Journal of Medical Human Genetics (2013) 14, 149–15
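RMSD and RMSF, used above to compare the simulated structures, are both root-mean-square averages of coordinate deviations. RMSD compares two structures atom by atom, and RMSF measures each atom's fluctuation about its mean position over a trajectory. A minimal NumPy sketch follows. It assumes the coordinates have already been least-squares fitted to a common reference. MD packages such as GROMACS perform that fitting step, and it is omitted here. The coordinates are toy values.

```python
import numpy as np

def rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets that are already superimposed."""
    diff = coords_a - coords_b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsf(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom RMSF over a (frames, N, 3) trajectory already fitted to a
    common reference: root-mean-square deviation from each atom's mean position."""
    mean_pos = trajectory.mean(axis=0)  # (N, 3) average structure
    dev = trajectory - mean_pos         # (frames, N, 3) deviations
    return np.sqrt((dev ** 2).sum(axis=2).mean(axis=0))

# Toy two-atom example: shifting every atom by 1 unit along x gives RMSD 1.0
ref = np.zeros((2, 3))
shifted = ref + np.array([1.0, 0.0, 0.0])
print(rmsd(ref, shifted))  # 1.0

# Atom 0 stays put, atom 1 oscillates between x = +1 and x = -1
traj = np.array([[[0, 0, 0], [1, 0, 0]],
                 [[0, 0, 0], [-1, 0, 0]]], dtype=float)
print(rmsf(traj))  # per-atom RMSF: 0.0 and 1.0
```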

    Improvement of Thermal Stability via Outer-Loop Ion Pair Interaction of Mutated T1 Lipase from Geobacillus zalihae Strain T1

    Mutants D311E and K344R were constructed by site-directed mutagenesis to introduce an additional ion pair at the inter-loop and the intra-loop, respectively, in order to determine the effect of ion pairs on the stability of T1 lipase isolated from Geobacillus zalihae. A series of purification steps yielded pure T1, D311E, and K344R lipases. The wild-type and mutant lipases were analyzed using circular dichroism. The melting temperatures (Tm) of T1, D311E, and K344R lipase were approximately 68.52 °C, 70.59 °C, and 68.54 °C, respectively. The D311E mutation increased the stability of T1 lipase, giving a higher Tm than both the wild type and K344R, so D311E lipase was chosen for further study. D311E lipase was successfully crystallized using the sitting-drop vapor diffusion method. The crystal diffracted to 2.1 Å resolution using an in-house X-ray source and belonged to the monoclinic space group C2, with unit cell parameters a = 117.32 Å, b = 81.16 Å, and c = 100.14 Å. Structural analysis showed an additional ion pair around E311 in the D311E structure. This additional ion pair may regulate the stability of the mutant lipase at high temperatures, as predicted in silico and observed spectroscopically.
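A Tm from a circular dichroism melting curve is the temperature at which half of the protein is unfolded. A minimal sketch of one simple way to estimate it follows. The sketch normalizes the signal to a fraction unfolded and linearly interpolates the 50% crossing. It assumes flat folded and unfolded baselines taken from the first and last points. The melting curve is synthetic, generated from a two-state model, and is not the authors' data or procedure.

```python
import math

def fraction_unfolded(signal):
    """Normalize a CD melting curve to 0 (folded) .. 1 (unfolded), taking the
    first and last points as flat folded/unfolded baselines (an assumption)."""
    lo, hi = signal[0], signal[-1]
    return [(s - lo) / (hi - lo) for s in signal]

def estimate_tm(temps, signal):
    """Temperature at which the fraction unfolded first crosses 0.5,
    by linear interpolation between the two bracketing measurements."""
    frac = fraction_unfolded(signal)
    for i in range(1, len(frac)):
        f0, f1 = frac[i - 1], frac[i]
        if f0 != f1 and (f0 - 0.5) * (f1 - 0.5) <= 0:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never crosses the 50% unfolded point")

def synthetic_curve(tm, width=2.0, folded=-20.0, unfolded=-5.0):
    """Hypothetical two-state melting curve sampled every 0.5 °C from 40 to 95 °C."""
    temps = [40.0 + 0.5 * i for i in range(111)]
    signal = [folded + (unfolded - folded) / (1.0 + math.exp(-(t - tm) / width))
              for t in temps]
    return temps, signal

temps, signal = synthetic_curve(70.59)
print(round(estimate_tm(temps, signal), 2))
```

Real CD data usually have sloping baselines and noise. Fitting a two-state model with linear baselines is the more robust choice, and the interpolation here only illustrates what "Tm" refers to.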

    Atlas of the clinical genetics of human dilated cardiomyopathy

    Aim. Numerous genes are known to cause dilated cardiomyopathy (DCM). However, technological limitations have until now hindered elucidation of the contribution of all clinically relevant disease genes to DCM phenotypes in larger cohorts. We used next-generation sequencing to overcome these limitations and screened all DCM disease genes in a large cohort. Methods and results. In this multi-centre, multi-national study, we enrolled 639 patients with sporadic or familial DCM. We applied a standardized protocol for ultra-high-coverage next-generation sequencing of 84 genes to all samples, achieving at least 50-fold coverage of 99.1% of the target region and a mean read depth of 2415. In this well-characterized cohort, we find the highest number of known cardiomyopathy mutations in plakophilin-2, myosin-binding protein C-3, and desmoplakin. When yet unknown but predicted disease variants are included, titin, plakophilin-2, myosin-binding protein C-3, desmoplakin, ryanodine receptor 2, desmocollin-2, desmoglein-2, and SCN5A are among the most commonly mutated genes. The overlap between mutations causing DCM, hypertrophic cardiomyopathy (HCM), and channelopathies is considerably high. Notably, >38% of patients have compound or combined mutations, and 12.8% have three or more mutations. Comparing patients recruited in the eight participating European countries, we find remarkably few differences in mutation frequencies and affected genes. Conclusion. To our knowledge, this is the first study to comprehensively investigate the genetics of DCM in a large-scale cohort and across a broad panel of known DCM genes. Our results underline the high analytical quality and feasibility of next-generation sequencing in clinical genetic diagnostics and provide a sound database of the genetic causes of DCM.
    Hôpitaux de Paris; PHRC AOM0414
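The two coverage figures quoted above, the share of the target region covered at ≥50-fold and the mean read depth, both summarize per-base read depth over the targeted bases. A minimal sketch follows. It assumes a list of per-base depths for the target region that includes zero-depth positions, since omitting them would inflate both numbers. The depths below are made up.

```python
def coverage_summary(depths, min_depth=50):
    """Return (fraction of target bases with depth >= min_depth, mean depth)
    from a list of per-base read depths over the target region."""
    if not depths:
        raise ValueError("empty target region")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)

# Hypothetical per-base depths: 3 of 4 bases reach 50x
frac, mean = coverage_summary([2400, 3000, 49, 2500])
print(frac, mean)  # 0.75 1987.25
```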

    Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework

    The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods
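Before a variant can be mapped onto a 3D structure, its position in the reference protein sequence must be translated into the residue numbering of the structure chain. That numbering often starts at an offset, skips unresolved residues, or contains gaps. A minimal sketch follows. It assumes a pairwise alignment of the two sequences is already available, with gaps written as '-'. Curated residue-level mappings such as SIFTS exist for PDB entries. The helper below is hypothetical and is not part of the proposed framework, and the sequences and residue numbers are toy values.

```python
def map_positions(aligned_seq, aligned_struct, struct_resnums):
    """Map 1-based positions in a reference protein sequence to residue numbers
    in a structure chain, given a pairwise alignment of the two ('-' = gap).
    struct_resnums lists the structure's residue numbers in chain order.
    Sequence positions aligned to a gap (absent from the structure) are omitted."""
    if len(aligned_seq) != len(aligned_struct):
        raise ValueError("aligned strings must have equal length")
    mapping = {}
    seq_pos = 0      # 1-based position in the reference sequence
    struct_idx = 0   # index into struct_resnums
    for a, b in zip(aligned_seq, aligned_struct):
        if a != '-':
            seq_pos += 1
        if b != '-':
            if a != '-':
                mapping[seq_pos] = struct_resnums[struct_idx]
            struct_idx += 1
    return mapping

# Toy alignment: the structure lacks the first two residues and residue 6,
# and its numbering jumps from 12 to 14.
mapping = map_positions("MKTAYIAK", "--TAY-AK", [10, 11, 12, 14, 15])
print(mapping)  # {3: 10, 4: 11, 5: 12, 7: 14, 8: 15}
```

A variant at sequence position 6 has no entry in the result, which signals that the structure offers no direct context for it.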

    Path to Facilitate the Prediction of Functional Amino Acid Substitutions in Red Blood Cell Disorders – A Computational Approach

    A major area of effort in current genomics is to distinguish mutations that are functionally neutral from those that contribute to disease. Single Nucleotide Polymorphisms (SNPs) that cause amino acid substitutions currently account for approximately half of the known gene lesions responsible for human inherited diseases. As a result, predicting which non-synonymous SNPs (nsSNPs) affect protein function and relate to disease is an important task. In this study, we used bioinformatics tools to perform a comprehensive analysis of deleterious SNPs at both the functional and the structural level in genes associated with red blood cell metabolism disorders. We analyzed variants in the Glucose-6-phosphate dehydrogenase (G6PD) gene and in the genes encoding Pyruvate Kinase isoforms (PKLR and PKM2), which are responsible for major red blood cell disorders. Deleterious nsSNPs were categorized using empirical-rule-based and support-vector-machine-based methods to predict their impact on protein function. Furthermore, we modeled the mutant proteins and compared them with the native protein to evaluate protein structure stability. We argue here that bioinformatics tools can play an important role in addressing the complexity of the genetic basis of Red Blood Cell disorders. Based on our investigation, we report potential candidate SNPs for future studies of human Red Blood Cell disorders. The current study also identifies other deleterious mutations, and its findings agree with in vivo experimental studies. Our approach illustrates the application of computational tools to understanding functional variation from the perspective of structure, expression, evolution, and phenotype.
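The abstract does not specify how calls from the different prediction tools were combined when shortlisting candidate nsSNPs. One simple, hypothetical approach is an agreement vote, which keeps a variant if at least `min_agree` tools call it deleterious and ranks variants by that support. The sketch below implements that approach. Tool names, variant labels, and calls are placeholders, not results from the study.

```python
def prioritize(variant_calls, min_agree=2):
    """Return variants called deleterious by at least `min_agree` tools,
    most-supported first (ties broken by variant name).

    variant_calls maps variant -> {tool_name: True if called deleterious}."""
    support = {v: sum(calls.values()) for v, calls in variant_calls.items()}
    kept = [v for v, n in support.items() if n >= min_agree]
    return sorted(kept, key=lambda v: (-support[v], v))

# Placeholder calls from three hypothetical tools
calls = {
    "var1": {"tool_A": True, "tool_B": True, "tool_C": True},
    "var2": {"tool_A": True, "tool_B": False, "tool_C": False},
    "var3": {"tool_A": True, "tool_B": True, "tool_C": False},
}
print(prioritize(calls))  # ['var1', 'var3']
```

Raising `min_agree` trades sensitivity for specificity. Requiring unanimous agreement yields a shorter but more conservative candidate list.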

    Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

    To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect, and REVEL based on their performance in these analyses.
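Benchmarking a VEP against a DMS experiment, as described above, amounts to correlating two score vectors over the same set of substitutions. A rank-based (Spearman) correlation suits this task because predictor scores and DMS fitness values are on different scales. The abstract does not state which statistic was used, so the following is only a sketch. Sign conventions also differ: for many VEPs a higher score means more damaging, while a higher DMS fitness usually means more functional. Comparing absolute correlations may therefore be necessary. The scores are toy values.

```python
def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(rank(x), rank(y))

# Toy data: DMS fitness (higher = more functional) vs. a VEP score (higher = more damaging)
dms_fitness = [1.0, 0.8, 0.2, 0.1]
vep_score = [0.1, 0.3, 0.7, 0.9]
print(abs(spearman(dms_fitness, vep_score)))
```

`scipy.stats.spearmanr` computes the same statistic, including average ranks for ties, and is the practical choice for real data.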