118 research outputs found

    Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms.

    Recent analyses of human genome sequences have given rise to impressive advances in identifying non-synonymous single nucleotide polymorphisms (nsSNPs). By contrast, the annotation of nsSNPs and their links to diseases are progressing at a much slower pace. Many of the current approaches to analysing disease-associated nsSNPs use primarily sequence and evolutionary information, while structural information is relatively less exploited. In order to explore the potential of such information, we developed a structure-based approach, Bongo (Bonds ON Graph), to predict structural effects of nsSNPs. Bongo considers protein structures as residue-residue interaction networks and applies graph theoretical measures to identify the residues that are critical for maintaining structural stability by assessing the consequences of single point mutations on the interaction network. Our results show that Bongo is able to identify mutations that cause both local and global structural effects, with a remarkably low false positive rate. Application of the Bongo method to the prediction of 506 disease-associated nsSNPs resulted in a performance (positive predictive value, PPV, 78.5%) similar to that of PolyPhen (PPV, 77.2%) and PANTHER (PPV, 72.2%). As the Bongo method is solely structure-based, our results indicate that the structural changes resulting from nsSNPs are closely associated with their pathological consequences.
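The idea of treating a structure as a residue-residue interaction network can be illustrated with a small sketch. This is not the Bongo algorithm, whose details the abstract does not give; it only shows, under assumptions, how a contact graph might be built and how two common graph measures (degree and closeness centrality) could be read off it. The function names, the one-point-per-residue representation, and the 7.0 Å cutoff are all hypothetical choices made for illustration.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=7.0):
    """Link residues i and j when their representative points lie within `cutoff`.

    `coords` is a list of (x, y, z) tuples, one per residue. Both the
    one-point-per-residue model and the 7.0 cutoff are illustrative assumptions.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality: reachable nodes divided by the sum of hop distances."""
    hops = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest hop counts
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0


# Toy chain of five residues spaced 5 units apart along one axis.
toy_coords = [(5.0 * i, 0.0, 0.0) for i in range(5)]
toy_graph = contact_graph(toy_coords)
degrees = {i: len(neighbours) for i, neighbours in toy_graph.items()}
```

In this toy chain the middle residue has both the highest degree and the highest closeness, which is the kind of signal a network-based method could use to flag structurally important positions.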

    Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

    Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues. Results: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and MCC of 0.59, respectively. Conclusions: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
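For readers unfamiliar with the two reported metrics, the sketch below shows how total accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The formulas are the standard definitions; the function names and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only (disease-associated = positive).
example = dict(tp=40, tn=45, fp=5, fn=10)
example_accuracy = accuracy(**example)
example_mcc = mcc(**example)
```

MCC is often reported alongside accuracy because it stays informative when the two classes are imbalanced, whereas accuracy alone can look high for a classifier that mostly predicts the majority class.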

    Linking Genotype and Phenotype of Saccharomyces cerevisiae Strains Reveals Metabolic Engineering Targets and Leads to Triterpene Hyper-Producers

    Background: Metabolic engineering is an attractive approach for improving the microbial production of drugs. Triterpenes are a chemically diverse class of compounds, and many of them are of interest from a human health perspective. A systematic experimental or computational survey of all feasible gene modifications to determine the genotype yielding the optimal triterpene production phenotype is a laborious and time-consuming process. Methodology/Principal Findings: Based on the recent genome-wide sequencing of Saccharomyces cerevisiae CEN.PK 113-7D and its phenotypic differences with the S288C strain, we implemented a strategy for the construction of a beta-amyrin production platform. The genes Erg8, Erg9 and HFA1 contained non-silent SNPs that were computationally analyzed to evaluate the changes they cause in the respective protein structures. Subsequently, Erg8, Erg9 and HFA1 were correlated with the increased levels of ergosterol and fatty acids in CEN.PK 113-7D, and single, double, and triple gene over-expression strains were constructed. Conclusions: Six out of seven gene over-expression constructs had a considerable impact on both ergosterol and beta-amyrin production. In the case of beta-amyrin formation, the triple over-expression construct exhibited a nearly 500% increase over the control strain, making our metabolic engineering strategy the most successful design of triterpene microbial producers.

    Analyzing Effects of Naturally Occurring Missense Mutations

    A single-point mutation in the genome, for example a single-nucleotide polymorphism (SNP) or a rare genetic mutation, is the change of a single nucleotide for another in the genome sequence. Some of these mutations produce an amino acid substitution in the corresponding protein sequence (missense mutations); others do not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and on how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind the effects of these missense mutations is of critical importance for detecting disease-causing mutations. The paper also provides several examples of the application of 3D structure-based methods to model the effects of missense mutations on protein stability and protein-protein interactions.

    Global analysis of SNPs, proteins and protein-protein interactions: approaches for the prioritisation of candidate disease genes.

    Understanding the etiology of complex disease remains a challenge in biology. In recent years there has been an explosion in biological data; this study investigates machine learning and network analysis methods as tools to aid candidate disease gene prioritisation, specifically relating to hypertension and cardiovascular disease. This thesis comprises four sets of analyses. Firstly, non-synonymous single nucleotide polymorphisms (nsSNPs) were analysed in terms of sequence- and structure-based properties using a classifier to provide a model for predicting deleterious nsSNPs. The degree of sequence conservation at the nsSNP position was found to be the single best attribute, but other sequence and structural attributes in combination were also useful. Predictions for nsSNPs within Ensembl have been made publicly available. Secondly, predicting protein function for proteins with an absence of experimental data or lack of clear similarity to a sequence of known function was addressed. Protein domain attributes based on physicochemical and predicted structural characteristics of the sequence were used as input to classifiers for predicting membership of large and diverse protein superfamilies from the SCOP database. An enrichment method was investigated that involved adding domains to the training dataset that are currently absent from SCOP. This analysis resulted in improved classifier accuracy: optimised classifiers achieved 66.3% for single domain proteins and 55.6% when including domains from multi-domain proteins. The domains from superfamilies with low sequence similarity share global sequence properties, enabling applications to be developed which complement profile methods for detecting distant sequence relationships. Thirdly, a topological analysis of the human protein interactome was performed. The results were combined with functional annotation and sequence-based properties to build models for predicting hypertension-associated proteins. 
The study found that predicted hypertension-related proteins are not generally associated with network hubs and do not exhibit high clustering coefficients. Despite this, they tend to be closer and better connected to other hypertension proteins on the interaction network than would be expected by chance. Classifiers that combined PPI network, amino acid sequence and functional properties produced a range of precision and recall scores according to the applied 3 weights. Finally, interactome properties of proteins implicated in cardiovascular disease and cancer were studied. The analysis quantified the influential (central) nature of each protein and defined characteristics of functional modules and pathways in which the disease proteins reside. Such proteins were found to be enriched 2-fold within proteins that are influential (p<0.05) in the interactome. Additionally, they cluster in large, complex, highly connected communities, acting as interfaces between multiple processes more often than expected. An approach to prioritising disease candidates based on this analysis was proposed. Each analysis can provide some new insights into the effort to identify novel disease-related proteins for cardiovascular disease.

    PREDICTIVE BIOINFORMATIC METHODS FOR ANALYZING GENES AND PROTEINS

    Since large amounts of biological data are generated using various high-throughput technologies, efficient computational methods are important for understanding the biological meanings behind the complex data. Machine learning is particularly appealing for biological knowledge discovery. Tissue-specific gene expression and protein sumoylation play essential roles in the cell and are implicated in many human diseases. Protein destabilization is a common mechanism by which mutations cause human diseases. In this study, machine learning approaches were developed for predicting human tissue-specific genes, protein sumoylation sites and protein stability changes upon single amino acid substitutions. Relevant biological features were selected for input vector encoding, and machine learning algorithms, including Random Forests and Support Vector Machines, were used for classifier construction. The results suggest that the approaches give rise to more accurate predictions than previous studies and can provide valuable information for further experimental studies. Moreover, seeSUMO and MuStab web servers were developed to make the classifiers accessible to the biological research community. Structure-based methods can be used to predict the effects of amino acid substitutions on protein function and stability. The nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) located at the protein binding interface have dramatic effects on protein-protein interactions. To model the effects, the nsSNPs at the interfaces of 264 protein-protein complexes were mapped on the protein structures using homology-based methods. The results suggest that disease-causing nsSNPs tend to destabilize the electrostatic component of the binding energy and nsSNPs at conserved positions have significant effects on binding energy changes. The structure-based approach was developed to quantitatively assess the effects of amino acid substitutions on protein stability and protein-protein interaction. 
It was shown that the structure-based analysis could help elucidate the mechanisms by which mutations cause human genetic disorders. These new bioinformatic methods can be used to analyze some interesting genes and proteins for human genetic research and improve our understanding of their molecular mechanisms underlying human diseases

    Seeing the Results of a Mutation With a Vertex Weighted Hierarchical Graph

    We represent the protein structure of scTIM with a graph-theoretic model. We construct a hierarchical graph with three layers: a top level, a midlevel and a bottom level. The top level graph is a representation of the protein in which each vertex represents a substructure of the protein. In turn, each substructure of the protein is represented by a graph whose vertices are amino acids. Finally, each amino acid is represented as a graph where the vertices are atoms. We use this representation to model the effects of a mutation on the protein. Methods: There are 19 vertices (substructures) in the top level graph and thus there are 19 distinct graphs at the midlevel. The vertices of each of the 19 graphs at the midlevel represent amino acids. Each amino acid is represented by a graph where the vertices are atoms in the residue structure. All edges are determined by proximity in the protein's 3D structure. Each vertex in the bottom level is labelled with the molecular mass of the atom that it represents. We use graph-theoretic measures that incorporate vertex weights to assign graph-based attributes to the amino acid graphs. The attributes of the corresponding amino acids are used as vertex weights for the substructure graphs at the midlevel. Graph-theoretic measures based on vertex weighted graphs are subsequently calculated for each of the midlevel graphs. Finally, the vertices of the top level graph are weighted with attributes of the corresponding substructure graph in the midlevel. Results: We can visualize which mutations are more influential than others by using properties such as vertex size to correspond with an increase or decrease in a graph-theoretic measure. Global graph-theoretic measures such as the number of triangles or the number of spanning trees can change as a result. Hence this method provides a way to visualize these global changes resulting from a small, seemingly inconsequential local change. 
Conclusions: This modelling method provides a novel approach to the visualization of protein structures and the consequences of amino acid deletions, insertions or substitutions and provides a new way to gain insight on the consequences of diseases caused by genetic mutations
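The two global measures named in the abstract, the number of triangles and the number of spanning trees, can be computed for a small unweighted graph as sketched below. This is an illustration under assumptions and not the authors' vertex-weighted hierarchical method: the function names are hypothetical, vertex weights are ignored, and spanning trees are counted with Kirchhoff's matrix-tree theorem (the determinant of the graph Laplacian with one row and column removed).

```python
from fractions import Fraction
from itertools import combinations


def count_triangles(n, edges):
    """Count unordered vertex triples that are pairwise adjacent."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def count_spanning_trees(n, edges):
    """Matrix-tree theorem: determinant of the Laplacian minus its last row and column."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    minor = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):  # exact Gaussian elimination with rational arithmetic
        pivot = next((r for r in range(col, size) if minor[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            minor[col], minor[pivot] = minor[pivot], minor[col]
            det = -det
        det *= minor[col][col]
        for r in range(col + 1, size):
            factor = minor[r][col] / minor[col][col]
            for c in range(col, size):
                minor[r][c] -= factor * minor[col][c]
    return int(det)


# Toy example: a complete graph on 4 vertices, then the same graph with one contact lost.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
perturbed_edges = [e for e in k4_edges if e != (0, 1)]
```

In the toy example, removing a single edge halves both counts, which illustrates how a small local change can shift a global graph measure.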

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules

    Structural Stability of Human Protein Tyrosine Phosphatase ρ Catalytic Domain: Effect of Point Mutations

    Protein tyrosine phosphatase ρ (PTPρ) belongs to the classical receptor type IIB family of protein tyrosine phosphatases, the most frequently mutated tyrosine phosphatase in human cancer. There is evidence to suggest that PTPρ may act as a tumor suppressor gene, and dysregulation of Tyr phosphorylation can be observed in diverse diseases, such as diabetes, immune deficiencies and cancer. PTPρ variants in the catalytic domain have been identified in cancer tissues. These natural variants are nonsynonymous single nucleotide polymorphisms, variations of a single nucleotide occurring in the coding region and leading to amino acid substitutions. In this study we investigated the effect of amino acid substitution on the structural stability and on the activity of the membrane-proximal catalytic domain of PTPρ. We expressed and purified as soluble recombinant proteins some of the mutants of the membrane-proximal catalytic domain of PTPρ identified in colorectal cancer and in the single nucleotide polymorphisms database. The mutants show a decreased thermal and thermodynamic stability and decreased activation energy relative to phosphatase activity, when compared to wild-type. All the variants show three-state equilibrium unfolding transitions similar to that of the wild-type, with the accumulation of a folding intermediate populated at ∼4.0 M urea.

    Development of novel Classical and Quantum Information Theory Based Methods for the Detection of Compensatory Mutations in MSAs

    Multiple sequence alignments (MSAs) of homologous proteins are useful tools for characterizing compensatory mutations between non-conserved residues. Identifying these residues in MSAs is an important task for better understanding the structural basis and molecular mechanisms of protein function. Despite the large body of literature on compensatory mutations and on sequence conservation analysis for detecting important residues, previous methods have mostly not taken into account the biochemical properties of amino acids, which can nevertheless be decisive for detecting compensatory mutation signals. Moreover, compensatory mutation signals in MSAs are often distorted by noise. For this reason, a further problem in bioinformatics is the separation of significant signals from phylogenetic noise and unrelated pair signals. The aim of this work is to develop methods that integrate biochemical properties, such as similarities and dissimilarities of amino acids, into the identification of compensatory mutations and that deal with the noise. We therefore develop different methods based on classical and quantum information theory as well as multiple testing procedures. Our first method is based on classical information theory. This method mainly considers BLOSUM62-dissimilar pairs of amino acids as a model of compensatory mutations and integrates them into the identification of important residues. To complement this method, we develop our second method using the foundations of quantum information theory. This new method differs from the first by simultaneously modelling similar and dissimilar signals in the compensatory mutation analysis. 
Furthermore, to separate significant signals from noise, we develop an MSA-specific statistical model based on multiple testing procedures. We apply our method to two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK). The results show that the MSA-specific statistical model can separate the significant signals from phylogenetic noise and from unrelated pair signals. Considering only BLOSUM62-dissimilar pairs of amino acids, the first method successfully identifies the disease-associated important residues of both proteins. In contrast, by simultaneously modelling similar and dissimilar signals of amino acid pairs, the second method is more sensitive in identifying catalytic and allosteric residues.