206 research outputs found

    Robust Principal Component Analysis-based Prediction of Protein-Protein Interaction Hot spots ( {RBHS} )

    Proteins often exert their function by binding to other cellular partners. Hot spots are the key residues for protein-protein binding. Their identification may shed light on the impact of disease-associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods for predicting hot spots suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by recovering a low-rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method.
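The abstract above does not say which Robust PCA solver the pipeline uses. As a minimal sketch, assuming the widely used Principal Component Pursuit formulation (split a data matrix M into a low-rank part L plus a sparse part S of gross errors), the recovery step could look like the following. The function names, the default `lam` and `mu` values, and the iteration limits are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shrink(X, tau):
    """Elementwise soft-thresholding: pull every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S so that M is approximately L + S.

    Assumed formulation: minimize ||L||_* + lam * ||S||_1 subject to L + S = M,
    solved with a fixed-step augmented Lagrangian iteration.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))                 # common default weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum() + 1e-12)   # common default step size
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                               # Lagrange multipliers
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)    # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)           # sparse (gross error) update
        residual = M - L - S
        Y = Y + mu * residual                          # dual update
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

In a pipeline of the kind described, the recovered `L` would stand in for the raw feature matrix before it is passed to a classifier, while `S` absorbs the gross errors.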

    dbMPIKT: A database of kinetic and thermodynamic mutant protein interactions

    © 2018 The Author(s). Background: Protein-protein interactions (PPIs) play important roles in biological functions. Studying the effects of mutations on protein interactions can provide further understanding of PPIs. Many databases currently collect experimental mutants to assess protein interactions, but most of them are old and have not been updated for several years. Results: To address this issue, we manually curated a kinetic and thermodynamic database of mutant protein interactions (dbMPIKT) that is freely accessible at our website. This database contains 5291 mutants in protein interactions collected from previous databases and from the literature published within the last three years. Furthermore, some data analyses, such as mutation number, mutation type, protein pair source and network map construction, can be performed online. Conclusion: Our work can promote the study of PPIs, and novel information can be mined from the new database. Our database is available at http://DeepLearner.ahu.edu.cn/web/dbMPIKT/ for use by all, including both academics and non-academics.

    Using machine-learning-driven approaches to boost hot-spot's knowledge

    Understanding protein–protein interactions (PPIs) is fundamental to describe and to characterize the formation of biomolecular assemblies, and to establish the energetic principles underlying biological networks. One key aspect of these interfaces is the existence and prevalence of hot-spot (HS) residues that, upon mutation to alanine, negatively impact the formation of such protein–protein complexes. HS have been widely considered in research, both in case studies and in a few large-scale predictive approaches. This review aims to present the current knowledge on PPIs, providing a detailed understanding of the microspecifications of the residues involved in those interactions and the characteristics of those defined as HS through a thorough assessment of related field-specific methodologies. We explore recent accurate artificial intelligence-based techniques, which are progressively replacing well-established classical energy-based methodologies. This article is categorized under: Data Science > Databases and Expert Systems; Structure and Mechanism > Computational Biochemistry and Biophysics; Molecular and Statistical Mechanics > Molecular Interactions.

    Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

    Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to obtain meaningful insights. The mapping of a protein's primary sequence is not limited to its structure; we extend it to the disordered component, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which help explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine learning based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of the Accessible Surface Area (ASA) of protein residues, a useful structural property that defines a protein's exposure to partners, using regularized regression with a 3rd-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and the hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of the unstructured proteins.
Moreover, we study the transient peptide-protein interactions that involve the induced folding of short peptides, which undergo disorder-to-order conformational changes to bind to an appropriate partner. Using the stacked generalization ensemble technique, we develop a suite of predictors that identify, from protein sequence, the residue patterns of Peptide-Recognition Domains that can recognize and bind to peptide motifs and phospho-peptides with post-translational modifications (PTMs) of amino acids, which are responsible for critical human diseases. The biologically relevant case studies involved demonstrate the possibility of discovering new knowledge using the developed tools.

    Large margin methods for partner specific prediction of interfaces in protein complexes

    2014 Spring. The study of protein interfaces and binding sites is a very important domain of research in bioinformatics. Information about the interfaces between proteins can be used not only in understanding protein function but can also be directly employed in drug design and protein engineering. However, the experimental determination of protein interfaces is cumbersome, expensive and, in some cases, not possible with today's technology. As a consequence, the computational prediction of protein interfaces from sequence and structure has emerged as a very active research area. A number of machine learning based techniques have been proposed to solve this problem. However, the prediction accuracy of most such schemes is very low. In this dissertation we present large-margin classification approaches that have been designed to directly model different aspects of protein complex formation as well as the characteristics of available data. Most existing machine learning techniques for this task are partner-independent in nature, i.e., they ignore the fact that the propensity of a protein to bind to another protein depends on the characteristics of residues in both proteins. We have developed a pairwise support vector machine classifier called PAIRpred to predict protein interfaces in a partner-specific fashion. Due to its more detailed model of the problem, PAIRpred offers state of the art accuracy in predicting both binding sites at the protein level and inter-protein residue contacts at the complex level. PAIRpred uses sequence and structure conservation, local structural similarity and surface geometry, residue solvent exposure and template based features derived from the unbound structures of the proteins forming a protein complex.
We have investigated the impact of explicitly modeling the inter-dependencies between residues that are imposed by the overall structure of a protein during the formation of a protein complex through transductive and semi-supervised learning models. We also present a novel multiple instance learning scheme called MI-1 that explicitly models imprecision in sequence-level annotations of binding sites in proteins that bind calmodulin to achieve state of the art prediction accuracy for this task.
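The abstract above describes a pairwise, partner-specific SVM but does not state the kernel it uses. As a rough illustration of how a kernel over residue pairs can be built from a kernel over single residues, here is one common symmetric construction. It is an assumption for illustration rather than PAIRpred's documented kernel, and the feature vectors, `gamma` value and function names are hypothetical.

```python
import numpy as np


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) similarity between two residue feature vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * (d @ d)))


def pairwise_kernel(pair1, pair2, base=rbf):
    """Similarity between two residue pairs (a, b) and (c, d).

    Summing both ways of matching the residues makes the value independent
    of the order in which the two residues of a pair are listed.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)
```

A Gram matrix filled with `pairwise_kernel` values over all training pairs could then be passed to any SVM implementation that accepts precomputed kernels.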

    Prediction of protein-protein interface region based on structural information

    Advisor: Clesio Luiz Tozzi. Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica. Abstract: This work addresses the problem of predicting protein-protein interface regions based on measures of physicochemical and structural properties. The approach treats the amino acids of the protein surface as the basic units for classification, which eliminates the restriction to patches imposed by similar predictors. This predictor is complemented by a second one, which identifies, among the interface amino acids, those most relevant from the standpoint of protein-protein binding energy, known as hot spots. The classifiers can be used independently to predict interface amino acids and hot spots.
Unlike other approaches to hot spot prediction described in the literature, the proposed approach does not depend on knowledge of the structure of the protein complex. This allows hot spots to be predicted as a complement to the prediction of the interface amino acids. In identifying hot spots, the proposed predictor outperformed those described in the literature that use the same data set and performance evaluation criterion. Doctorate. Computer Engineering. Doctor of Electrical Engineering.