343 research outputs found

    Protein–Protein Interaction Hotspots Carved into Sequences

    Get PDF
    Protein–protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein–protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as “hotspots,” these essential residues are defined as residues that impede protein–protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein–protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein–protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis

    Sequence- and Interactome-Based Prediction of Viral Protein Hotspots Targeting Host Proteins: A Case Study for HIV Nef

    Get PDF
    Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance but is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear crowding the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk

    Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Get PDF
    Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots

    Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein-protein interactions are important for several cellular processes. Understanding the mechanism of protein-protein recognition and predicting the binding sites in protein-protein complexes are long standing goals in molecular and computational biology.</p> <p>Methods</p> <p>We have developed an energy based approach for identifying the binding site residues in protein–protein complexes. The binding site residues have been analyzed with sequence and structure based parameters such as binding propensity, neighboring residues in the vicinity of binding sites, conservation score and conformational switching.</p> <p>Results</p> <p>We observed that the binding propensities of amino acid residues are specific for protein-protein complexes. Further, typical dipeptides and tripeptides showed high preference for binding, which is unique to protein-protein complexes. Most of the binding site residues are highly conserved among homologous sequences. Our analysis showed that 7% of residues changed their conformations upon protein-protein complex formation and it is 9.2% and 6.6% in the binding and non-binding sites, respectively. Specifically, the residues Glu, Lys, Leu and Ser changed their conformation from coil to helix/strand and from helix to coil/strand. Leu, Ser, Thr and Val prefer to change their conformation from strand to coil/helix.</p> <p>Conclusions</p> <p>The results obtained in this study will be helpful for understanding and predicting the binding sites in protein-protein complexes.</p

    The Overlap of Small Molecule and Protein Binding Sites within Families of Protein Structures

    Get PDF
    Protein–protein interactions are challenging targets for modulation by small molecules. Here, we propose an approach that harnesses the increasing structural coverage of protein complexes to identify small molecules that may target protein interactions. Specifically, we identify ligand and protein binding sites that overlap upon alignment of homologous proteins. Of the 2,619 protein structure families observed to bind proteins, 1,028 also bind small molecules (250–1000 Da), and 197 exhibit a statistically significant (p<0.01) overlap between ligand and protein binding positions. These “bi-functional positions”, which bind both ligands and proteins, are particularly enriched in tyrosine and tryptophan residues, similar to “energetic hotspots” described previously, and are significantly less conserved than mono-functional and solvent exposed positions. Homology transfer identifies ligands whose binding sites overlap at least 20% of the protein interface for 35% of domain–domain and 45% of domain–peptide mediated interactions. The analysis recovered known small-molecule modulators of protein interactions as well as predicted new interaction targets based on the sequence similarity of ligand binding sites. We illustrate the predictive utility of the method by suggesting structural mechanisms for the effects of sanglifehrin A on HIV virion production, bepridil on the cellular entry of anthrax edema factor, and fusicoccin on vertebrate developmental pathways. The results, available at http://pibase.janelia.org, represent a comprehensive collection of structurally characterized modulators of protein interactions, and suggest that homologous structures are a useful resource for the rational design of interaction modulators

    Robust principal component analysis-based prediction of protein-protein interaction hot spots.

    Get PDF
    AbstractProteins often exert their function by binding to other cellular partners. The hot spots are key residues for protein‐protein binding. Their identification may shed light on the impact of disease associated mutations on protein complexes and help design protein‐protein interaction inhibitors for therapy. Unfortunately, current machine learning methods to predict hot spots, suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre‐processing pipeline that overcomes this problem by recovering a low rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method

    ANCHOR: a web server and database for analysis of protein–protein interaction binding pockets for drug discovery

    Get PDF
    ANCHOR is a web-based tool whose aim is to facilitate the analysis of protein–protein interfaces with regard to its suitability for small molecule drug design. To this end, ANCHOR exploits the so-called anchor residues, i.e. amino acid side-chains deeply buried at protein–protein interfaces, to indicate possible druggable pockets to be targeted by small molecules. For a given protein–protein complex submitted by the user, ANCHOR calculates the change in solvent accessible surface area (ΔSASA) upon binding for each side-chain, along with an estimate of its contribution to the binding free energy. A Jmol-based tool allows the user to interactively visualize selected anchor residues in their pockets as well as the stereochemical properties of the surrounding region such as hydrogen bonding. ANCHOR includes a Protein Data Bank (PDB) wide database of pre-computed anchor residues from more than 30 000 PDB entries with at least two protein chains. The user can query according to amino acids, buried area (SASA), energy or keywords related to indication areas, e.g. oncogene or diabetes. This database provides a resource to rapidly assess protein–protein interactions for the suitability of small molecules or fragments with bioisostere anchor analogues as possible compounds for pharmaceutical intervention. ANCHOR web server and database are freely available at http://structure.pitt.edu/anchor

    Robust Principal Component Analysis-based Prediction of Protein-Protein Interaction Hot spots ( {RBHS} )

    Get PDF
    Proteins often exert their function by binding to other cellular partners. The hot spots are key residues for protein-protein binding. Their identification may shed light on the impact of disease associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods to predict hot spots, suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by recovering a low rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method
    corecore