23 research outputs found

    Predicting protein-protein interface residues using local surface structural similarity

    Background: Identification of the residues in protein-protein interaction sites has a significant impact on problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tends to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. Results: We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each member of the PrISE family identifies, for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity to the structural element of the query protein. PrISE_L relies on the similarity between structural elements (i.e., local structural similarity). PrISE_G relies on the similarity between protein surfaces (i.e., general structural similarity). PrISE_C combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. 
The results of our experiments using three representative benchmark datasets show that PrISE_C outperforms PrISE_L and PrISE_G, and that PrISE_C is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues. Our comparison of PrISE_C with PredUs, a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISE_C is available as a Web server at http://prise.cs.iastate.edu/. Conclusions: Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.
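
The weighted-majority labelling rule described in the abstract above can be illustrated with a minimal Python sketch. The function name `label_central_residue`, the `(similarity, is_interface)` tuple layout, and the use of raw similarity scores as vote weights are assumptions made for illustration; the abstract does not state how PrISE computes its weights.

```python
from typing import Iterable, Tuple


def label_central_residue(matches: Iterable[Tuple[float, bool]]) -> bool:
    """Weighted majority vote over similar structural elements.

    Each match is (similarity, is_interface): a hypothetical similarity
    weight and whether the matched element's central residue is an
    interface residue. Returns True for "interface", False otherwise.
    """
    interface_weight = 0.0
    non_interface_weight = 0.0
    for similarity, is_interface in matches:
        if is_interface:
            interface_weight += similarity
        else:
            non_interface_weight += similarity
    # Ties and empty inputs fall through to "non-interface",
    # matching the "otherwise" branch of the rule.
    return interface_weight > non_interface_weight


if __name__ == "__main__":
    # One strong interface match outweighs two weaker non-interface matches.
    print(label_central_residue([(0.9, True), (0.4, False), (0.3, False)]))
```

In this sketch a single highly similar element can outvote several weakly similar ones, which is the practical difference between a weighted and an unweighted majority.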

    Global and local structural similarity in protein–protein complexes: Implications for template-based docking

    The increasing amount of structural information on protein–protein interactions makes it possible to predict the structure of protein–protein complexes by comparison/alignment of the interacting proteins to the ones in cocrystallized complexes. In the predictions based on structure similarity, the template search is performed by structural alignment of the target interactors with the entire structures or with the interface only of the subunits in cocrystallized complexes. This study investigates the scope of the structural similarity that facilitates the detection of a broad range of templates significantly divergent from the targets. The analysis of the target-template similarity is based on models of protein–protein complexes in a large representative set of heterodimers. The similarity of the biological and crystal packing interfaces, dissimilar interface structural motifs in overall similar structures, interface similarity to the full structure, and local similarity away from the interface were analyzed. The structural similarity at the protein–protein interfaces only was observed in ~25% of target-template pairs with sequence identity <20% and primarily homodimeric templates. For ~50% of the target-template pairs, the similarity at the interface was accompanied by the similarity of the whole structure. However, the structural similarity at the interfaces was still stronger than that of the noninterface parts. The study provides insights into structural and functional diversity of protein–protein complexes, and relative performance of the interface and full structure alignment in docking

    Domain-mediated interactions for protein subfamily identification

    Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.

    Structure-based prediction of protein-protein interaction sites

    Protein-protein interactions play a central role in the formation of protein complexes and the biological pathways that orchestrate virtually all cellular processes. Reliable identification of the specific amino acid residues that form the interface of a protein with one or more other proteins is critical to understanding the structural and physico-chemical basis of protein interactions and their role in key cellular processes, predicting protein complexes, validating protein interactions predicted by high throughput methods, and identifying and prioritizing drug targets in computational drug design. Because of the difficulty and the high cost of experimental characterization of interface residues, there is an urgent need for computational methods for reliably predicting protein-protein interface residues from the sequence, and when available, the structure of a query protein, and when known, its putative interacting partner. Against this background, this thesis develops improved methods for predicting protein-protein interface residues and protein-protein interfaces from the three dimensional structure of an unbound query protein without considering information about its binding protein partner. Towards this end, we develop (i) ProtInDb (http://protindb.cs.iastate.edu), a database of protein-protein interface residues to facilitate (a) the generation of datasets of protein-protein interface residues that can be used to perform analysis of interaction sites and to train and evaluate predictors of interface residues, and (b) the visualization of interaction sites between proteins in both the amino acid sequences and the 3D protein structures, among other applications; (ii) PoInterS (http://pointers.cs.iastate.edu/), a method for predicting protein-protein interaction sites formed by spatially contiguous clusters of interface residues based on the predictions generated by a protein interface residue predictor. 
PoInterS divides a protein surface into a series of patches composed of several surface residues, and uses the outputs of the interface residue predictors to rank and select a small set of patches that are the most likely to constitute the interaction sites; and (iii) PrISE (http://prise.cs.iastate.edu/), a method for predicting protein-protein interface residues based on the similarity of the structural element formed by the query residue and its neighboring residues and the structural elements extracted from the interface and non-interface regions of proteins that are members of experimentally determined protein complexes. A structural element captures the atomic composition and solvent accessibility of a central residue and its closest neighbors in the protein structure. PrISE decomposes a query protein into a set of structural elements and searches for similar elements in a large set of proteins that belong to one or more experimentally determined complexes. The structural elements that are most similar to each structural element extracted from the query protein are then used to infer whether its central residue is or is not an interface residue. The results of our experiments using a variety of benchmark datasets show that PoInterS and PrISE generally outperform the state-of-the-art structure-based methods for predicting interaction patches and interface residues, respectively
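
The patch-ranking idea attributed to PoInterS above (divide the surface into patches, then rank them using per-residue predictor outputs) can be sketched roughly as follows. The function `rank_patches`, the mean-score aggregation, and the `top_k` parameter are hypothetical choices for illustration only; the abstract does not say how PoInterS scores or selects patches.

```python
from typing import Dict, List, Sequence, Tuple


def rank_patches(
    patches: Dict[str, Sequence[int]],
    residue_scores: Dict[int, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank surface patches by the mean predicted interface score.

    patches maps a patch id to the residue indices it contains;
    residue_scores maps a residue index to a predictor output.
    Residues without a score are treated as 0.0 (an assumption).
    """
    scored: List[Tuple[str, float]] = []
    for patch_id, residues in patches.items():
        if not residues:
            continue  # skip empty patches rather than divide by zero
        total = sum(residue_scores.get(r, 0.0) for r in residues)
        scored.append((patch_id, total / len(residues)))
    # Highest mean score first; keep only the top_k candidates.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    patches = {"A": [1, 2], "B": [2, 3], "C": [4]}
    scores = {1: 0.9, 2: 0.7, 3: 0.1, 4: 0.2}
    for patch_id, mean_score in rank_patches(patches, scores, top_k=2):
        print(patch_id, round(mean_score, 3))
```

Averaging keeps large and small patches comparable; other aggregations (sum, maximum, a count of residues above a threshold) would fit the same description equally well.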

    Transient protein-protein interface prediction: datasets, features, algorithms, and the RAD-T predictor

    BACKGROUND: Transient protein-protein interactions (PPIs), which underlie most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high resolution structures of both of the binding partners and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of the other methods’ restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine learning based PPI predictors is due, in order to highlight potential areas for improvement. RESULTS: The presence of unknown interaction sites as a result of limited knowledge about protein interactions in the testing set dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions. CONCLUSION: Current methods of evaluating machine learning based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of unidentified interaction sites. 
Changes to predictors’ training sets will be integral to the future progress of interface prediction by machine learning methods. We reveal the need for a larger test set of well studied proteins or domain-specific scoring algorithms to compensate for poor interaction site identification on proteins in general
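
The MCC score cited above is the Matthews correlation coefficient, computed from the four confusion-matrix counts. A minimal sketch follows; the function name `mcc` and the choice to return 0.0 when the denominator is zero are conventions assumed here, not details taken from the abstract.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives for the interface vs. non-interface labelling.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: MCC is reported as 0 when a marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # 8 interface residues found, 2 missed, 2 false alarms, 88 true negatives.
    print(round(mcc(tp=8, tn=88, fp=2, fn=2), 3))
```

Because a correctly predicted but unannotated interface residue is counted as a false positive, incomplete test labels lower this score, which is the undervaluation effect the abstract describes.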

    Hierarchical representation for PPI sites prediction

    Background: Protein–protein interactions have pivotal roles in life processes, and aberrant interactions are associated with various disorders. Interaction site identification is key for understanding disease mechanisms and designing new drugs. Effective and efficient computational methods for PPI prediction are of great value due to the overall cost of experimental methods. Promising results have been obtained using machine learning methods and deep learning techniques, but their effectiveness depends on protein representation and feature selection. Results: We define a new abstraction of the protein structure, called hierarchical representations, considering and quantifying spatial and sequential neighboring among amino acids. We also investigate the effect of molecular abstractions using the Graph Convolutional Networks technique to classify amino acids as interface and non-interface ones. Our study takes into account three abstractions, hierarchical representations, contact map, and the residue sequence, and considers the eight functional classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0. The performance of our method, evaluated using standard metrics, is compared to the ones obtained with some state-of-the-art protein interface predictors. The analysis of the performance values shows that our method outperforms the considered competitors when the considered molecules are structurally similar. Conclusions: The hierarchical representation can capture the structural properties that promote the interactions and can be used to represent proteins with unknown structures by codifying only their sequential neighboring. Analyzing the results, we conclude that classes should be arranged according to their architectures rather than functions
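
Of the three abstractions named above, the contact map is the simplest to show concretely. The sketch below builds a binary contact map from one 3D coordinate per residue; the function name `contact_map`, the use of a single representative atom per residue, and the 8.0 Å default cutoff are assumptions for illustration, since the abstract does not give the definition the authors used.

```python
import math
from typing import List, Sequence


def contact_map(
    coords: Sequence[Sequence[float]], cutoff: float = 8.0
) -> List[List[int]]:
    """Binary residue contact map.

    coords holds one (x, y, z) point per residue, in Angstroms.
    Entry [i][j] is 1 when residues i and j lie within `cutoff`
    of each other; the diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(points):
        print(row)
```

Such a matrix can serve directly as the adjacency structure of a graph over residues, which is the kind of input a graph convolutional network consumes.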

    Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

    Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class

    RNABindRPlus: A Predictor that Combines Machine Learning and Sequence Homology-Based Methods to Improve the Reliability of Predicted RNA-Binding Residues in Proteins

    Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/

    Structure-based Prediction of Protein-protein Interaction Networks across Proteomes

    Protein-protein interactions (PPIs) orchestrate virtually all cellular processes; therefore, their exhaustive exploration is essential for the comprehensive understanding of cellular networks. Significant efforts have been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. A number of experimental techniques have been developed to discover PPIs; however, these approaches have limitations such as the high cost and long duration of experiments, noisy data sets, and often a high false positive rate and inter-study discrepancies. Given these experimental limitations, computational methods are becoming increasingly important for the detection and structural characterization of PPIs. In that regard, we have developed a novel pipeline for high-throughput PPI prediction based on all-to-all rigid body docking of protein structures. We focus on two questions, ‘how do proteins interact?’ and ‘which proteins interact?’. The method combines molecular modeling, structural bioinformatics, machine learning, and functional annotation data to answer these questions, and it can be used for genome-wide molecular reconstruction of protein-protein interaction networks. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the proteome of E. coli. Further, we validated our method against a few human pathways. The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms as well as to construct dimer structures and estimate the confidence of protein interactions experimentally identified with high-throughput techniques