
    Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

    Background: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs, so predicting protein-protein interactions from sequence is in great demand. Identifying protein-protein interactions is becoming a bottleneck for understanding protein function, especially in poorly characterized organisms. Although a few methods have been proposed, the converse problem, namely whether the features used extract sufficient and unbiased information from protein sequences, is almost untouched. Results: In this study, we investigate this problem theoretically through an optimization scheme. Motivated by the theoretical investigation, we derive novel encoding methods for both protein sequences and protein pairs. The new methods make fuller use of the information in protein sequences and reduce artificial bias and computational cost. They significantly outperform the available methods in sensitivity, specificity, precision, and recall under cross-validation, and reach ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae, respectively. These findings have important implications for other sequence-based prediction tasks, because representing biological sequences is always the first step in computational biology. Conclusions: By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.
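    The abstract above reports sensitivity, specificity, precision, recall, and accuracy. As a reminder of how these relate, here is a minimal sketch that computes them from a confusion matrix. The counts are invented for illustration and are not taken from the paper; the `Confusion` class and `metrics` function are hypothetical names. Note that sensitivity and recall are the same quantity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary classifier's outcomes (hypothetical helper)."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fn: int  # interacting pairs predicted as non-interacting


def metrics(c: Confusion) -> dict:
    """Standard metrics; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    recall = c.tp / (c.tp + c.fn)
    return {
        "sensitivity": recall,  # identical to recall by definition
        "recall": recall,
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / total,
    }


# Made-up counts, purely for illustration.
example = Confusion(tp=80, fp=10, tn=90, fn=20)
m = metrics(example)
```

    Because sensitivity equals recall, a method that improves one necessarily improves the other; specificity and precision carry the independent information about false positives.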

    Computational Approaches to Predict Protein Interaction


    Homology-based prediction of interactions between proteins using Averaged One-Dependence Estimators

    BACKGROUND: Identification of protein-protein interactions (PPIs) is essential for a better understanding of biological processes, pathways and functions. However, experimental identification of the complete set of PPIs in a cell/organism (“an interactome”) is still a difficult task. To circumvent limitations of current high-throughput experimental techniques, it is necessary to develop high-performance computational methods for predicting PPIs. RESULTS: In this article, we propose a new computational method to predict interaction between a given pair of protein sequences using features derived from known homologous PPIs. The proposed method predicts interaction between two proteins (of unknown structure) using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (F(Seq)), (b) statistical propensities of domain pairs observed in interacting proteins (F(Dom)) and (c) a sum of edge weights along the shortest path between homologous proteins in a PPI network (F(Net)). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space to make them independent of the protein order. The predictive performance of the method was assessed by 10-fold cross-validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an area under the curve (AUC) of 0.79 (pAUC(0.5%) = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on a separate independent data set than a recently reported homology-based method. CONCLUSIONS: Our results suggest that F(Net), a feature representing proximity in a known PPI network between two proteins that are homologous to a target protein pair, contributes to the prediction of whether the target proteins interact or not. PSOPIA will help identify novel PPIs and estimate complete PPI networks. The method proposed in this article is freely available on the web at http://mizuguchilab.org/PSOPIA
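    The F(Net) feature is described above as a sum of edge weights along the shortest path in a PPI network. The sketch below illustrates that idea only: it runs Dijkstra's algorithm on a toy undirected graph. The network, the weights, the protein names, and the helper names `undirected` and `shortest_path_weight` are all assumptions made for illustration; the abstract does not say how PSOPIA defines its edge weights or maps target proteins to homologs, so this is not the authors' implementation.

```python
import heapq


def undirected(edges):
    """Build an adjacency map from (node_a, node_b, weight) triples."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_path_weight(graph, source, target):
    """Minimum sum of edge weights from source to target (Dijkstra).

    Assumes non-negative weights. Returns None if target is unreachable.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Toy network with invented weights (hypothetical data).
toy_network = undirected([
    ("P1", "P2", 1.0),
    ("P2", "P3", 2.0),
    ("P1", "P3", 5.0),
    ("P3", "P4", 1.0),
])

f_net_forward = shortest_path_weight(toy_network, "P1", "P4")
f_net_reverse = shortest_path_weight(toy_network, "P4", "P1")
```

    On an undirected graph the value is the same in both directions, which fits the abstract's requirement that features be independent of protein order.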

    In silico characterization and prediction of global protein–mRNA interactions in yeast

    Post-transcriptional gene regulation is mediated through complex networks of protein–RNA interactions. The targets of only a few RNA binding proteins (RBPs) are known, even in the well-characterized budding yeast. In silico prediction of protein–RNA interactions is therefore useful to guide experiments and to provide insight into regulatory networks. Computational approaches have identified RBP targets based on sequence binding preferences. Here we investigate to what extent RBP–RNA interactions can be predicted from RBP and mRNA features other than sequence motifs. We analyze global relationships between gene and protein properties in general, and between selected RBPs and their known mRNA targets in particular. Highly translated RBPs tend to bind to shorter transcripts, and transcripts bound by the same RBP show high expression correlation across different biological conditions. Surprisingly, a given RBP preferentially binds to mRNAs that encode interaction partners of that RBP, suggesting coordinated post-transcriptional auto-regulation of protein complexes. We apply a machine-learning approach to predict specific RBP targets in yeast. Although this approach performs well for RBPs with known targets, predictions for uncharacterized RBPs remain challenging because experimental data are limited. We also predict targets of fission yeast RBPs, indicating that the suggested framework could be applied to other species once more experimental data are available.

    Protein-protein interaction prediction using interactome-level datasets and its applications

    Identifying the protein interactome, the whole set of protein-protein interactions (PPIs) in vivo, is important for understanding biological pathways and the functions of many proteins. Many computational methods for predicting PPIs have been proposed to compensate for the limitations of current experimental techniques for identifying PPIs. We recently improved the performance of our PPI prediction method, PSOPIA, using an interactome-level human training dataset. In this study, to evaluate the new PSOPIA, we compared it with a recently developed method reported to have the highest performance among currently available methods. PSOPIA predicted more high-confidence PPIs than the reported method. PSOPIA was also shown to be effective at predicting PPIs in mouse and rat. Based on these results, we discuss further improvement of PSOPIA and its applications.

    Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

    Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed for constructing a PPI predictor: a support vector machine model is trained using a mutual-information filter-wrapper parallel feature selection algorithm, together with iterative hierarchical clustering to select a relevant negative training set. Using the selected suboptimal set of features, the resulting support vector machine model is able to classify PPIs with high accuracy in any positive and negative datasets.

    Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the nature of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, and more effective and accurate methods, are still urgently needed. Results: In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, these two features differ significantly between hot spots and non-hot spots. Second, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting hot spots in two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integrating features with machine learning methods can markedly improve predictive performance for hot spots.
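    The abstract above reports a precision of 0.69, a recall of 0.68, and an F1 score of 0.68. Since F1 is the harmonic mean of precision and recall, the three figures can be checked against each other. The short sketch below does this; `f1_score` here is a local helper written for illustration, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.69, 0.68)
```

    The result is roughly 0.685, which is consistent with the reported F1 of 0.68 given that the published precision and recall are themselves rounded to two decimals.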