Choosing negative examples for the prediction of protein-protein interactions
The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples to train a predictor of protein-protein interactions is based on annotations of cellular localization and on the observation that pairs of proteins with different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated for both sequence-based and non-sequence-based features used for predicting protein-protein interactions.
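A toy illustration of the bias described above, under stated assumptions: the protein names, compartment labels, and helper functions (`localization_negatives`, `random_negatives`, `shared_compartment_rate`) are hypothetical and do not come from the paper. The toy data are built so that every positive pair is co-localized. Under that assumption, negatives restricted to non-overlapping localizations can be separated from the positives by a trivial "shares a compartment" feature. Uniformly sampled non-positive pairs do not offer that shortcut.

```python
import random
from itertools import combinations

# Hypothetical helpers and toy data for illustration; not from the paper.


def localization_negatives(localization, positives, n, seed=0):
    """Sample up to n non-positive pairs whose localization sets do not overlap."""
    rng = random.Random(seed)
    known = {frozenset(p) for p in positives}
    candidates = [
        (a, b)
        for a, b in combinations(sorted(localization), 2)
        if not (localization[a] & localization[b]) and frozenset((a, b)) not in known
    ]
    return rng.sample(candidates, min(n, len(candidates)))


def random_negatives(proteins, positives, n, seed=0):
    """Sample up to n non-positive pairs uniformly, ignoring localization."""
    rng = random.Random(seed)
    known = {frozenset(p) for p in positives}
    candidates = [
        (a, b)
        for a, b in combinations(sorted(proteins), 2)
        if frozenset((a, b)) not in known
    ]
    return rng.sample(candidates, min(n, len(candidates)))


def shared_compartment_rate(pairs, localization):
    """Fraction of pairs whose two proteins share at least one compartment."""
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if localization[a] & localization[b])
    return shared / len(pairs)


# Toy data: in this example every known interaction is co-localized.
loc = {
    "A": {"nucleus"}, "B": {"nucleus"}, "F": {"nucleus"},
    "C": {"cytoplasm"}, "D": {"cytoplasm"}, "E": {"membrane"},
}
positives = [("A", "B"), ("C", "D")]

loc_neg = localization_negatives(loc, positives, 5)
rand_neg = random_negatives(list(loc), positives, 100)
print(shared_compartment_rate(positives, loc))  # 1.0
print(shared_compartment_rate(loc_neg, loc))    # 0.0: the feature alone separates the classes
print(shared_compartment_rate(rand_neg, loc))   # about 0.154 (2 of 13 pairs co-localized)
```

On real data the separation would not be this clean. The sketch only shows why a constrained negative set can make the evaluation task easier than the one a predictor faces in practice.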
Prediction of protein-protein interactions using one-class classification methods and integrating diverse data
This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems that can affect results. First, the classes are imbalanced: the number of positive examples (pairs of proteins that really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can rest on unreliable assumptions, which may introduce bias into the classification results.
Here we propose the use of one-class classification (OCC) methods for the task of PPI prediction. OCC methods utilise examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected. Additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task and found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques under different scenarios, for example varying the number of negative examples used for training. We found that the Parzen OCC method generally performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets and validated these results by searching the literature for biological evidence.
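The Parzen one-class idea can be sketched as a Gaussian kernel density estimate fitted to positive examples only. The decision threshold is set so that a chosen fraction of the training positives falls below it. This is a minimal illustration, not the authors' implementation. The abstract does not give these details, so the class name `ParzenOneClass`, the Gaussian kernel, the `bandwidth` and `reject_fraction` values, and the synthetic 2-D features are all assumptions.

```python
import numpy as np

# Illustrative sketch; kernel, bandwidth and threshold rule are assumptions.


class ParzenOneClass:
    """One-class classifier: Gaussian Parzen-window density fitted on positives only."""

    def __init__(self, bandwidth=1.0, reject_fraction=0.05):
        self.bandwidth = bandwidth
        self.reject_fraction = reject_fraction

    def _density(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        h = self.bandwidth
        d = self.train_.shape[1]
        # Squared distances between every query point and every training point.
        sq = ((X[:, None, :] - self.train_[None, :, :]) ** 2).sum(axis=2)
        norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
        return np.exp(-sq / (2.0 * h ** 2)).mean(axis=1) / norm

    def fit(self, X_pos):
        self.train_ = np.atleast_2d(np.asarray(X_pos, dtype=float))
        # Training scores include each point's own kernel; a leave-one-out
        # estimate would be less optimistic but is omitted for brevity.
        train_scores = self._density(self.train_)
        self.threshold_ = np.quantile(train_scores, self.reject_fraction)
        return self

    def predict(self, X):
        """Return +1 for 'resembles the positive class', -1 otherwise."""
        return np.where(self._density(X) >= self.threshold_, 1, -1)


# Synthetic feature vectors standing in for known interacting pairs.
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, size=(200, 2))
model = ParzenOneClass(bandwidth=0.5, reject_fraction=0.05).fit(X_pos)
print(model.predict([[0.0, 0.0], [8.0, 8.0]]))  # [ 1 -1]
```

No negative examples enter `fit`, which is the property the abstract relies on. Negatives would only be needed to evaluate the model or to tune the bandwidth.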
PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network
A variety of methods have been developed to integrate diverse biological data and predict novel interactions between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. We therefore propose a modified version of the Topological Overlap Measure to identify not only extant direct PPI links but also novel protein interactions that can be inferred indirectly from various relationships between proteins. Our method outperforms naïve Bayesian-network-based integration in PPI prediction and can generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of the training and test datasets on prediction, further demonstrating the effectiveness of PTOMSM in predicting PPI. More importantly, this method extends naturally to predicting other types of biological networks and may be combined with Bayesian methods to further improve prediction.
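The standard (unmodified) topological overlap between nodes i and j of an unweighted network is TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij). Here a_ij is the adjacency entry, l_ij is the number of neighbours shared by i and j, and k_i is the degree of node i. The abstract does not specify how PTOMSM modifies this measure, so the sketch below implements only the standard form. The function name and the toy four-protein network are hypothetical. The sketch does show how an unlinked pair can score highly through shared neighbours, which is the kind of indirect inference the abstract describes.

```python
import numpy as np

# Standard TOM only; PTOMSM's modification is not reproduced here.


def topological_overlap(adj):
    """Standard topological overlap matrix for an unweighted, undirected network."""
    A = np.array(adj, dtype=float)  # copy, so the caller's matrix is not modified
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        raise ValueError("adjacency matrix must be square and symmetric")
    np.fill_diagonal(A, 0.0)
    shared = A @ A                  # l_ij: number of neighbours shared by i and j
    k = A.sum(axis=1)               # node degrees
    min_k = np.minimum.outer(k, k)
    tom = (shared + A) / (min_k + 1.0 - A)
    np.fill_diagonal(tom, 1.0)      # convention: a node overlaps fully with itself
    return tom


# Hypothetical 4-protein "square": P0-P1, P0-P2, P1-P3, P2-P3 (no P0-P3 link).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
tom = topological_overlap(adj)
print(round(tom[0, 3], 3))  # 0.667: unlinked, but shares both neighbours
print(round(tom[0, 1], 3))  # 0.5: linked, but shares no neighbours
```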
Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose.
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets from structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable to fine-grained lead optimization as well as to the identification of potent new leads.
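The multiple-instance idea of jointly choosing model parameters and ligand poses can be illustrated with a toy alternating scheme. Each ligand is a "bag" of candidate-pose feature vectors, and its predicted activity is the maximum linear score over its poses. Training alternates between two steps: selecting each ligand's best-scoring pose, then refitting the weights on those poses by least squares. This is a generic multiple-instance regression heuristic swapped in for illustration. It is not QuanSA's learned scoring field, and the function names, the linear scoring, and the random data are assumptions.

```python
import numpy as np

# Hypothetical helper names and a linear score; not QuanSA's actual model.


def fit_multiple_instance(bags, y, n_iter=20):
    """Alternate between choosing each ligand's best-scoring pose and refitting
    linear weights on the chosen poses by least squares."""
    y = np.asarray(y, dtype=float)
    chosen = [0] * len(bags)  # start from each ligand's first pose
    w = np.zeros(bags[0].shape[1])
    for _ in range(n_iter):
        X = np.vstack([bag[c] for bag, c in zip(bags, chosen)])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        new_chosen = [int(np.argmax(bag @ w)) for bag in bags]
        if new_chosen == chosen:  # pose assignment is stable
            break
        chosen = new_chosen
    # Returned poses are always the best-scoring poses under the returned w.
    return w, chosen


def predict_activity(bags, w):
    """Multiple-instance prediction: the maximum score over each ligand's poses."""
    return np.array([np.max(bag @ w) for bag in bags])


# Random toy data: 30 ligands, 4 candidate poses each, 3 features per pose.
rng = np.random.default_rng(1)
bags = [rng.normal(size=(4, 3)) for _ in range(30)]
y = rng.normal(size=30)
w, poses = fit_multiple_instance(bags, y)
print(predict_activity(bags, w)[:3], poses[:3])
```

With a single pose per ligand the scheme reduces to ordinary least squares. The pose-selection step is what makes the alignment problem part of the learning problem rather than a preprocessing step.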