24,397 research outputs found

Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

Author: González Alvaro J
Liao Li
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting binding sites of hydrolase-inhibitor complexes by combining several methods

Author: Cao Haibo
Dobbs Drena
Gu Xun
Ho Kai-Ming
Honovar Vasant
Ihm Yungok
Jernigan Robert
Kloczkowski Andrzej
Sen Taner
Wang Cai-Zhuang
Yan Changhui
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2004

Background: Protein-protein interactions play a critical role in protein function. Completion of many genomes is being followed rapidly by major efforts to identify interacting protein pairs experimentally in order to decipher the networks of interacting, coordinated-in-action proteins. Identifying protein-protein interaction sites and detecting the specific amino acids that contribute to the specificity and strength of protein interactions is an important problem with broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Results: In order to increase the power of predictive methods for protein-protein interaction sites, we have developed a consensus methodology for combining four different methods. These approaches include: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. Results obtained on a dataset of hydrolase-inhibitor complexes demonstrate that the combination of all four methods yields improved predictions over the individual methods. Conclusions: We developed a consensus method for predicting protein-protein interface residues by combining sequence and structure-based methods. The success of our consensus approach suggests that similar methodologies can be developed to improve prediction accuracies for other bioinformatic problems.
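
The abstract above does not say how the four predictors are combined, so the following is only a minimal sketch of one possible consensus rule, a vote threshold over per-residue binary calls. The function name `consensus_interface`, the `min_votes` parameter, the method labels and the toy predictions are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def consensus_interface(predictions: Dict[str, List[int]], min_votes: int) -> List[int]:
    """Label a residue as interface (1) when at least min_votes methods call it."""
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all methods must label the same residues")
    n_residues = lengths.pop()
    consensus = []
    for i in range(n_residues):
        # Count how many methods predict residue i as an interface residue.
        votes = sum(labels[i] for labels in predictions.values())
        consensus.append(1 if votes >= min_votes else 0)
    return consensus


# Hypothetical per-residue calls from four methods (1 = predicted interface).
example = {
    "svm":          [1, 0, 1, 0, 1],
    "threading":    [1, 0, 0, 0, 1],
    "phylogeny":    [0, 0, 1, 0, 1],
    "conservatism": [1, 1, 0, 0, 1],
}
majority = consensus_interface(example, min_votes=3)
print(majority)
```

Raising `min_votes` trades sensitivity for specificity; a weighted vote would be a natural variation if some methods were known to be more reliable than others.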

Digital Repository @ Iowa State University (ISU)

PubMed Central

Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

Author: Armean Irina M
Holden Sean B
Lilley Kathryn S
Pilkington Nicholas CV
Trotter Matthew WB
Publication venue: Bioinformatics
Publication date: 30/01/2018

MOTIVATION: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. RESULTS: PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance, with an area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established prediction tool for PPI based on GO annotations. AVAILABILITY AND IMPLEMENTATION: https://github.com/ima23/maxent-ppi. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Crossref

Apollo (Cambridge)

A model to predict and analyze protein-protein interaction types using electrostatic energies

Author: Vasudev Gokul
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2012

Prediction and analysis of types of protein-protein interactions (PPI) is an important problem in molecular biology because of its key role in many biological processes in living cells. In this thesis, I propose a model called PPIEE (Protein-protein interaction using electrostatic energies) to predict and analyze protein interaction types using electrostatic energies as properties to distinguish between these types of interactions. This prediction approach uses electrostatic energies for pairs of atoms and amino acids present in interfaces where the interaction occurs. Using this approach, the results on well-known datasets confirm that electrostatic energy is an important property for predicting obligate and non-obligate protein interaction types. The classifiers used are support vector machines and linear dimensionality reduction. Since electrostatic interactions are long-ranged, further experiments are performed by varying the threshold values, which are the distances calculated between atom pairs of interacting chains, from 7Å to 13Å. This information will help researchers understand how different physicochemical properties contribute to the stability and function of protein complexes.
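
As an illustration of the distance threshold mentioned in the abstract above, the sketch below selects atom pairs from two chains that lie within a cutoff and repeats the selection for several cutoffs. It covers only the pair selection, not the electrostatic energy calculation, and the function name `interface_pairs` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float]  # x, y, z coordinates in angstroms


def interface_pairs(chain_a: List[Atom], chain_b: List[Atom],
                    cutoff: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of atoms from the two chains within cutoff."""
    return [
        (i, j)
        for i, a in enumerate(chain_a)
        for j, b in enumerate(chain_b)
        if math.dist(a, b) <= cutoff
    ]


# Toy coordinates, invented for illustration only.
chain_a = [(0.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
chain_b = [(6.0, 0.0, 0.0), (0.0, 9.0, 0.0), (30.0, 0.0, 0.0)]

for cutoff in (7.0, 10.0, 13.0):
    print(cutoff, interface_pairs(chain_a, chain_b, cutoff))
```

Larger cutoffs admit more pairs, which is why the choice of threshold can change the features a downstream classifier sees.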

Scholarship at UWindsor

BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features

Author: AP Bradley
AR Panchenko
C Yan
Caiyan Huang
CH Wu
DE Draper
E Bechara
IB Kuznetsov
JA Swets
Jack Y Yang
JC Darnell
L Wang
Liangjiang Wang
M Terribilini
Mary Qu Yang
P Baldi
S Ahmad
S Hwang
S Jones
SF Altschul
T Joachims
WS Noble
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA or RNA-binding residues in proteins. Protein sequence features, including the biochemical property of amino acids and evolutionary information in terms of position-specific scoring matrix (PSSM), have been used for DNA or RNA-binding site prediction. However, PSSM is designed primarily for PSI-BLAST searches, and it may not contain all the evolutionary information for modelling DNA or RNA-binding sites in protein sequences. Results: In the present study, several new descriptors of evolutionary information have been developed and evaluated for sequence-based prediction of DNA and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors and PSSM, suggesting that they captured different aspects of evolutionary information for DNA and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction. Conclusions: Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ (http://bioinfo.ggc.org/bindn+/) to make the SVM classifiers accessible to the research community.

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Protein-protein interaction based on pairwise similarity

Author: A Szilàgyi
A Tong
AC Gavin
B Schwikowski
B Schölkopf
C Xue-Wen
D Thomas
E Sprinzak
F Pazos
G Rigaut
H Rangwala
H Saigo
J Wojcik
L Liao
M Deng
M Edward
N Cristianini
Nazar Zaki
NM Zaki
P Matteo
P Sylvain
Piers Campbell
PL Bartel
S Juwen
S Ramazan
Sanja Lazarova-Molnar
T Pawson
T Smith
TW Huang
VN Vapnik
Wassim El-Hajj
WR Pearson
Z Heng
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Protein-protein interaction (PPI) is essential to most biological processes. Abnormal interactions may have implications in a number of neurological syndromes. Given that the association and dissociation of protein molecules is crucial, computational tools capable of effectively identifying PPI are desirable. In this paper, we propose a simple yet effective method to detect PPI based on pairwise similarity and using only the primary structure of the protein. The PPI based on Pairwise Similarity (PPI-PS) method represents each protein sequence by a vector of pairwise similarities against large subsequences of amino acids created by a shifting window which passes over concatenated protein training sequences. Each coordinate of this vector is typically the E-value of the Smith-Waterman score. These vectors are then used to compute the kernel matrix which will be exploited in conjunction with support vector machines. Results: To assess the ability of the proposed method to recognize the difference between "interacted" and "non-interacted" protein pairs, we applied it to different datasets from the available yeast Saccharomyces cerevisiae protein interaction data. The proposed method achieved reasonable improvement over the existing state-of-the-art methods for PPI prediction. Conclusion: Pairwise similarity score provides a relevant measure of similarity between protein sequences. This similarity incorporates biological knowledge about proteins and it is extremely powerful when combined with support vector machine to predict PPI.
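
The pairwise-similarity representation described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PPI-PS implementation: it uses the raw Smith-Waterman local alignment score with a simple match/mismatch scheme and a linear gap penalty instead of the E-value and a substitution matrix, and the function names, scoring parameters, window size and toy sequences are all hypothetical.

```python
from typing import List


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: scores are never allowed to drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def sliding_windows(sequence: str, size: int, step: int) -> List[str]:
    """Cut a sequence into windows of the given size, advancing by step."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


def similarity_vector(protein: str, windows: List[str]) -> List[int]:
    """Represent a protein by its similarity to every training window."""
    return [smith_waterman_score(protein, w) for w in windows]


# Toy training sequences, invented for illustration.
training = ["MKTAYIAK", "GAVLIPFW"]
windows = sliding_windows("".join(training), size=4, step=4)
vector = similarity_vector("MKTA", windows)
print(windows, vector)
```

A kernel matrix for an SVM could then be built from inner products of such vectors; that step is omitted here.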

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Southern Denmark Research Output

Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder

Author: Sezerman Uğur
Yavuz Ahmet Sinan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Sumoylation, which is a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly structural conformation. Aberrations in protein sumoylation have been linked with a variety of disorders and developmental anomalies. Experimental approaches to identification of sumoylation sites may not be effective due to the dynamic nature of sumoylation and the laborious, costly experiments involved. Therefore, computational approaches may guide experimental identification of sumoylation sites and provide insights for further understanding the sumoylation mechanism. Results: In this paper, the effectiveness of using various sequence properties in predicting sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties were derived from windows of size 7 and include position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. 5-fold cross-validation results on experimentally identified sumoylation sites revealed that our method successfully predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy equal to 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably to the existing prediction methods and a basic regular expression scanner. Conclusions: By using support vector machines, a new, robust method for sumoylation site prediction was introduced. In addition, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally, for the first time to our knowledge, as additional parameters that could aid in sumoylation site prediction.
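
For readers unfamiliar with the four metrics quoted in the abstract above, this short sketch computes them from confusion-matrix counts using their standard definitions. The counts in the example are invented for illustration and are not the paper's data; the function name `classification_metrics` is hypothetical, and the code assumes both classes are present so the denominators are non-zero.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)   # fraction of true sites that are recovered
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=10)
print(metrics)
```

On heavily imbalanced data, accuracy and specificity can both be high while sensitivity stays modest, which is why the Matthews correlation coefficient is often reported alongside them.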

Crossref

Springer - Publisher Connector

PubMed Central

Sabanci University Research Database

Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

Author: Zhong Wei
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2006

Protein tertiary structure plays a very important role in determining a protein's possible functional sites and chemical interactions with other related proteins. Experimental methods to determine protein structure are time-consuming and expensive. As a result, the gap between known protein sequences and known structures has widened substantially due to high-throughput sequencing techniques. The limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within sequence clusters may influence their structural similarity. Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity and sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure for local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and high-quality clusters give reliable predictions. In order to improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively. As a result, we consider using a Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVM is not well suited to huge datasets containing millions of samples. Therefore, we propose CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction improves noticeably when CSVMs are applied.

CiteSeerX

ScholarWorks @ Georgia State University

Predicting protein-protein interactions as a one-class classification problem

Author: Alashwal Hany
Deris Safaai
Othman Razib M.
Publication date: 01/01/2006

Protein-protein interactions represent a key step in understanding protein functions, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been used to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. While it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting protein pairs to serve as a negative set. Therefore, in this paper we solve this problem as a one-class classification problem using One-Class SVM (OCSVM). Using only positive examples (interacting protein pairs) for training, the OCSVM achieves an accuracy of 80%. These results imply that protein-protein interaction can be predicted using a one-class classifier with reliable accuracy.

Universiti Teknologi Malaysia Institutional Repository

Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

Author: Gilbert D
Reyes J A
Publication venue: JIB
Publication date: 01/01/2007

This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. The first is class imbalance, because positive examples (pairs of proteins which really interact) are far fewer than negative ones. The second is that the selection of negative examples can rest on unreliable assumptions, which could bias the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of predicting PPI. OCC methods utilise examples of just one class to generate a predictive model, which consequently is independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
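
To make the one-class idea in the abstract above concrete, here is a minimal sketch of a Parzen-window one-class classifier trained on positive examples only. It is an illustration under assumptions, not the authors' implementation: the Gaussian kernel, the unnormalised density, the threshold rule, the function names and the toy feature vectors are all hypothetical choices.

```python
import math
from typing import List, Sequence


def parzen_density(x: Sequence[float], training: List[Sequence[float]],
                   bandwidth: float) -> float:
    """Average Gaussian kernel value between x and the training points.

    The normalising constant is omitted because only comparisons against a
    threshold computed with the same function are needed.
    """
    total = 0.0
    for t in training:
        sq_dist = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(training)


def fit_threshold(training: List[Sequence[float]], bandwidth: float,
                  reject_fraction: float) -> float:
    """Density below which roughly reject_fraction of training points fall."""
    scores = sorted(parzen_density(t, training, bandwidth) for t in training)
    index = min(int(reject_fraction * len(scores)), len(scores) - 1)
    return scores[index]


def is_target(x: Sequence[float], training: List[Sequence[float]],
              bandwidth: float, threshold: float) -> bool:
    """Accept x as a member of the positive class if its density is high enough."""
    return parzen_density(x, training, bandwidth) >= threshold


# Toy positive examples (feature vectors of interacting pairs), invented here.
positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
threshold = fit_threshold(positives, bandwidth=0.5, reject_fraction=0.0)
near = is_target((0.05, 0.05), positives, 0.5, threshold)
far = is_target((5.0, 5.0), positives, 0.5, threshold)
print(near, far)
```

No negative examples enter the model at any point, which is the property the abstract highlights; the bandwidth and the rejected fraction are the two knobs that would need tuning on real data.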

CiteSeerX

Directory of Open Access Journals

Brunel University Research Archive