Search CORE

25 research outputs found

Protein Binding Site Prediction by Combining Hidden Markov Support Vector Machine and Profile-Based Propensities

Author: Bin Liu
Bingquan Liu
Fule Liu
Xiaolong Wang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014
Field of study

Identifying Protein-Protein Interaction Sites Using Covering Algorithm

Author: Berman
Bradford
Burgoyne
Chakrabarti
Chen
Chung
Conte
Dodge
Fariselli
Glaser
Jaynes
Jiaxing Cheng
Jie Song
Jones
Kabsch
Koike
Li
Mrowka
Ofran
Prasad
Res
Rost
Wang
Wang
Wang
Xiuquan Du
Young
Zhang
Zhang
Zhou
Zhou
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/05/2009
Field of study

Identification of protein-protein interface residues is crucial for structural biology. This paper proposes a covering algorithm for predicting protein-protein interface residues with features including protein sequence profile and residue accessible area. This method adequately utilizes the characters of a covering algorithm which have simple, lower complexity and high accuracy for high dimension data. The covering algorithm can achieve a comparable performance (69.62%, Complete dataset; 60.86%, Trim dataset with overall accuracy) to a support vector machine and maximum entropy on our dataset, a correlation coefficient (CC) of 0.2893, 58.83% specificity, 56.12% sensitivity on the Complete dataset and 0.2144 (CC), 53.34% (specificity), 65.59% (sensitivity) on the Trim dataset in identifying interface residues by 5-fold cross-validation on 61 protein chains. This result indicates that the covering algorithm is a powerful and robust protein-protein interaction site prediction method that can guide biologists to make specific experiments on proteins. Examination of the predictions in the context of the 3-dimensional structures of proteins demonstrates the effectiveness of this method

Crossref

Directory of Open Access Journals

PubMed Central

Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications

Author: A Krogh
A McCallum
A Quattoni
C Manning
C Sutton
C Sutton
Castrense Savojardo
CT Li
H Bigelow
J Lafferty
K Sato
L Wang
MH Li
P Bagos
P Baldi
P Fariselli
P Martelli
Pier Luigi Martelli
Piero Fariselli
R Durbin
Rita Casadio
S Wang
TH Dang
W Kabsch
X Xia
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied. Results We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main GRHCRF novelty is the possibility of including in HCRFs prior knowledge of the problem by means of a defined grammar. Our current implementation allows <it>regular grammar </it>rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of Prokaryotic outer-membrane proteins. Conclusion We show that in a typical biosequence labeling problem the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications. Availability GRHCRF software is available under GPLv3 licence at the website <url>http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz.</url></p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova

Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins

Author: A Dubey
A Koike
A Rossi
AH Liu
AJ Bordner
AJ Bordner
AR Panchenko
AT Laurie
B Pils
B Thibert
B Wang
B Wilczynski
C Sander
C Yan
C Yan
C Zhang
CC Chang
D La
DH Morgan
F Osterberg
G Cheng
H Chen
H Deng
H Neuvirth
H Yao
H Yao
HX Zhou
I Res
I Xenarios
IM Nooren
IM Nooren
J Meiler
JL Chung
JR Bradford
JR Bradford
JW Torrance
K Henrick
KA Snyder
L Lo Conte
Lei Lin
MH Li
O Lichtarge
P Chakrabarti
Q Dong
Qiwen Dong
Qw Dong
QW Dong
S Jones
S Karlin
S Liang
SF Altschul
T Down
TJ Magliery
V Chelliah
VN Vapnik
W Kabsch
WS Valdar
WS Valdar
Xiaolong Wang
Y Kim
Y Ofran
Y Ofran
Yi Guan
Z Zhang
Publication venue: BioMed Central
Publication date: 01/05/2007
Field of study

Abstract Background Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, the evolution information has not been exploited to achieve a more powerful propensity. Result In this study, the residue interface propensities of four kinds of complexes (homo-permanent complexes, homo-transient complexes, hetero-permanent complexes and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are inputted to the support vector machine for the prediction of protein binding sites. Such propensities are further improved by taking evolutional information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profiles interface propensities. Experiment is performed on the 1139 non-redundant protein chains. Although different residue interface propensities among different complexes are observed, the improvement of the classifier with residue interface propensities can be negligible in comparison with that without propensities. The binary profile interface propensities can significantly improve the performance of binding sites prediction by about ten percent in term of both precision and recall. Conclusion Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities can significantly improve the performance of binding sites prediction of protein, which indicates that the propensities at the profile level are more accurate than those at the residue level.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Author: A Henschel
A Koike
A Kouranov
A Porollo
A Rossi
AJ Bordner
B Wang
Bin Liu
Buzhou Tang
C Chothia
C Yan
C Yan
C-T Chen
C-W Cheng
H Chen
H Kim
H Neuvirth
H-X Zhou
HX Zhou
I Ezkurdia
I Res
I Tsochantaridis
I Tsochantaridis
J Lafferty
J Song
J Song
J-L Chung
JD Fischer
JL Chung
JR Bradford
JW Torrance
K Henrick
L Holm
L Lo Conte
L Wang
Lei Lin
LR Rabiner
M Gribskov
M Vincent
M Šikić
MH Li
N Li
NJ Burgoyne
P Fariselli
Q Dong
Qiwen Dong
S Ahmad
S Liang
S Qin
SF Altschul
SF Altschul
T Joachims
T Zhang
TH Dang
W Kabsch
WK Kim
X-w Chen
Xiaolong Wang
Xuan Wang
Y Altun
Y Liu
Y Ofran
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results In this study, we introduce a novel machine learning model hidden Markov support vector machine for protein binding site prediction. The model treats the protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train hidden Markov support vector machine. When tested on six data sets, the method based on hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion The improved prediction performance and computational efficiency of the method based on hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for hidden Markov support vector machine is linear with the number of training samples by using the cutting-plane algorithm.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Transient protein-protein interface prediction: datasets, features, algorithms, and the RAD-T predictor

Author: Bogdan Istrate
Calem J Bendell
Michael Zhao
Paul T Cernek
Robert A Murgita
Samuel Khan
Sergiu Picioreanu
Shalon Liu
Tristan Aumentado-Armstrong
Publication venue: Springer Nature
Publication date: 01/01/2014
Field of study

BACKGROUND: Transient protein-protein interactions (PPIs), which underly most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high resolution structures of both of the binding partners and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of the other methods’ restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine learning based PPI predictors is due, in order to highlight potential areas for improvement. RESULTS: The presence of unknown interaction sites as a result of limited knowledge about protein interactions in the testing set dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted on with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions. CONCLUSION: Current methods of evaluating machine-learning based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of un-identified interaction sites. Changes to predictors’ training sets will be integral to the future progress of interface prediction by machine learning methods. We reveal the need for a larger test set of well studied proteins or domain-specific scoring algorithms to compensate for poor interaction site identification on proteins in general

Springer - Publisher Connector

PubMed Central

Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

Author: Amos-Binks Adam
Dehne Frank
Golshani Ashkan
Green James R
Gui Yuan
Patulea Catalin
Pitre Sylvain
Schoenrock Andrew
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome-scale. Here we present PIPE-Sites, a novel method of protein specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict proteinprotein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome-scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting proteins pairs with experimentally-determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel interacting protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable to the proteome-scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online a

CiteSeerX

Crossref

Carleton University's Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction of protein-protein interaction sites using an ensemble method

Author: Deng Lei
Dong Qiwen
Guan Jihong
Zhou Shuigeng
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved. Results In this paper, an ensemble method is proposed, which combines bootstrap resampling technique, SVM-based fusion classifiers and weighted voting strategy, to overcome the imbalanced problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and get a AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than that of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and structural information respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites. Conclusion Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with resampling technique can alleviate the imbalanced problem and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.</p

Crossref

Directory of Open Access Journals

PubMed Central

IMPROVING HOMOLOGY MODELS FOR PROTEIN-LIGAND BINDING SITES

Author
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date
Field of study

Crossref