
Full-text Institutional Repository of the Ruđer Bošković Institute

SLIDER: Mining correlated motifs in protein-protein interaction networks

Author: Boyen P.
Dijk A.D.J., van
Ham R.C.H.J., van
Neven F.
Publication date: 01/01/2009

Abstract: Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER, which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks.
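To make the idea of a Chi-square-based support measure concrete, the sketch below scores one motif pair against a toy network. The abstract does not give SLIDER's exact definition, so everything here is an assumption made for illustration: the 2x2 contingency table (protein pair covered by the motif pair or not, versus interacting or not), the use of regular expressions as motifs, the toy data, and the function name `chi_square_support`.

```python
import re
from itertools import combinations

def chi_square_support(sequences, interactions, motif_a, motif_b):
    """Chi-square statistic of a 2x2 table: (pair covered by motif pair) x (pair interacts)."""
    has_a = {p for p, s in sequences.items() if re.search(motif_a, s)}
    has_b = {p for p, s in sequences.items() if re.search(motif_b, s)}
    edges = {frozenset(e) for e in interactions}

    # table[covered][interacting]; booleans index as 0/1
    table = [[0, 0], [0, 0]]
    for p, q in combinations(sorted(sequences), 2):
        covered = (p in has_a and q in has_b) or (p in has_b and q in has_a)
        interacting = frozenset((p, q)) in edges
        table[covered][interacting] += 1

    n = sum(map(sum, table))
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            row_total = sum(table[i])
            col_total = table[0][j] + table[1][j]
            expected = row_total * col_total / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical toy network: six proteins, three interactions.
sequences = {
    "P1": "AAKLA", "P2": "GGDEG", "P3": "TTKLT",
    "P4": "SSDES", "P5": "AAAAA", "P6": "GGGGG",
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
score = chi_square_support(sequences, interactions, "KL", "DE")
print(score)
```

A local search in the spirit of the abstract would repeatedly modify the motifs (for example by sliding them) and keep changes that increase this score; that loop is not shown here.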

Wageningen University & Research Publications

Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes

Author: Fukui Kazuhiko
Gromiha M Michael
Jayaram B
Saranya N
Selvaraj S
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Protein-protein interactions are important for several cellular processes. Understanding the mechanism of protein-protein recognition and predicting the binding sites in protein-protein complexes are long-standing goals in molecular and computational biology. Methods: We have developed an energy-based approach for identifying the binding site residues in protein-protein complexes. The binding site residues have been analyzed with sequence- and structure-based parameters such as binding propensity, neighboring residues in the vicinity of binding sites, conservation score and conformational switching. Results: We observed that the binding propensities of amino acid residues are specific for protein-protein complexes. Further, typical dipeptides and tripeptides showed high preference for binding, which is unique to protein-protein complexes. Most of the binding site residues are highly conserved among homologous sequences. Our analysis showed that 7% of residues changed their conformation upon protein-protein complex formation; the figures are 9.2% and 6.6% in the binding and non-binding sites, respectively. Specifically, the residues Glu, Lys, Leu and Ser changed their conformation from coil to helix/strand and from helix to coil/strand. Leu, Ser, Thr and Val prefer to change their conformation from strand to coil/helix. Conclusions: The results obtained in this study will be helpful for understanding and predicting the binding sites in protein-protein complexes.

Springer - Publisher Connector

Protein Binding Site Prediction by Combining Hidden Markov Support Vector Machine and Profile-Based Propensities

Author: Bin Liu
Bingquan Liu
Fule Liu
Xiaolong Wang
Publication venue: Hindawi Limited
Publication date: 01/01/2014

University of Essex Research Repository

Rice_Phospho 1.0: a new rice-specific SVM predictor for protein phosphorylation sites

Author: A Palmeri
AH Gandomi
B Petersen
BR Chitteti
CR Ingrell
GK Agrawal
H He
H Nakagami
HD Huang
J Gao
J Gao
JC Obenauer
JH Kim
JL Heazlewood
K Chen
KC Chou
L Breiman
LM Iakoucheva
M Hall
M Sikic
MM Aziz
N Blom
N Blom
P Han
R Kumar
S Que
SW Chang
V Neduva
X Chen
XW Chen
XW Zhao
Y Ban
Y Ke
Y Xue
Y Xue
YZ Chen
Z Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 07/07/2015

Experimentally determined or computationally predicted protein phosphorylation sites for distinctive species are becoming increasingly common. In this paper, we compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. Our results imply that the combination of Amino acid occurrence Frequency with Composition of K-Spaced Amino Acid Pairs (AF-CKSAAP) provides the best description of relevant sequence features that surround a phosphorylation site. A support vector machine (SVM) using AF-CKSAAP achieves the best performance in classifying rice protein phosphorylation sites when compared to the other algorithms. We have used SVM with AF-CKSAAP to construct a rice-specific protein phosphorylation site predictor, Rice-Phospho 1.0 (http://bioinformatics.fafu.edu.cn/rice-phospho1.0). We measure the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) of Rice-Phospho 1.0 to be 82.0% and 0.64, significantly higher than those measures for other predictors such as Scansite, Musite, PlantPhos and PhosphoRice. Rice-Phospho 1.0 also successfully predicted the experimentally identified phosphorylation sites in LOC-Os03g51600.1, a protein sequence which did not appear in the training dataset. In summary, Rice-Phospho 1.0 outputs reliable predictions of protein phosphorylation sites in rice, and will serve as a useful tool to the community.
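The AF-CKSAAP encoding named above can be sketched as follows. This is only an illustration of the general idea (amino acid frequencies concatenated with frequencies of residue pairs separated by k positions). The maximum gap `k_max`, the normalization by the number of pairs in the window, the example window, and the function names are assumptions, since the abstract does not specify them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_frequency(window):
    """Fraction of each of the 20 standard amino acids in a non-empty window."""
    return [window.count(a) / len(window) for a in AMINO_ACIDS]

def cksaap(window, k_max=2):
    """Frequencies of residue pairs separated by k positions, for k = 0..k_max."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # ignore non-standard residues or padding symbols
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features

def af_cksaap(window, k_max=2):
    """Concatenate the 20 AF features with the 400 * (k_max + 1) CKSAAP features."""
    return amino_acid_frequency(window) + cksaap(window, k_max)

# Hypothetical window centred on a candidate serine site.
vector = af_cksaap("ASKSA", k_max=2)
print(len(vector))
```

The resulting fixed-length vector is the kind of input an SVM could be trained on; the classifier itself is not shown.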

Multidisciplinary Digital Publishing Institute

Prediction of Lysine Ubiquitylation with Ensemble Classifier and Feature Selection

Author: Aguilar
Altschul
Anand
Atchey
Boeckmann
Bordoli
Breiman
Cai
Chou
Chou
Chou
Cover
Denis
Dunker
Fleuret
He
Herrmann
Hershko
Hicke
Hicke
Hitchcock
Jeon
Jones
Kaur
Kawashima
Kim
Kirkpatrick
Levi
Li
Liu
Liu
Liu
Ma
Matsumoto
Minghao Yin
Peng
Peng
Peng
Peng
Pickart
Pugalenthi
Radivojac
Saghatelian
Shen
Sikic
Skurichina
Tompa
Tung
Welchman
Wright
Wu
Xiangtao Li
Xiao
Xiaowei Zhao
Yu
Zheng
Zhiqiang Ma
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/11/2011

Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequence information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forest classifiers, which were then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
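The aggregation step described above, combining several classifiers into a consensus by majority voting, can be sketched in a few lines. The function name and the toy predictions are hypothetical, and the individual random forest classifiers are assumed to have already produced one label per sample.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one consensus label per sample."""
    consensus = []
    for votes in zip(*predictions_per_classifier):
        label, _ = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus

# Three hypothetical classifiers, each predicting three candidate lysine sites
# (1 = ubiquitylated, 0 = not ubiquitylated).
predictions = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
consensus = majority_vote(predictions)
print(consensus)
```

With binary labels, using an odd number of classifiers avoids ties.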

InteractoMIX: A suite of computational tools to exploit interactomes in biological and clinical research

Author: Bonet Jaume
Fernandez-Fuentes Narcis
Fornes Oriol
Garcia-Garcia Javier
Marín-López Manuel Alejandro
Oliva Baldo
Planas-Iglesias Joan
Poglayen Daniel
Segura Joan
Publication date: 09/06/2016

Virtually all the biological processes that occur inside or outside cells are mediated by protein–protein interactions (PPIs). Hence, the charting and description of the PPI network, the interactome, initially in whole organisms but more recently in specific tissues, is essential to fully understand cellular processes both in health and disease. The study of PPIs is also at the heart of renewed efforts in the medical and biotechnological arena in the quest for new therapeutic targets and drugs. Here, we present a mini review of 11 computational tools and resources developed by us to address different aspects of PPIs: from the interactome level to their atomic 3D structural details. We provide details on each specific resource, its aims and purpose, and compare it with equivalent tools in the literature. All the tools are presented in a centralized, one-stop web site: InteractoMIX (http://interactomix.com).

Aberystwyth Research Portal

Warwick Research Archives Portal Repository

Predicting protein-protein binding sites in membrane proteins

Author: A Elofsson
A Koike
A Liaw
AJ Bordner
AJ Bordner
AJ Bordner
AJ Bordner
Andrew J Bordner
B Wang
C Yan
D Lupo
E Krissinel
GE Tusnady
H Chen
H Neuvirth
HX Zhou
I Res
JR Bradford
L Breiman
L Feng
MA Yildirim
NJ Burgoyne
P Fariselli
R Development Core Team
R Landgraf
RC Edgar
S Hartel-Schenk
S Jones
S Jones
SA Eyers
SF Altschul
SH White
TM Bakheet
W Li
XW Chen
Y Ofran
Publication venue: BioMed Central
Publication date: 01/09/2009

Abstract. Background: Many integral membrane proteins, like their non-membrane counterparts, form either transient or permanent multi-subunit complexes in order to carry out their biochemical function. Computational methods that provide structural details of these interactions are needed since, despite their importance, relatively few structures of membrane protein complexes are available. Results: We present a method for predicting which residues are in protein-protein binding sites within the transmembrane regions of membrane proteins. The method uses a Random Forest classifier trained on residue type distributions and evolutionary conservation for individual surface residues, followed by spatial averaging of the residue scores. The prediction accuracy achieved for membrane proteins is comparable to that for non-membrane proteins. Also, like previous results for non-membrane proteins, the accuracy is significantly higher for residues distant from the binding site boundary. Furthermore, a predictor trained on non-membrane proteins was found to yield poor accuracy on membrane proteins, as expected from the different distribution of surface residue types between the two classes of proteins. Thus, although the same procedure can be used to predict binding sites in membrane and non-membrane proteins, separate predictors trained on each class of proteins are required. Finally, the contribution of each residue property to the overall prediction accuracy is analyzed and prediction examples are discussed. Conclusion: Given a membrane protein structure and a multiple alignment of related sequences, the presented method gives a prioritized list of which surface residues participate in intramembrane protein-protein interactions. The method has potential applications in guiding the experimental verification of membrane protein interactions, structure-based drug discovery, and also in constraining the search space for computational methods, such as protein docking or threading, that predict membrane protein complex structures.
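The spatial averaging of residue scores mentioned in the Results can be illustrated with a minimal sketch. The abstract does not state how the averaging is done, so the distance cutoff `radius`, the uniform weighting, the use of one 3D coordinate per residue, and the function name are all assumptions.

```python
import math

def spatially_averaged_scores(coords, scores, radius=10.0):
    """Replace each residue score by the mean score of all residues within `radius`
    (the residue itself included)."""
    averaged = []
    for ci in coords:
        nearby = [s for cj, s in zip(coords, scores) if math.dist(ci, cj) <= radius]
        averaged.append(sum(nearby) / len(nearby))
    return averaged

# Hypothetical surface residues: two close together, one far away.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
raw_scores = [0.9, 0.1, 0.4]  # e.g. per-residue classifier scores
smoothed = spatially_averaged_scores(coords, raw_scores, radius=10.0)
print(smoothed)
```

Averaging of this kind smooths out isolated high or low scores, which is consistent with binding sites forming contiguous surface patches.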

Springer - Publisher Connector

Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature

Author: AS Kolaskar
B Rost
DR Flower
EA Emini
G Riddick
G Walter
HR Ansari
Hua Zou
J Chen
J Huang
J Larsen
J Mintseris
J Pellequer
J Pellequer
J Ponomarenko
J Sollner
J Sun
J Wu
JM Parker
Juan Liu
JV Ponomarenko
L Breiman
M Sikić
Mark Hall
Meng Zhao
MH Van Regenmortel
MH Van Regenmortel
MJ Blythe
MJ Sweredoski
MJ Sweredoski
ND Rubinstein
ND Rubinstein
ND Rubinstein
P Jain
PA Karplus
PH Andersen
R Liu
S Liang
S Liang
S Saha
SR Comeau
W Kabsch
Wen Zhang
Xinghuo Ye
Y El-Manzalawy
Yi Xiong
ZP Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Antigen-antibody interactions are key events in the immune system, which provide important clues to the immune processes and responses. In antigen-antibody interactions, the specific sites on the antigens that are directly bound by the B-cell produced antibodies are known as B-cell epitopes. The identification of epitopes is a hot topic in bioinformatics because of their potential use in epitope-based drug design. Although most B-cell epitopes are discontinuous (or conformational), insufficient effort has been put into conformational epitope prediction, and the performance of existing methods is far from satisfactory. Results: To develop a high-accuracy model, we focus on several aspects that may affect prediction performance, including the impact of interior residues, the different contributions of adjacent residues, and the imbalanced data, which contain many more non-epitope residues than epitope residues. To address these issues, we adopt the following strategies. Firstly, a concept of 'thick surface patch' instead of 'surface patch' is introduced to describe the local spatial context of each surface residue, which considers the impact of interior residues. The comparison between the thick surface patch and the surface patch shows that interior residues contribute to the recognition of epitopes. Secondly, a statistically significant difference between the distance distributions of non-epitope patches and epitope patches is observed, thus an adjacent residue distance feature is presented, which reflects the unequal contributions of adjacent residues to the location of binding sites. Thirdly, a bootstrapping and voting procedure is adopted to deal with the imbalanced dataset. Based on the above ideas, we propose a new method to identify B-cell conformational epitopes from 3D structures by combining conventional features and the proposed feature, with the random forest (RF) algorithm used as the classification engine. The experiments show that our method can predict conformational B-cell epitopes with high accuracy. Evaluated by leave-one-out cross validation (LOOCV), our method achieves a mean AUC value of 0.633 for the benchmark bound dataset, and a mean AUC value of 0.654 for the benchmark unbound dataset. When compared with the state-of-the-art prediction models in the independent test, our method demonstrates comparable or better performance. Conclusions: Our method is demonstrated to be effective for the prediction of conformational epitopes. Based on the study, we develop a tool to predict the conformational epitopes from 3D structures, available at http://code.google.com/p/my-project-bpredictor/downloads/list.

Springer - Publisher Connector

Public Library of Science (PLOS)

Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach

Author: Chuan Wang
Jiangning Song
Ren-Xiang Yan
Ruben Claudio Aguilar
Xiao-Feng Wang
Zhen Chen
Ziding Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Integral membrane proteins constitute 25–30% of genomes and play crucial roles in many biological processes. However, less than 1% of the structures in the Protein Data Bank are membrane protein structures. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) for residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the built RF models provide a more favorable prediction performance compared with two state-of-the-art methods, i.e., TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknifing procedure, they were capable of reaching top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further employed to predict interacting helical pairs and achieved Matthews correlation coefficients of 0.430 and 0.424, according to the two different residue contact definitions, respectively. To support the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
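For readers unfamiliar with the Matthews correlation coefficient reported above, the following sketch computes it from the four entries of a binary confusion matrix using the standard formula. The function name and the example counts are illustrative only and are not taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Illustrative counts: predicted versus true interacting helix pairs.
mcc = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
print(round(mcc, 3))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than accuracy on imbalanced data.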

CiteSeerX