27 research outputs found

KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions

Author: Bader
Bhasin
Bock
Browne
Garg
Garian
Gribskov
Grigoriev
Hermjakob
Hopp
J. C. Jeong
Jansen
Jansen
Jones
Jones
Jones
Jones
Lord
P. Dermyer
von Mering
X.-w. Chen
Publication venue: Oxford University Press
Publication date: 15/04/2014

KUPS (The University of Kansas Proteomics Service) provides high-quality protein–protein interaction (PPI) data for researchers developing and evaluating computational models for predicting PPIs by allowing users to construct ready-to-use data sets of interacting protein pairs (IPPs), non-interacting protein pairs (NIPs) and associated features. Multiple filters and options allow the user to control the make-up of the IPPs and NIPs as well as the quality of the resultant data sets. Each data set is built from the overall database, which includes 185 446 IPPs and ∼1.5 billion NIPs from five primary databases: IntAct, HPRD, MINT, UniProt and the Gene Ontology. The IPP set can be set to specific model organisms, interaction types and experimental evidence. The NIP set can be generated using four different strategies, which can alleviate biased estimation problems. Lastly, multiple features can be provided for all of the IPP and NIP pairs. Additionally, KUPS provides two benchmark data sets to help researchers compare their algorithms to existing approaches. KUPS is freely available at http://www.ittc.ku.edu/chenlab

Crossref

KU ScholarWorks

PubMed Central

Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data

Author: Li Sichu
Publication venue: ScholarWorks@UNO
Publication date: 15/05/2009

There is a significant need to identify approaches for classifying chemical sensor array data with high success rates that would enhance sensor detection capabilities. The present study attempts to fill this need by investigating six machine learning methods to classify a dataset collected using a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors that are associated with the response from 10 sensor channels are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results generated with the six different methods are compared and discussed. The RF, CART, and KNN are found to have success rates greater than 90%, and to outperform the other methods

University of New Orleans

Predicting protein-protein binding sites in membrane proteins

Author: A Elofsson
A Koike
A Liaw
AJ Bordner
AJ Bordner
AJ Bordner
AJ Bordner
Andrew J Bordner
B Wang
C Yan
D Lupo
E Krissinel
GE Tusnady
H Chen
H Neuvirth
HX Zhou
I Res
JR Bradford
L Breiman
L Feng
MA Yildirim
NJ Burgoyne
P Fariselli
R Development Core Team
R Landgraf
RC Edgar
S Hartel-Schenk
S Jones
S Jones
SA Eyers
SF Altschul
SH White
TM Bakheet
W Li
XW Chen
Y Ofran
Publication venue: BioMed Central
Publication date: 01/09/2009

Background: Many integral membrane proteins, like their non-membrane counterparts, form either transient or permanent multi-subunit complexes in order to carry out their biochemical function. Computational methods that provide structural details of these interactions are needed since, despite their importance, relatively few structures of membrane protein complexes are available. Results: We present a method for predicting which residues are in protein-protein binding sites within the transmembrane regions of membrane proteins. The method uses a Random Forest classifier trained on residue type distributions and evolutionary conservation for individual surface residues, followed by spatial averaging of the residue scores. The prediction accuracy achieved for membrane proteins is comparable to that for non-membrane proteins. Also, like previous results for non-membrane proteins, the accuracy is significantly higher for residues distant from the binding site boundary. Furthermore, a predictor trained on non-membrane proteins was found to yield poor accuracy on membrane proteins, as expected from the different distribution of surface residue types between the two classes of proteins. Thus, although the same procedure can be used to predict binding sites in membrane and non-membrane proteins, separate predictors trained on each class of proteins are required. Finally, the contribution of each residue property to the overall prediction accuracy is analyzed and prediction examples are discussed. Conclusion: Given a membrane protein structure and a multiple alignment of related sequences, the presented method gives a prioritized list of which surface residues participate in intramembrane protein-protein interactions. The method has potential applications in guiding the experimental verification of membrane protein interactions, structure-based drug discovery, and also in constraining the search space for computational methods, such as protein docking or threading, that predict membrane protein complex structures.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
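
The membrane-protein abstract above mentions "spatial averaging of the residue scores" after per-residue classification. The following is only a minimal sketch of what such a step could look like. The neighbourhood radius, the uniform weighting, and the function name `spatially_averaged_scores` are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def spatially_averaged_scores(coords, scores, radius=10.0):
    """Replace each residue score by the mean score of all residues
    within `radius` of it (the residue itself included).

    coords: list of (x, y, z) tuples, one per surface residue.
    scores: list of raw per-residue classifier scores, same order.
    radius: neighbourhood cut-off; the value is an assumption,
            not a parameter taken from the paper.
    """
    averaged = []
    for centre in coords:
        # Collect scores of every residue inside the sphere around `centre`.
        neighbours = [
            s for xyz, s in zip(coords, scores)
            if math.dist(centre, xyz) <= radius
        ]
        # `neighbours` is never empty because the centre is at distance 0.
        averaged.append(sum(neighbours) / len(neighbours))
    return averaged
```

With uniform weights, an isolated high score surrounded by low-scoring neighbours is pulled down, which is one plausible reason to smooth scores before ranking residues.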

Knowledge-guided inference of domain–domain interactions from incomplete protein–protein interaction networks

Author: Chen Xue-wen
Jothi Raja
Liu Mei
Publication venue: Oxford University Press
Publication date: 16/04/2014

Motivation: Protein-protein interactions (PPIs), though extremely valuable towards a better understanding of protein functions and cellular processes, do not provide any direct information about the regions/domains within the proteins that mediate the interaction. Most often, it is only a fraction of a protein that directly interacts with its biological partners. Thus, understanding interaction at the domain level is a critical step towards (i) thorough understanding of PPI networks; (ii) precise identification of binding sites; (iii) acquisition of insights into the causes of deleterious mutations at interaction sites; and (iv) most importantly, development of drugs to inhibit pathological protein interactions. In addition, knowledge derived from known domain–domain interactions (DDIs) can be used to understand binding interfaces, which in turn can help discover unknown PPIs

KU ScholarWorks

PubMed Central

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Author: A Henschel
A Koike
A Kouranov
A Porollo
A Rossi
AJ Bordner
B Wang
Bin Liu
Buzhou Tang
C Chothia
C Yan
C Yan
C-T Chen
C-W Cheng
H Chen
H Kim
H Neuvirth
H-X Zhou
HX Zhou
I Ezkurdia
I Res
I Tsochantaridis
I Tsochantaridis
J Lafferty
J Song
J Song
J-L Chung
JD Fischer
JL Chung
JR Bradford
JW Torrance
K Henrick
L Holm
L Lo Conte
L Wang
Lei Lin
LR Rabiner
M Gribskov
M Vincent
M Šikić
MH Li
N Li
NJ Burgoyne
P Fariselli
Q Dong
Qiwen Dong
S Ahmad
S Liang
S Qin
SF Altschul
SF Altschul
T Joachims
T Zhang
TH Dang
W Kabsch
WK Kim
X-w Chen
Xiaolong Wang
Xuan Wang
Y Altun
Y Liu
Y Ofran
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples when the cutting-plane algorithm is used.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Automatic structure classification of small proteins using random forest

Author: A Andreeva
A Andreeva
AG Murzin
AV Levitin
C Hadley
CHQ Ding
E Ie
G Zhanga
H Shen
HM Berman
I Chung
I Melvin
IH Witten
J Cheng
J Wu
JE Gewehr
JF Gibrat
Jonathan D Hirst
JR Quinlan
K Chen
KC Chou
L Breiman
L Holm
L Kurgan
M Gerstein
MB Swindells
MTA Shamim
O Çamoğlu
P Baldi
P Han
P Jain
P Klein
Pooja Jain
S Kim
S Mile
S Vinga
SE Brenner
SE Hamby
SF Altschul
SP Kanaan
SS Krishna
U Hobohm
V Sam
W Kabsch
X Chen
X Chen
XM Zhao
Y Cai
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases. Conclusions: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB which are yet to be assigned to the SCOP classification hierarchy.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
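
The random forest abstract above reports model quality as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, here is a small sketch of the standard binary definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `binary_mcc` is hypothetical, and how the paper reduces its multi-level SCOP classification to this form (for example one class against the rest) is not stated in the abstract, so treat that as an assumption.

```python
import math


def binary_mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion matrix.

    Returns a value in [-1, 1]: 1 is perfect agreement, 0 is no better
    than chance, -1 is total disagreement.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a whole row or column of the
        # confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when one class is much larger than the other.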

PPIcons: identification of protein-protein interaction sites in selected organisms

Author: Brijesh K. Sriwastava
Dariusz Plewczynski
Subhadip Basu
Ujjwal Maulik
Publication venue: Springer Nature
Publication date: 01/01/2013

The physico-chemical properties of interaction interfaces have a crucial role in the characterization of protein–protein interactions (PPI). In silico prediction of participating amino acids helps to identify interface residues for further experimental verification using mutational analysis, or inhibition studies by screening a library of ligands against a given protein. Given the unbound structure of a protein and the fact that it forms a complex with another known protein, the objective of this work is to identify the residues that are involved in the interaction. We attempt to predict interaction sites in protein complexes using the local composition of amino acids together with their physico-chemical characteristics. The local sequence segments (LSS) are dissected from the protein sequences using a sliding window of 21 amino acids. The list of LSSs is passed to the support vector machine (SVM) predictor, which identifies interacting residue pairs considering their inter-atom distances. We have analyzed three model organisms, Escherichia coli, Saccharomyces cerevisiae and Homo sapiens, for which the numbers of considered hetero-complexes are 40, 123 and 33, respectively. Moreover, a unified multi-organism PPI meta-predictor is also developed in the current work by combining the training databases of the above organisms. The PPIcons interface residue prediction method achieves an area under the ROC curve (AUC) of 0.82, 0.75, 0.72 and 0.76 for the aforementioned organisms and the meta-predictor, respectively. Electronic supplementary material: The online version of this article (doi:10.1007/s00894-013-1886-9) contains supplementary material, which is available to authorized users.

Springer - Publisher Connector

PubMed Central
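
The PPIcons abstract above describes dissecting local sequence segments (LSS) from protein sequences with a sliding window of 21 amino acids. A minimal sketch of such a windowing step follows. Centring one window on each residue and padding the termini with a placeholder character "X" are assumptions made for illustration; the abstract does not say how sequence ends are handled, and the function name `local_sequence_segments` is hypothetical.

```python
def local_sequence_segments(sequence: str, window: int = 21, pad: str = "X") -> list[str]:
    """Return one `window`-long segment centred on each residue.

    The sequence is padded on both sides so that residues near the
    termini also receive a full-length segment (an assumption; the
    source abstract does not describe terminus handling).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = pad * half + sequence + pad * half
    # Segment i spans padded[i : i + window]; its middle character is sequence[i].
    return [padded[i:i + window] for i in range(len(sequence))]


# Usage: a 20-residue toy sequence yields 20 segments of length 21.
segments = local_sequence_segments("ACDEFGHIKLMNPQRSTVWY")
```

Each segment could then be encoded as a feature vector and passed to a classifier such as the SVM named in the abstract; that encoding is outside the scope of this sketch.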

Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

Author: Daberdaku Sebastian
Ferrari Carlo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

Author: A Porollo
AJ Bordner
B Wang
B Wang
BD Alberts
C Cortes
C Sander
ED Levy
F Glaser
F Pazos
H Chen
H Zhou
HM Berman
HS Wong
I Ezkurdia
I Res
J Chung
J Janin
J Kittler
J Kyte
J Mihel
JC Bezdek
Jinyan Li
JR Bradford
JR Bradford
KS Thorn
L Lo Conte
LI Kuncheva
LK Hansen
M Charton
M Guharoy
M Sikic
N H
P Baldi
P Chakrabarti
P Chen
P Cherepanov
P Cherepanov
P Fariselli
Peng Chen
Q Dong
R Singh
RA Laskowski
RD Pascual-Marqui
RM Kini
RP Bahadur
RP Bahadur
S Jones
S Jones
S Jones
SJ de Vries
T Friedrich
T Kohonen
TA Larsen
TJ Bollenbach
Uni-Prot-Consortium
V Chelliah
W Kauzmann
X Du
X Gallet
XW Chen
Y Murakami
Y Ofran
Y Ofran
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the performance of predicting protein interaction sites based on primary sequence alone. Results: We propose a new idea to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, where SVMs train on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly the same sizes, are grouped in the order of accessible surface area change before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors to make the identification of interface residues more accurate. An ensemble of ten SVMs improves MCC by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms that based on the sequence profile or the hydropathy scale alone. As our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods. Conclusions: The integrative profile combining hydrophobic and evolutionary information contributes most to the protein-protein interaction prediction. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance. Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central