Repository of the Academy's Library

Evaluation and comparison of mammalian subcellular localization prediction methods

Author: Fink J Lynn
Sprenger Josefine
Teasdale Rohan D
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. RESULTS: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. CONCLUSION: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins. 
All methods showed lower performance on the LOCATE dataset, and performance varied across individual subcellular localizations. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
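The per-location evaluation described in this abstract can be illustrated with a minimal sketch. It assumes single-label predictions and the textbook definitions of sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP); the paper may define these quantities differently. The chance baseline is also an assumption: a predictor that guesses each location in proportion to its frequency in the evaluation set. The function names and toy labels are hypothetical.

```python
from collections import Counter


def per_class_metrics(true_labels, predicted_labels):
    """Return {location: (sensitivity, specificity)} for single-label predictions."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    metrics = {}
    for c in classes:
        tp = fp = tn = fn = 0
        for t, p in zip(true_labels, predicted_labels):
            if t == c and p == c:
                tp += 1
            elif t == c:
                fn += 1
            elif p == c:
                fp += 1
            else:
                tn += 1
        # Undefined ratios (no positives or no negatives) are reported as NaN.
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        metrics[c] = (sens, spec)
    return metrics


def chance_sensitivity(true_labels):
    """Expected sensitivity per location for a guesser that picks each location
    with probability equal to its frequency (an assumed baseline, not the
    paper's stated one)."""
    counts = Counter(true_labels)
    n = len(true_labels)
    return {c: counts[c] / n for c in counts}


# Toy labels, invented for illustration only.
true = ["nucleus", "nucleus", "cytosol", "ER", "ER", "nucleus"]
pred = ["nucleus", "cytosol", "cytosol", "nucleus", "ER", "nucleus"]
observed = per_class_metrics(true, pred)
baseline = chance_sensitivity(true)
for location, (sens, spec) in observed.items():
    print(location, round(sens, 3), round(spec, 3), round(baseline[location], 3))
```

Comparing each observed sensitivity with its baseline value shows whether a method does better than frequency-weighted guessing for that location.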

University of Queensland eSpace

TpPred: A Tool for Hierarchical Prediction of Transport Proteins Using Cluster of Neural Networks and Sequence Derived Features

Author: Jain Sankalp
Naik Pradeep Kumar
Ranjan Piyush
Sengupta Dipankar
Publication venue: International Journal for Computational Biology (IJCB)
Publication date: 21/04/2014

TpPred is a top-down predictor that performs three levels of hierarchical classification using a cascade of neural networks trained on sequence-derived features. The first layer identifies whether a query protein is a transport protein; the second layer assigns the main functional class; and the third layer assigns the sub-functional class. The overall success rates for all three layers are higher than 65%, obtained through rigorous cross-validation tests on stringent benchmark datasets in which none of the proteins has 30% or more sequence identity with any other in the same class or subclass. TpPred achieved good prediction accuracies and could complement experimental approaches for identifying transport proteins. TpPred is freely available for in-house use as a standalone version and is accessible at http://www.juit.ac.in/attachments/tppred/Home.html
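The three-layer control flow described above can be sketched as follows. This shows only the top-down routing, not TpPred itself: the neural networks are replaced by toy threshold rules, and every name, label ("class_A", "A1", and so on) and feature value is hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Sequence[float]
Prediction = Tuple[bool, Optional[str], Optional[str]]


def predict_hierarchy(
    features: Features,
    is_transporter: Callable[[Features], bool],
    class_model: Callable[[Features], str],
    subclass_models: Dict[str, Callable[[Features], str]],
) -> Prediction:
    """Route a query through three layers; later layers run only if earlier ones allow."""
    # Layer 1: transport protein or not.
    if not is_transporter(features):
        return (False, None, None)
    # Layer 2: main functional class.
    main_class = class_model(features)
    # Layer 3: sub-functional class, using the model dedicated to that main class.
    sub_model = subclass_models.get(main_class)
    sub_class = sub_model(features) if sub_model is not None else None
    return (True, main_class, sub_class)


# Toy stand-ins for trained networks (assumptions, for illustration only).
def toy_layer1(f: Features) -> bool:
    return f[0] > 0.5


def toy_layer2(f: Features) -> str:
    return "class_A" if f[1] > 0.5 else "class_B"


toy_layer3 = {
    "class_A": lambda f: "A1" if f[2] > 0.5 else "A2",
    "class_B": lambda f: "B1" if f[2] > 0.5 else "B2",
}

print(predict_hierarchy([0.9, 0.8, 0.1], toy_layer1, toy_layer2, toy_layer3))
print(predict_hierarchy([0.2, 0.8, 0.1], toy_layer1, toy_layer2, toy_layer3))
```

One consequence of this design is that an error in an early layer propagates: a transport protein rejected at layer 1 never reaches the class or subclass models.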

International Journal for Computational Biology (IJCB)

Support vector machine (SVM) based multiclass prediction with basic statistical analysis of plasminogen activators

Author: Lefevre Christophe
Muthukrishnan Selvaraj
Puri Munish
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Plasminogen (Pg), the precursor of the proteolytic and fibrinolytic enzyme of blood, is converted to the active enzyme plasmin (Pm) by different plasminogen activators (tissue plasminogen activators and urokinase), including the bacterial activators streptokinase and staphylokinase, which activate Pg to Pm and thus are used clinically for thrombolysis. The identification of Pg activators is therefore an important step in understanding their functional mechanism and deriving new therapies.

Deakin Research Online

University of Melbourne Institutional Repository

Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble

Author: Guo-Zheng Li
Jun Zhang
Xiao Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Semi-supervised protein subcellular localization

Author: A Blum
A Levin
A Pierleoni
A Reinhardt
A Sarkar
B JD
C Yu
C Zhang
CJL Chine-Sheng Yu
D Xie
Derek Hao Hu
ECY Su
G Zhou
H Nakashima
HB Shen
Hong Xue
I Bahar
J Gardy
J Wang
K Chou
K Nakai
K Nigam
K Park
L Breiman
M Bhasin
M Claros
M Li
O Emanuelsson
P Horton
Qian Xu
Qiang Yang
R Luo
R Nair
RPC Nair
S Hua
S Muskal
T Guo
T Joachims
TK Ho
W Liu
Weichuan Yu
X Zhu
Y Cai
Y Freund
Y Huang
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Protein subcellular localization prediction is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functions of proteins. Accurate prediction of protein subcellular localization can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used to predict protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled for the prediction system to learn a classifier with good generalization ability. In practice, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare labeled instances. RESULTS: In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high-quality prediction model. We first construct an initial classifier using a small set of labeled examples, and then use unlabeled instances to refine the classifier for future predictions. CONCLUSION: Experimental results show that our methods can effectively reduce the labeling workload by exploiting unlabeled data. Our method is shown to improve on the state-of-the-art prediction results of SVM classifiers by more than 10%.
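The two-step idea in this abstract (fit an initial classifier on a few labeled examples, then refine it with unlabeled ones) can be illustrated with generic self-training. This is a sketch under stated assumptions, not the authors' algorithm: the abstract does not name the specific semi-supervised procedure, and a nearest-centroid classifier stands in here for the SVMs they mention. All function names, the margin threshold and the two-dimensional toy features are hypothetical.

```python
import math


def centroids(X, y):
    """Mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict_with_margin(cents, x):
    """Return (nearest label, gap between nearest and second-nearest centroid distance)."""
    ranked = sorted((math.dist(x, c), label) for label, c in cents.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
    return best_label, margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Grow the labeled set with confidently pseudo-labeled points, refitting each round."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        confident, uncertain = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else uncertain).append((x, label))
        if not confident:
            break  # nothing left that the current model is sure about
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = [x for x, _ in uncertain]
    return centroids(X, y)


# Toy data, invented for illustration: two labeled and four unlabeled points.
model = self_train(
    X_lab=[[0.0, 0.0], [4.0, 4.0]],
    y_lab=["cytosol", "nucleus"],
    X_unlab=[[0.5, 0.2], [3.8, 4.1], [1.0, 1.0], [3.0, 3.2]],
)
print(predict_with_margin(model, [0.8, 0.9])[0])
```

The margin threshold controls the usual trade-off in self-training: a low value admits more pseudo-labels but risks reinforcing early mistakes.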

Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence

Author: A Kumar
A Reinhardt
B Matthews
C Guda
CH Wu
G Dellaire
G-P Zhou
H Liu
H-B Shen
HGE Sutherland
J Cedano
JL Heazlewood
K Nakai
K-C Chou
K-J Park
M Wang
MA Andrade
MS Scott
O Emanuelsson
P Lio
Pufeng Du
Q-B Gao
RA Gottlieb
S Hua
S Kawashima
S-Q Wang
W Jassem
W Li
WA BickMore
Y Huang
Y-D Cai
Yanda Li
Z Lei
Z Yuan
Z-P Feng
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Knowing the submitochondria localization of a mitochondrial protein is an important step toward understanding its function. We develop a method based on an extended version of pseudo-amino acid composition to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins. RESULTS: Using leave-one-out cross-validation, the prediction accuracy is 85.5% for inner membrane, 94.5% for matrix and 51.2% for outer membrane. The overall prediction accuracy for submitochondria location prediction is 85.2%. For proteins predicted to localize at the inner membrane, the accuracy of membrane protein type prediction is 94.6%. CONCLUSION: Our method is effective for predicting protein submitochondria location. However, even with our method or the methods at the subcellular level, predicting protein submitochondria location remains a challenging problem. The online service SubMito is now available at
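The leave-one-out protocol behind the per-location accuracies reported above can be sketched as follows. Two substitutions are assumptions made to keep the example short: plain 20-dimensional amino-acid composition stands in for the extended pseudo-amino acid composition, and a 1-nearest-neighbour rule stands in for the classifier, which the abstract does not name. The sequences are made-up strings, not real proteins.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [seq.count(a) / n for a in AMINO_ACIDS]


def nearest_label(train, query):
    """Label of the training vector closest to the query (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def leave_one_out(samples):
    """samples: list of (sequence, label). Returns (overall accuracy, per-class accuracy)."""
    feats = [(composition(seq), label) for seq, label in samples]
    hits, totals = {}, {}
    for i, (x, label) in enumerate(feats):
        # Hold out sample i and predict it from all remaining samples.
        train = feats[:i] + feats[i + 1:]
        pred = nearest_label(train, x)
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (pred == label)
    per_class = {label: hits[label] / totals[label] for label in totals}
    overall = sum(hits.values()) / len(feats)
    return overall, per_class


# Made-up toy sequences, two per location.
toy = [
    ("LLLLVVVVAA", "inner membrane"),
    ("LLLVVVVAAA", "inner membrane"),
    ("KKKKEEEEDD", "matrix"),
    ("KKKEEEEDDD", "matrix"),
    ("GGGGSSSSTT", "outer membrane"),
    ("GGGSSSSTTT", "outer membrane"),
]
overall, per_class = leave_one_out(toy)
print(overall, per_class)
```

Reporting accuracy per class as well as overall matters when classes are unbalanced: a high overall figure can coexist with a weak result on a small class.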

FGsub: Fusarium graminearum protein subcellular localizations predicted from primary structures

Author: A Höglund
A Pierleoni
A Reinhardt
AC Christina
Chenglei Sun
CJ Shin
FG Priest
H Chen
H Nakashima
J Cedano
J Liu
J Wang
JL Gardy
JM Chang
JW Bennett
K Nakai
KC Chou
KJ Park
KY Lee
Luonan Chen
M Bhasin
MS Scott
O Emanuelsson
P Garga
P Horton
R Nair
RS Goswami
SJ Hua
T Tamura
TU Consortium
U Guldener
Weihua Tang
WZ Li
Xing-Ming Zhao
XM Zhao
Y Cai
Y Huang
Publication venue: BioMed Central
Publication date: 01/01/2010

A method to improve protein subcellular localization prediction by integrating various biological data sources

Author: A Bairoch
A Drawid
A Reinhardt
C Kuo-Chen
CS Yu
Doheon Lee
E Camon
H Nielsen
H Wen-Lin
Huang Ying
I Lee
J Cedano
J Guo
K Lee
K Nakai
KC Chou
KJ Park
M Reczko
O Emanuelsson
P Horton
S Hagit
S Hua
S Michelle
Thai Quang Tung
WK Huh
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. RESULTS: In this paper we propose a new prediction method that combines two key ideas: 1) information about neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located in multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and significant improvement was observed. CONCLUSION: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.
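The fuzzy k-NN idea mentioned above can be sketched in its commonly used form, where each of the k nearest neighbours votes for its classes with weight 1 / d^(2/(m-1)). Several choices here are assumptions rather than details from the paper: a multi-location training protein splits its membership evenly across its locations, distance is Euclidean, and the network-derived neighbour features are omitted. The names, the fuzzifier m = 2 and the toy vectors are hypothetical.

```python
import math


def fuzzy_knn(train, query, k=3, m=2.0, eps=1e-9):
    """Return {location: membership} for the query; memberships sum to 1.

    train: list of (feature_vector, set_of_locations).
    m: fuzzifier (> 1); larger values flatten the distance weighting.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Inverse-distance weights; eps guards against a zero distance.
    weights = [
        1.0 / (math.dist(x, query) ** (2.0 / (m - 1.0)) + eps)
        for x, _ in neighbours
    ]
    total = sum(weights)
    memberships = {}
    for (_, locations), w in zip(neighbours, weights):
        share = 1.0 / len(locations)  # assumed: even split across a protein's locations
        for loc in locations:
            memberships[loc] = memberships.get(loc, 0.0) + w * share / total
    return memberships


def predict_locations(memberships, threshold=0.3):
    """Report every location whose membership reaches the (assumed) threshold."""
    return sorted(loc for loc, u in memberships.items() if u >= threshold)


# Toy training set, invented for illustration.
train = [
    ([0.0, 0.0], {"nucleus"}),
    ([0.0, 1.0], {"nucleus", "cytosol"}),
    ([1.0, 0.0], {"cytosol"}),
    ([5.0, 5.0], {"ER"}),
]
u = fuzzy_knn(train, [0.5, 0.5], k=3)
print(predict_locations(u))
```

Because the output is a graded membership per location rather than a single label, thresholding it naturally yields multiple predicted sites for one protein.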