Public Library of Science (PLOS)

A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0

Author: Darren P. Martin
Hong-Bin Shen
Kuo-Chen Chou
Publication venue: Public Library of Science
Publication date: 01/04/2010

Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is also very useful for proteomics, systems biology and drug development. However, most existing methods for predicting protein subcellular location cover only 5 to 12 location sites. They are also limited to single-location proteins and hence fail for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Multiplex proteins of this kind usually possess important biological functions that deserve special attention. A new predictor called “Euk-mPLoc 2.0” was developed by hybridizing gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly constructed stringent benchmark dataset that contains both single- and multiple-location proteins and in which no protein has ≥25% pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than that of any existing predictor.
As a user-friendly web server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, the web server takes about 15 seconds to return the predicted result; longer sequences generally take more time. It is anticipated that the novel approach and the powerful predictor presented in this paper will have a significant impact on Molecular Cell Biology, System Biology, Proteomics, Bioinformatics, and Drug Development.
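The jackknife success rate quoted in this abstract is a leave-one-out estimate: each protein is held out once, predicted from all the remaining proteins, and scored. The sketch below illustrates only that evaluation loop. The nearest-neighbour rule on amino acid composition is an assumed stand-in, not the Euk-mPLoc 2.0 predictor, the function names are hypothetical, and the toy handles single-location labels only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jackknife_success_rate(samples):
    """samples: list of (sequence, location) pairs with single labels.

    Each protein is held out once and predicted by its nearest neighbour
    among the remaining proteins (an assumed toy classifier).
    """
    features = [(composition(seq), loc) for seq, loc in samples]
    hits = 0
    for i, (query, true_loc) in enumerate(features):
        rest = features[:i] + features[i + 1:]
        _, predicted = min(rest, key=lambda item: squared_distance(query, item[0]))
        hits += predicted == true_loc
    return hits / len(features)


# Made-up toy sequences, for illustration only.
toy_samples = [
    ("AAAAGAAA", "cytoplasm"),
    ("AAAGGAAA", "cytoplasm"),
    ("KKKRKKRK", "nucleus"),
    ("KRKKKRRK", "nucleus"),
]
rate = jackknife_success_rate(toy_samples)
```

Scoring proteins with multiple locations requires a multi-label success measure, which this single-label sketch does not attempt.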

University of the South Pacific Electronic Research Repository

Predict gram-positive and gram-negative subcellular localization via incorporating evolutionary information and physicochemical features into Chou’s general PseAAC

Author: Dehzangi A.
Lyons J.
Paliwal K.K.
Sharma Alokanand
Sharma Ronesh
Tsunoda T.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

In this study, we used structure- and evolution-based features to represent protein sequences for gram-positive and gram-negative subcellular localization. To do this, we proposed a normalization method that constructs a normalized Position Specific Scoring Matrix (PSSM) from the information in the original PSSM. To investigate the effectiveness of the proposed method, we computed feature vectors from the normalized PSSM, applied a Support Vector Machine (SVM) and a Naïve Bayes classifier, respectively, and compared the results with previously reported results. We also computed features from both the original PSSM and the normalized PSSM and compared their results. The achieved results show an improvement in gram-positive and gram-negative subcellular localization. Evaluating localization for each feature, our results indicate that SVM with concatenated features (amino acid composition, the Dubchak physicochemical-based feature, the normalized-PSSM-based auto-covariance feature and the normalized-PSSM-based bigram feature) achieves higher accuracy, while the Naïve Bayes classifier with the normalized-PSSM-based auto-covariance feature has high sensitivity on both benchmarks. Our reported overall locative accuracy is 84.8% and overall absolute accuracy is 85.16% for the gram-positive dataset; for the gram-negative dataset, overall locative accuracy is 85.4% and overall absolute accuracy is 86.3%.
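To make the PSSM-derived features named in this abstract more concrete, here is a minimal sketch of a bigram feature and an auto-covariance feature computed from a PSSM given as a list of rows (one row per residue, one column per amino acid). The abstract does not spell out its normalization method, so the logistic squashing in `normalize_pssm` is purely an assumption, and the bigram and auto-covariance formulas follow formulations commonly used in the PSSM literature rather than anything confirmed by this text. All function names are hypothetical.

```python
import math


def normalize_pssm(pssm):
    """Squash each raw score into (0, 1) with a logistic function.

    Assumed placeholder: the paper's own normalization is not given here.
    """
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def bigram_features(pssm):
    """B[m][n] = sum over i of P[i][m] * P[i+1][n], flattened row by row."""
    width = len(pssm[0])
    feats = [0.0] * (width * width)
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                feats[m * width + n] += cur[m] * nxt[n]
    return feats


def auto_covariance_features(pssm, max_lag):
    """For each lag g and column j, average the products
    (P[i][j] - mean_j) * (P[i+g][j] - mean_j) over all valid i."""
    length, width = len(pssm), len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[j] for row in pssm) / length for j in range(width)]
    feats = []
    for g in range(1, max_lag + 1):
        for j in range(width):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + g][j] - means[j])
                for i in range(length - g)
            )
            feats.append(total / (length - g))
    return feats


# Toy 3-residue, 2-column matrix; a real PSSM has 20 columns.
toy = [[1, 0], [0, 1], [1, 0]]
bigram = bigram_features(toy)
autocov = auto_covariance_features(toy, max_lag=1)
```

For a real 20-column PSSM the bigram vector has 400 entries and the auto-covariance vector has 20 entries per lag; concatenating them with other descriptors would give a combined feature vector for a classifier such as an SVM.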

Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence

Author: A Kumar
A Reinhardt
B Matthews
C Guda
CH Wu
G Dellaire
G-P Zhou
H Liu
H-B Shen
HGE Sutherland
J Cedano
JL Heazlewood
K Nakai
K-C Chou
K-J Park
M Wang
MA Andrade
MS Scott
O Emanuelsson
P Lio
Pufeng Du
Q-B Gao
RA Gottlieb
S Hua
S Kawashima
S-Q Wang
W Jassem
W Li
WA BickMore
Y Huang
Y-D Cai
Yanda Li
Z Lei
Z Yuan
Z-P Feng
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Knowing the submitochondria localization of a mitochondrial protein is an important step toward understanding its function. We develop a method based on an extended version of pseudo-amino acid composition to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins. RESULTS: Using leave-one-out cross-validation, the prediction accuracy is 85.5% for inner membrane, 94.5% for matrix and 51.2% for outer membrane. The overall prediction accuracy for submitochondria location prediction is 85.2%. For proteins predicted to localize at the inner membrane, the accuracy of membrane protein type prediction is 94.6%. CONCLUSION: Our method is effective for predicting protein submitochondria location. However, even with our method or methods at the subcellular level, predicting protein submitochondria location remains a challenging problem. The online service SubMito is now available at
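For readers unfamiliar with pseudo-amino acid composition, the sketch below computes a basic form of it: 20 normalized residue frequencies followed by a few sequence-order correlation factors. It is not the extended, segmented version this paper describes. The single property used (the Kyte-Doolittle hydropathy scale), the default weight of 0.05, and the function names are all assumptions made for illustration.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used as the single assumed property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardized(scale):
    """Rescale a property to zero mean and unit deviation over the 20 residues."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def pseaac(seq, lam=2, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return 20 + lam components; expects only the 20 standard residues."""
    length = len(seq)
    if lam >= length:
        raise ValueError("lam must be smaller than the sequence length")
    h = standardized(scale)
    freqs = [seq.count(aa) / length for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(length - k)) / (length - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


vector = pseaac("ACDEFGHIKLMNPQRSTVWY", lam=2)
```

The first 20 components carry composition and the last `lam` components carry sequence-order information; the components sum to 1 by construction.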

Springer - Publisher Connector

Amino acid classification based spectrum kernel fusion for protein subnuclear localization

Author: A Dijk
A Hoglund
B Boeckmann
C Leslie
G Dellaire
G Lanckriet
G Schneider
H Rangwala
H Shen
J Cedano
J Guo
J Shen
J Taylor
K Chou
K Lee
M Edward
M Mak
M Richard
P Jia
R Kuang
S Alejandro
S Mei
Suyu Mei
T Tung
V Vapnik
Wang Fei
Z Alexander
Z Lei
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization. To the best of our knowledge, there are only three computational models for protein subnuclear localization thus far. Two models were based on protein primary sequence only. The first model assumed a homogeneous amino acid substitution pattern across all residue sites of the protein sequence and used BLOSUM62 to encode the k-mers of the sequence. An ensemble of SVMs based on different k-mers drew the final conclusion, achieving 50% overall accuracy. This simplifying assumption did not exploit the protein sequence profile and ignored the fact that amino acid substitution patterns are heterogeneous across sites. The second model derived the PsePSSM feature representation from the protein sequence by simply averaging the profile PSSM and combined it with the PseAA feature representation to construct a kNN ensemble classifier, Nuc-PLoc, achieving 67.4% overall accuracy. Both models based on protein primary sequence only achieved relatively poor predictive performance. The third model required that GO annotations be available, thus restricting the model's applicability. METHODS: In this paper, we use only the amino acid information of the protein sequence, without any other information, to design a widely applicable model for protein subnuclear localization. We use the K-spectrum kernel to exploit the contextual information around an amino acid and the conserved motif information. Besides expanding the window size, we adopt various amino acid classification approaches to capture diverse aspects of amino acid physicochemical properties. Each amino acid classification generates a series of spectrum kernels based on different window sizes. Thus, (I) window expansion can capture more contextual information and cover size-varying motifs; (II) various amino acid classifications can exploit multi-aspect biological information from the protein sequence. Finally, we combine all the spectrum kernels by simple addition into one single kernel called SpectrumKernel+ for protein subnuclear localization. RESULTS: We conduct the performance evaluation experiments on two benchmark datasets: Lei and Nuc-PLoc. Experimental results show that SpectrumKernel+ achieves a substantial performance improvement over the previous model Nuc-PLoc, with overall accuracy of 83.47% against 67.4%, and 71.23% against 50% for Lei SVM Ensemble and 66.50% for Lei GO SVM Ensemble. CONCLUSION: The method SpectrumKernel+ can exploit rich amino acid information of the protein sequence by embedding into implicit size-varying motifs the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches. The kernels derived from diverse amino acid classification approaches and different sizes of k-mer are summed together for data integration. Experiments show that the method SpectrumKernel+ significantly outperforms the existing models for protein subnuclear localization.
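The kernel construction described in this abstract can be sketched as follows: each amino acid classification rewrites the sequence in a reduced alphabet, a k-spectrum kernel counts shared k-mers in that alphabet, and all kernels over classifications and window sizes are added. The two groupings below (a three-class hydropathy split and a three-class charge split) are illustrative assumptions, not the classifications used in the paper, and the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_mapping(groups):
    """Turn {label: members} into a residue -> single-character label map."""
    return {aa: label for label, members in groups.items() for aa in members}


# Assumed, illustrative classifications (each covers all 20 residues).
IDENTITY = {aa: aa for aa in AMINO_ACIDS}
HYDROPATHY = make_mapping({"h": "ACFILMVW", "n": "GHPSTY", "p": "DEKNQR"})
CHARGE = make_mapping({"+": "KRH", "-": "DE", "0": "ACFGILMNPQSTVWY"})


def spectrum(seq, k, mapping):
    """Count the k-mers of a sequence after re-encoding it with a mapping."""
    encoded = "".join(mapping[aa] for aa in seq)
    return Counter(encoded[i:i + k] for i in range(len(encoded) - k + 1))


def spectrum_kernel(x, y, k, mapping):
    """Inner product of the two k-mer count vectors."""
    sx, sy = spectrum(x, k, mapping), spectrum(y, k, mapping)
    return sum(count * sy[kmer] for kmer, count in sx.items())


def spectrum_kernel_sum(x, y, ks=(1, 2, 3), mappings=(IDENTITY, HYDROPATHY, CHARGE)):
    """Add the spectrum kernels over all classifications and window sizes."""
    return sum(spectrum_kernel(x, y, k, m) for m in mappings for k in ks)


value = spectrum_kernel_sum("MKRAVLDE", "MKKAVIDE")
```

Because a sum of valid kernels is itself a valid kernel, the combined value can be supplied to a kernel classifier as a precomputed Gram matrix entry; any weighting or normalization of the individual kernels is left out of this sketch.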

Springer - Publisher Connector