Signal peptides and protein localization prediction

Author: Nielsen Henrik
Publication venue: John Wiley and Sons Ltd
Publication date: 01/01/2005

Evaluation and comparison of mammalian subcellular localization prediction methods

Author: Fink J Lynn
Sprenger Josefine
Teasdale Rohan D
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole-genome sequencing projects have been completed and many of the resulting protein sequences still lack detailed functional information. To address this paucity of data, many computational prediction methods have been developed. However, these methods vary in accuracy and perform differently depending on the sequences presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. RESULTS: To perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE, biased towards subcellular localizations underrepresented in SwissProt, was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. CONCLUSION: No individual method had a sufficient level of sensitivity across both evaluation sets to enable reliable application to hypothetical proteins. All methods showed lower performance on the LOCATE dataset, and performance varied across individual subcellular localizations. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
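The evaluation above rests on per-location sensitivity and specificity compared against a chance level. The abstract does not give the authors' formulas, so the following is only a minimal sketch, assuming single-label predictions and the usual one-vs-rest definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)). The `random_baseline` helper is a hypothetical chance model that guesses each location with probability equal to its frequency in the evaluation set; it is not necessarily the theoretical value used in the paper.

```python
from collections import Counter


def per_class_sens_spec(true_labels, pred_labels):
    """Return {location: (sensitivity, specificity)} using one-vs-rest counts."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have equal length")
    n = len(true_labels)
    results = {}
    for c in sorted(set(true_labels) | set(pred_labels)):
        pairs = list(zip(true_labels, pred_labels))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly assigned to c
        fn = sum(t == c and p != c for t, p in pairs)  # c proteins missed
        fp = sum(t != c and p == c for t, p in pairs)  # others wrongly sent to c
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        results[c] = (sens, spec)
    return results


def random_baseline(true_labels):
    """Hypothetical chance level: a guesser that outputs location c with
    probability p_c (its frequency) independently of the truth has
    sensitivity p_c and specificity 1 - p_c."""
    n = len(true_labels)
    return {c: (k / n, 1 - k / n) for c, k in Counter(true_labels).items()}
```

For example, with true labels `["nucleus", "nucleus", "cytosol", "ER"]` and predictions `["nucleus", "cytosol", "cytosol", "nucleus"]`, nucleus scores (0.5, 0.5), exactly the chance level for a location that makes up half the set.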

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble

Author: Guo-Zheng Li
Jun Zhang
Xiao Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Crossref

Springer - Publisher Connector

Identification of Proteins Secreted by Malaria Parasite into Erythrocyte using SVM and PSSM profiles

Author: Kaur Sukhwinder
Raghava Gajendra PS
Tiwari Ajit
Varshney Grish C
Verma Ruchi
Publication venue: BioMed Central
Publication date: 16/04/2008

Background: The malaria parasite secretes various proteins into infected red blood cells (RBCs) for its growth and survival. Identification of these secretory proteins is therefore important for developing vaccines and drugs against malaria. Existing motif-based methods have had limited success because no universal motif is shared by all secretory proteins of the malaria parasite. Results: In this study, a systematic attempt has been made to develop a general method for predicting secretory proteins of the malaria parasite. All models were trained and tested on a non-redundant dataset of 252 secretory and 252 non-secretory proteins. We developed SVM models and achieved a maximum MCC of 0.72 with 85.65% accuracy and an MCC of 0.74 with 86.45% accuracy using amino acid and dipeptide composition, respectively. SVM models developed using split-amino acid and split-dipeptide composition achieved a maximum MCC of 0.74 with 86.40% accuracy and an MCC of 0.77 with 88.22% accuracy, respectively. In this study, PSSM profiles obtained from PSI-BLAST have been used for the first time to predict secretory proteins. We achieved a maximum MCC of 0.86 with 92.66% accuracy using a PSSM-based SVM model. All models developed in this study were evaluated using 5-fold cross-validation. Conclusion: This study demonstrates that secretory proteins have a different residue composition from non-secretory proteins. It is thus possible to predict secretory proteins from their residue composition using machine learning techniques. A multiple sequence alignment provides more information than the sequence alone, so a method based on PSSM profiles is more accurate than one based on sequence composition. A web server, PSEApred, has been developed for predicting secretory proteins of malaria parasites; the URL can be found in the Availability and requirements section.
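Amino acid composition and dipeptide composition are fixed-length encodings that turn a protein of any length into a vector an SVM can consume, and MCC summarises a binary classifier's confusion matrix in one number. PSEApred's exact encoding is not given in the abstract, so this is a sketch assuming the common conventions: 20 residue fractions, 400 adjacent-pair fractions, and the standard MCC formula. Non-standard residue letters are simply skipped (an assumption), and the split-composition and PSSM variants, as well as the SVM itself, are not shown.

```python
import math
from collections import Counter
from itertools import product

# The 20 standard amino acids; other letters (X, B, Z, U, ...) are ignored here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_SET = frozenset(AMINO_ACIDS)
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """20-dim vector: fraction of each standard residue in seq."""
    residues = [a for a in seq.upper() if a in AA_SET]
    counts = Counter(residues)
    n = len(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim vector: fraction of each adjacent residue pair in seq.
    Pairs touching a non-standard residue are dropped."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)
             if s[i] in AA_SET and s[i + 1] in AA_SET]
    counts = Counter(pairs)
    n = len(pairs)
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

MCC is reported alongside accuracy because it stays informative even when one class dominates, although the dataset here is balanced (252 vs 252).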

Springer - Publisher Connector

PubMed Central

A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search

Author: Garg Aarti
Raghava Gajendra P. S.
Publication venue: IOS Press
Publication date: 09/07/2008

Most prediction methods for secretory proteins require the correct N-terminal end of the pre-protein for correct classification. Because large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect predictions. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (classical and non-classical secreted proteins, respectively) using two machine-learning techniques: artificial neural networks (ANN) and support vector machines (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition, achieving accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed to predict secretory proteins by similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than any individual module. The hybrid module also achieved a high sensitivity of 60.4% with only 5% false-positive predictions. A web server, SRTpred, based on this study has been developed for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/
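The hybrid module's operating point (60.4% sensitivity with 5% false positives) is the kind of figure read off a score threshold. How SRTpred combines its SVM and PSI-BLAST outputs is not described here, so this is a generic sketch: given hypothetical prediction scores for known secretory (positive) and non-secretory (negative) proteins, `sensitivity_at_fpr` picks the lowest threshold whose false-positive rate on the negatives stays at or below a target. Treating "5% false positive predictions" as a 5% false-positive rate on negatives is an assumption.

```python
def sensitivity_at_fpr(pos_scores, neg_scores, max_fpr=0.05):
    """Scan thresholds from high to low and return (threshold, sensitivity, fpr)
    for the lowest threshold whose false-positive rate is <= max_fpr.
    A protein is called secretory when its score >= threshold."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    # Threshold above every score: nothing is predicted positive.
    best = (float("inf"), 0.0, 0.0)
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold drops
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        best = (t, sens, fpr)
    return best
```

For example, with positive scores `[0.9, 0.8, 0.7, 0.4]` and twenty negatives of which two score 0.95 and 0.75, the 5% limit allows only one false positive, so the threshold settles at 0.8 with 50% sensitivity.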

FGsub: Fusarium graminearum protein subcellular localizations predicted from primary structures

Author: Chenglei Sun
Luonan Chen
Weihua Tang
Xing-Ming Zhao
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

A method to improve protein subcellular localization prediction by integrating various biological data sources

Author: Doheon Lee
Thai Quang Tung
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Protein subcellular localization is crucial information for elucidating protein function. Owing to the need for large-scale genome analysis, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. Results: In this paper we propose a new prediction method that combines two key ideas: 1) information from neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. An experiment was conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed. Conclusion: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.
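Fuzzy k-NN returns a graded membership for every location instead of a single label, which is what lets it assign one protein to several sites. The following is a minimal sketch in the style of the classic fuzzy k-NN of Keller, Gray and Givens, with several assumptions not taken from the paper: feature vectors are plain numeric tuples (the gene-network neighbour features are not modelled), a training protein's membership is split evenly across its annotated locations, neighbours are weighted by inverse distance with fuzzifier `m`, and the 0.3 cut-off in the hypothetical `predict_locations` helper is arbitrary.

```python
import math


def fuzzy_knn(train_X, train_labels, x, k=3, m=2.0, eps=1e-9):
    """Return {location: membership} for query vector x.

    train_X: list of numeric feature tuples (hypothetical features).
    train_labels: list of sets of locations; multi-location proteins allowed.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    # Pair each training vector's Euclidean distance to x with its locations,
    # then keep the k closest.
    neighbours = sorted(
        ((math.dist(xi, x), labels) for xi, labels in zip(train_X, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    memberships = {}
    total_weight = 0.0
    for d, labels in neighbours:
        # Inverse-distance weight; eps avoids division by zero for exact matches.
        w = 1.0 / (d + eps) ** (2.0 / (m - 1.0))
        total_weight += w
        for loc in labels:
            # Assumption: a protein's membership is split evenly over its locations.
            memberships[loc] = memberships.get(loc, 0.0) + w / len(labels)
    # Normalise so memberships over all locations sum to 1.
    return {loc: u / total_weight for loc, u in memberships.items()}


def predict_locations(memberships, threshold=0.3):
    """Hypothetical decision rule: report every location at or above threshold."""
    return {loc for loc, u in memberships.items() if u >= threshold}
```

Lowering the threshold trades precision for the ability to report secondary locations: with memberships of 0.75 (nucleus) and 0.25 (cytosol), a 0.3 cut-off returns only the nucleus, while 0.2 returns both.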

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Comparative Study on Feature Extraction from Protein Sequences for Subcellular Localization Prediction

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

The Trypanosoma brucei MitoCarta and its regulation and splicing pattern during development

Author: Astrid Chanfon
Daniel Nilsson
Huinan Wang
Juan Cui
Kapila Gunasekera
Torsten Ochsenreiter
Xiaobai Zhang
Xiaofeng Song
Ying Xu
Publication venue: Oxford University Press
Publication date: 01/01/2010

It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with ∼90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed their expression while 630 remained unchanged in any of the three life stages analyzed. Furthermore, we identified 298 alternative splicing events, a small subset of which could lead to dual localization of the corresponding proteins.

Crossref

PubMed Central

Bern Open Repository and Information System (BORIS)