
84 research outputs found

Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition

Author: A Ben–Hur
A Reinhardt
BW Matthews
C Guda
C Leslie
CS Yu
H Nielsen
J Cedano
J Guo
K Nakai
KC Chou
KJ Park
M Bhasin
M Kumar
M Reczko
O Emanuelsson
P Horton
P Pavlidis
R Nair
S Hua
S Matsuda
Takeyuki Tamura
Tatsuya Akutsu
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. The task is to predict which part of a cell a given protein is transported to, given the amino acid sequence of the protein as input. This problem is becoming more important because information on subcellular location is helpful for the annotation of proteins and genes, and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. Results: In this paper, we propose a novel and general prediction method that combines techniques for sequence alignment with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. In fivefold cross-validation tests, the overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Conclusion: Although there is a predictor that uses gene ontology information and yields higher accuracy than ours, our accuracies are higher than those of existing predictors that use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor is considered useful for subcellular location prediction of newly discovered proteins. Furthermore, the idea of combining alignment and amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plant proteins is also implemented as a web system, available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html
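
As a rough illustration of two ingredients named in this abstract, the sketch below computes an amino acid composition vector and a Matthews correlation coefficient (MCC) for one class. It is not the authors' implementation: the function names `composition` and `mcc` are hypothetical, the alignment of block sequences is not reproduced, and treating the MCC as a per-class (one-vs-rest) score is an assumption about how an average MCC would be obtained.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of seq.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    vec = composition("MKTAYIAKQR")  # toy sequence, not from any dataset
    print(dict(zip(AMINO_ACIDS, vec)))
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

A vector like this could serve as one input to a support vector machine; averaging `mcc` over the location classes would give a single summary score.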

Crossref

Springer - Publisher Connector

PubMed Central

Kyoto University Research Information Repository

PROlocalizer: integrated web service for protein subcellular localization prediction

Author: A Garg
A Krogh
B Eisenhaber
B Martoglio
C Guda
CS Yu
E Castro de
EW Klee
G Neuberger
GH Schneider
HB Shen
J Sprenger
J Thusberg
JD Bendtsen
K Laurila
K Nakai
KC Chou
Kirsti Laurila
L Käll
M Cokol
Mauno Vihinen
O Emanuelsson
P Dönnes
P Horton
R Falk
R Nair
RM Stroud
SR Sunyaev
TN Davis
Z Lu
Z Yuan
Publication venue: Springer Vienna
Publication date: 01/01/2010

Subcellular localization is an important protein property, which is related to function, interactions and other features. As experimental determination of the localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service that integrates 11 individual methods to predict altogether 12 localizations for animal proteins. The method allows the submission of a number of proteins and mutations and generates a detailed informative document of the prediction and obtained results. PROlocalizer is available at http://bioinf.uta.fi/PROlocalizer/

Lund University Publications

Crossref

Springer - Publisher Connector

PubMed Central

A method to improve protein subcellular localization prediction by integrating various biological data sources

Author: A Bairoch
A Drawid
A Reinhardt
C Kuo-Chen
CS Yu
Doheon Lee
E Camon
H Nielsen
H Wen-Lin
Huang Ying
I Lee
J Cedano
J Guo
K Lee
K Nakai
KC Chou
KJ Park
M Reczko
O Emanuelsson
P Horton
S Hagit
S Hua
S Michelle
Thai Quang Tung
WK Huh
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods for efficiently predicting protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs remarkably; and many proteins are located in multiple locations. Thus it is necessary to explore new features and appropriate classification methods to improve prediction performance. Results: In this paper we propose a new prediction method which combines two key ideas: 1) information on neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located in multiple sites. Experiments were conducted on a dataset consisting of 22 locations from budding yeast proteins, and significant improvement was observed. Conclusion: Our results suggest that the neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.
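
To make the fuzzy k-NN idea concrete, here is a minimal sketch of a commonly used formulation in which each of the k nearest neighbours votes for its labels with a weight that decays with distance. The abstract does not state the features, distance measure or parameters used, so the Euclidean distance, the fuzzifier `m = 2`, the function name `fuzzy_knn` and the toy data are all assumptions made for illustration.

```python
from math import dist


def fuzzy_knn(query, train, k=3, m=2.0, eps=1e-9):
    """Return a {label: membership} dict for query.

    train is a list of (feature_vector, {label: membership}) pairs, so a
    protein annotated with several locations simply carries several labels.
    Neighbour weights are 1 / d^(2 / (m - 1)); eps avoids division by zero.
    """
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    weights = [
        1.0 / (dist(query, x) ** (2.0 / (m - 1.0)) + eps) for x, _ in neighbours
    ]
    total = sum(weights)
    scores = {}
    for w, (_, memberships) in zip(weights, neighbours):
        for label, u in memberships.items():
            scores[label] = scores.get(label, 0.0) + w * u / total
    return scores


if __name__ == "__main__":
    # Toy two-dimensional features; real features would be far richer.
    train = [
        ((0.0, 0.0), {"nucleus": 1.0}),
        ((0.0, 1.0), {"nucleus": 1.0, "cytoplasm": 1.0}),
        ((5.0, 5.0), {"mitochondrion": 1.0}),
    ]
    print(fuzzy_knn((0.0, 0.5), train, k=2))
```

Because the output is a membership value per location rather than a single label, a protein can be assigned to every location whose membership exceeds a chosen threshold, which is one way to handle multi-site proteins.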

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins

Author: A Garg
A Hoglund
A Pierleoni
A Reinhardt
Aarti Garg
CS Yu
D Sarda
D Szafron
D Xie
DT Jones
E Tantoso
Gajendra PS Raghava
H Kaur
J Guo
JD Bendtsen
JL Gardy
K Nakai
K Park
KC Chou
KJ Park
M Bhasin
O Emanuelsson
Q Cui
R Nair
S Hua
S Matsuda
SF Altschul
T Habib
WL Huang
Y Huang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The expansion of raw protein sequence databases in the post-genomic era and the availability of freshly annotated sequences for major localizations motivated us to introduce an improved version of our previously developed eukaryotic subcellular localization prediction method, "ESLpred". Since the subcellular localization of a protein offers essential clues about its function, the availability of a localization predictor would aid and expedite protein deciphering studies. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein attributes; hence, it becomes imperative to improve the performance of the presently available method using the latest dataset and crucial input features. Results: Here, we describe the improvement in prediction performance obtained for our popular ESLpred method using new crucial features as input to a Support Vector Machine (SVM). In addition, a recently available, highly non-redundant dataset encompassing three kingdom-specific protein sequence sets (1198 fungal, 2597 animal and 491 plant sequences) was included in the present study. First, using evolutionary information in the form of profile composition along with whole and N-terminal sequence composition as an input feature vector of 440 dimensions, overall accuracies of 72.7, 75.8 and 74.5% were achieved, respectively, after five-fold cross-validation. Further enhancement in performance was observed when similarity-search-based results were coupled with whole and N-terminal sequence composition along with profile composition, yielding overall accuracies of 75.9, 80.8 and 76.6%, respectively, the best accuracies reported to date on the same datasets. Conclusion: These results provide confidence in the reliability and accuracy of the SVM modules generated in the present study using sequence and profile compositions along with similarity-search-based results.
The presently developed modules are implemented as the web server "ESLpred2", available at http://www.imtech.res.in/raghava/eslpred2/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TESTLoc: protein subcellular localization prediction from EST data

Author: A Chacinska
A Kumar
A Pierleoni
A Reinhardt
AG Hatzigeorgiou
BF Lang
C Guda
C Iseli
CS Yu
D Sarda
Gertraud Burger
H Bannai
H Shatkay
HM Yuan
HN Lin
HW Platta
I Small
J Assfalg
J Li
J Liu
J Parkinson
JD Wasmuth
K Baerenfaller
KC Chou
KJ Park
L Barbe
LB Koski
M Boden
MG Claros
MS Boguski
MS Scott
O Emanuelsson
P Rice
R Casadio
R Kaundal
R Lascaris
R Nair
RE Fan
S Briesemeister
S Hua
SF Altschul
T Blum
TM Devlin
W Li
WK Huh
Y Huang
Y Lee
Yao-Qing Shen
YQ Shen
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes. Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequences conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%). Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models

Author: A Blum
A Goldberg
A Höglund
Adrian Silvescu
AP Dempster
Cornelia Caragea
CS Ong
D Ron
Doina Caragea
G Camps-valls
G Casella
J Lafferty
J Lin
J Weston
J Zhang
JL Gardy
K Nigam
K Park
L Breiman
L Käll
M Belkin
M Li
M Szummer
MS Scott
ND Lawrence
O Emanuelsson
P Baldi
P Kuksa
Q Xu
T Jaakkola
T Jebara
T Joachims
TG Dietterich
Vasant Honavar
W Ansorge
X Zhu
Y Bengio
Y Grandvalet
Y Qi
Y Yuan
ZY Niu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data. Results: In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to the semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; (ii) an expectation maximization (EM) based approach; and (iii) a co-training based approach to semi-supervised training of MMs, both of which make use of unlabeled data. Conclusions: The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance to, and in some cases outperform, the co-training based semi-supervised MMs.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UNT Digital Library

Deep Sequencing of Pyrethroid-Resistant Bed Bugs Reveals Multiple Mechanisms of Resistance within a Single Population

Author: A Romero
AM Polanco
C Claudianos
C Strode
CS Lofgren
Dini M. Miller
DJ Moore
DR Nelson
F Zhu
H Ranson
IM Francischetti
Immo A. Hansen
JG Oakeshott
JR Busvine
JS Ramsey
K Tamura
Kathleen A. Kilcullen
KS Yoon
KY Zhu
M Feroz
MF Potter
Michelle A. E. Anderson
O Emanuelsson
R Feyereisen
RD Finn
Reina Koganemaru
SF Altschul
SHPP Karunaratne
SM Valles
TD Anderson
Troy D. Anderson
X Bai
Zach N. Adelman
Publication venue: Public Library of Science
Publication date: 01/01/2011

A frightening resurgence of bed bug infestations has occurred over the last 10 years in the U.S., and current chemical methods have been inadequate for controlling this pest due to widespread insecticide resistance. Little is known about the mechanisms of resistance present in U.S. bed bug populations, making it extremely difficult to develop intelligent strategies for their control. We have identified bed bugs collected in Richmond, VA which exhibit both kdr-type (L925I) and metabolic resistance to pyrethroid insecticides. Using LD50 bioassays, we determined that resistance ratios for Richmond strain bed bugs were ∼5200-fold to the insecticide deltamethrin. To identify metabolic genes potentially involved in the detoxification of pyrethroids, we performed deep sequencing of the adult bed bug transcriptome, obtaining more than 2.5 million reads on the 454 titanium platform. Following assembly, analysis of newly identified gene transcripts in both Harlan (susceptible) and Richmond (resistant) bed bugs revealed several candidate cytochrome P450 and carboxylesterase genes which were significantly over-expressed in the resistant strain, consistent with the idea of increased metabolic resistance. These data will accelerate efforts to understand the biochemical basis for insecticide resistance in bed bugs, and provide molecular markers to assist in the surveillance of metabolic resistance.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

Author: A Bulashevska
A Krogh
C Andreoli
C Guda
CS Yu
E Badidi
E Frank
GE Tusnady
Gertraud Burger
H Bannai
H Shatkay
HB Shen
I Small
JL Heazlewood
JR Quinlan
JY Shi
KC Chou
KJ Park
L Kall
M Bhasin
M Boden
MG Claros
MS Scott
N Pfanner
N Wiedemann
O Emanuelsson
P Donnes
QB Gao
S Džeroski
S Hua
S Matsuda
SHB Chou KC
T Hirokawa
T Zhang
W Li
X Xiao
Y Huang
Yao Qing Shen
YD Cai
YL Chen
YX Pan
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes. Results: In order to develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools by several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) far surpasses that of the individual tools. The method of integration proved crucial to the performance. For the prediction of mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis of biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains. Conclusion: We developed an approach that enhances the prediction accuracy of mitochondrial proteins by uniting the strengths of specialized tools. The combination of machine-learning based integration with biological expert knowledge leads to improved performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement, and applicable to predicting subcellular locations other than mitochondria, as well as other biological features.
For a trial of our approach, we provide a web service for mitochondrial protein prediction (named YimLOC), which can be accessed through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in Additional file 2 (1471-2105-8-420-S2.pdf), which contains scripts for the online server YimLOC. Please note that these scripts only cover the ready-to-use STACK-mem-DT described in the main text; they do not provide the training process.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Interaction of the heterotrimeric G protein alpha subunit SSG-1 of Sporothrix schenckii with proteins related to stress response and fungal pathogenicity using a yeast two-hybrid assay

Author: A Bairoch
A Krogh
A Van Ho
AC da Rosa
AM Kays
AM Preininger
BD Halligan
C Notredame
C Sadhu
C Sanchez-Martinez
CH Wu
CR McCudden
CS Hwang
D Gozalbo
D Shenton
DJ Dupre
DJ Kosman
E Frealle
E Regenfelder
ED Weinberg
EE Aquino-Pinero
EE Luk
EG Fang
EL Sonnhammer
Elizabeth González
Emilee E Colón-Lorenzo
G Poli
GE Turner
GH Choi
GM Cox
I Holsbeeks
I Miyajima
J Kaplan
J Kruger
JK Hicks
JM Thevelein
JR Forbes
JS Lyssand
K Nakai
KB Lengeler
KL Tangen
L Bardwell
Lizaida Pérez-Sánchez
LR Travassos
M Holinstat
M Holzberg
M Nakafuku
M Rubio-Texeira
ME Portnoy
MF Cellier
MS Barbosa
N Delgado
Nuri Rodríguez-del Valle
O Emanuelsson
P Courville
P Heymann
PD Thomas
RA Baasiri
RG Cuadros
Ricardo González-Méndez
S Betancourt
S Conias
S Gao
S Henikoff
S Liu
S Morigasaki
S Valentin-Berrios
SD Narasipura
SF Altschul
SR Sprang
SS Giles
SS Pao
T Tolkacheva
TB Parsley
TE Kehl-Fie
TM Cabrera-Vera
UE Schaible
VC Culotta
VJ Thannickal
Waleska González-Velázquez
WM Oldham
Y Li
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

A Chromosomally Encoded Virulence Factor Protects the Lyme Disease Pathogen against Host-Adaptive Immunity

Author: AC Steere
Adam S. Coleman
AF Elias
AG Barbour
AK Das
AS Coleman
AT Revel
C Ojaimi
CM Fraser
CS Brooks
DD Bolz
E Hodzic
FT Liang
J Radolf
J Seshu
JA Carroll
JE Purser
Jenifer Coburn
JL Bono
JM Battisti
JR Zhang
Juan Anguita
K Nakai
KJ Livak
KL Frank
LK Bockenstedt
M Labandeira-Rey
MA Fisher
MB Lawrenz
MW Jewett
O Emanuelsson
OS Shin
PA Rosa
PE Stewart
R Tokarz
RB Nadelman
RJ Schulze
RS Sikorski
S Antonara
S Casjens
S Narasimhan
SE Connolly
SW Barthold
TG Schwan
U Pal
U Schaible
Utpal Pal
X Li
X Wang
XF Yang
Xiuli Yang
Y Shi
Publication venue: Public Library of Science
Publication date: 01/03/2009

Borrelia burgdorferi, the bacterial pathogen of Lyme borreliosis, differentially expresses select genes in vivo, likely contributing to microbial persistence and disease. Expression analysis of spirochete genes encoding potential membrane proteins showed that surface-located membrane protein 1 (lmp1) transcripts were expressed at high levels in the infected murine heart, especially during early stages of infection. Mice and humans with diagnosed Lyme borreliosis also developed antibodies against Lmp1. Deletion of lmp1 severely impaired the pathogen's ability to persist in diverse murine tissues including the heart, and to induce disease, which was restored upon chromosomal complementation of the mutant with the lmp1 gene. Lmp1 performs an immune-related rather than a metabolic function, as its deletion did not affect microbial persistence in immunodeficient mice, but significantly decreased spirochete resistance to the borreliacidal effects of anti-B. burgdorferi sera in a complement-independent manner. These data demonstrate the existence of a virulence factor that helps the pathogen evade host-acquired immune defense and establish persistent infection in mammals.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central