Going from where to why—interpretable prediction of protein subcellular localization

Author: Bannai
Blum
Boden
Brady
Briesemeister
Carlson
Casadio
Cedano
Chou
Cokol
Cui
Emanuelsson
Fayyad
Fujiwara
Fyshe
Garg
Guo
Hall
Horton
Hua
Huang
Höglund
Jörg Rahnenführer
Kaiser
King
Lee
Lei
Lin
Lu
Nair
Nakai
Oliver Kohlbacher
Outten
Park
Petsalaki
Pierleoni
Reinhardt
Rish
Scott
Sebastian Briesemeister
Shin
Small
Takada
Tsoumakas
Whitten
Xie
Zhang
Publication venue: Oxford University Press

Motivation: Protein subcellular localization is pivotal in understanding a protein's function. Computational prediction of subcellular localization has become a viable alternative to experimental approaches. While current machine learning-based methods yield good prediction accuracy, most of them suffer from two key problems: a lack of interpretability and difficulty dealing with multiple locations.

Crossref

PubMed Central

Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework

Author: Shatkay Hagit
Simha Ramanuja
Publication date: 29/07/2013

Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins, assuming that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically treat locations as independent or capture inter-dependencies by treating each location combination present in the training set as an individual location class. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the multiple-location-prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without restricting predictions to be based only on location combinations present in the training set.

Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv.org e-Print Archive

Springer - Publisher Connector

YLoc—an interpretable web server for predicting subcellular localization

Author: Briesemeister Sebastian
Kohlbacher Oliver
Rahnenführer Jörg
Publication venue: Oxford University Press
Publication date: 01/01/2010

Predicting subcellular localization has become a valuable alternative to time-consuming experimental methods. Major drawbacks of many of these predictors are their lack of interpretability and the fact that they do not provide an estimate of the confidence of an individual prediction. We present YLoc, an interpretable web server for predicting subcellular localization. YLoc uses natural language to explain why a prediction was made and which biological property of the protein was mainly responsible for it. In addition, YLoc estimates the reliability of its own predictions. YLoc can thus assist in understanding protein localization and in location engineering of proteins. The YLoc web server is available online at www.multiloc.org/YLoc

CiteSeerX

PubMed Central

Evidence for the localization of the Arabidopsis cytokinin receptors AHK3 and AHK4 in the endoplasmic reticulum

Author: Caesar Katharina
Elgass Kirstin
Grefen Christopher
Harter Klaus
Horak Jakub
Huppenberger Peter
Thamm Antje M. K.
Witthöft Janika
Publication venue: Oxford University Press
Publication date: 01/01/2011

Cytokinins are hormones that are involved in various processes of plant growth and development. The model of cytokinin signalling starts with hormone perception through membrane-localized histidine kinase receptors. Although the biochemical properties and functions of these receptors have been extensively studied, there is no solid proof of their subcellular localization. Here, cell biological and biochemical evidence for the localization of functional fluorophor-tagged fusions of Arabidopsis histidine kinase 3 (AHK3) and 4 (AHK4), members of the cytokinin receptor family, in the endoplasmic reticulum (ER) is provided. Furthermore, membrane-bound AHK3 interacts with AHK4 in vivo. The ER localization and putative function of cytokinin receptors from the ER have major impacts on the concept of cytokinin perception and signalling, and hormonal cross-talk in plants

Crossref

PubMed Central

Enlighten

Minimalist Ensemble Algorithms for Genome-Wide Protein Localization Prediction

Author: Hu Jianjun
Lin J.-R.
Liu R.
Mondal A. M.
Publication venue: Scholar Commons
Publication date: 01/01/2012

Background: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.

Results: This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.

Conclusions: We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi
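
The idea of combining the outputs of individual predictors with a logistic regression classifier can be sketched roughly as follows. This is not the authors' implementation: the function names, the 0/1 encoding of base-predictor outputs, and the plain gradient-descent training loop are all assumptions made for illustration, and the feature-selection filter mentioned in the abstract is left out.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_lr_ensemble(votes, labels, lr=0.5, epochs=500):
    """Fit a logistic regression over base-predictor outputs.

    votes  : one row per protein; each row holds the 0/1 output of every
             base predictor for a single location (hypothetical encoding).
    labels : 0/1 ground truth for that location.
    Returns (weights, bias) learned by batch gradient descent.
    """
    n_features = len(votes[0])
    n = len(votes)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(votes, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, row) -> float:
    """Probability that the protein is in the location, given base votes."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Toy data (made up): predictor 0 agrees with the label, predictor 1 is noise.
toy_votes = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_labels = [1, 1, 0, 0]
toy_weights, toy_bias = train_lr_ensemble(toy_votes, toy_labels)
```

In this toy setting the learned weight for the informative predictor ends up larger in magnitude than the weight for the uninformative one, which is the kind of signal a minimalist ensemble could use to drop redundant predictors.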

Springer - Publisher Connector

Scholar Commons - Institutional Repository of the University of South Carolina

PubMed Central

TESTLoc: protein subcellular localization prediction from EST data

Author: A Chacinska
A Kumar
A Pierleoni
A Reinhardt
AG Hatzigeorgiou
BF Lang
C Guda
C Iseli
CS Yu
D Sarda
Gertraud Burger
H Bannai
H Shatkay
HM Yuan
HN Lin
HW Platta
I Small
J Assfalg
J Li
J Liu
J Parkinson
JD Wasmuth
K Baerenfaller
KC Chou
KJ Park
L Barbe
LB Koski
M Boden
MG Claros
MS Boguski
MS Scott
O Emanuelsson
P Rice
R Casadio
R Kaundal
R Lascaris
R Nair
RE Fan
S Briesemeister
S Hua
SF Altschul
T Blum
TM Devlin
W Li
WK Huh
Y Huang
Y Lee
Yao-Qing Shen
YQ Shen
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.

Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%).

Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html
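
As a rough illustration of the "amino acid composition" feature mentioned in the abstract above, the following sketch computes a 20-dimensional composition vector for a peptide. It is not TESTLoc's code; the function name `aa_composition` and the choice to ignore non-standard residues are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide: str) -> list[float]:
    """Return the fraction of each standard amino acid in `peptide`.

    Hypothetical helper: residues outside the 20 standard letters
    (e.g. X, B, Z) are ignored, which is an assumption of this sketch.
    """
    counts = Counter(ch for ch in peptide.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector like this could serve as one block of the input to an SVM or any other classifier; because it does not depend on sequence length or on having the full protein, it is a natural fit for partial sequences.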

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DDAG K-TIPCAC : an ensemble method for protein subcellular localization

Author: A. Rozza
E. Casiraghi
G. Lombardi
G. Valentini
M. Re
Publication venue: ECML
Publication date: 01/09/2010

Protein subcellular location prediction is one of the most difficult multiclass prediction problems in modern computational biology. Many methods have been proposed in the literature to solve this problem, but all the existing approaches are affected by some limitations. In this contribution we propose a novel method for protein subcellular location prediction that performs multiclass classification by combining kernel classifiers through DDAG. Each base classifier, called K-TIPCAC, projects the points on a Fisher subspace estimated on the training data by means of a novel technique. Experimental results clearly indicated that DDAG K-TIPCAC performs as well as, if not better than, state-of-the-art ensemble methods for protein subcellular location.

AIR Universita degli studi di Milano

Integration of molecular biology tools for identifying promoters and genes abundantly expressed in flowers of Oncidium Gower Ramsey

Author: Chan M.T.
Chou S.J.
Hsu C.T.
Liao D.C.
Lin C.S.
Liu N.T.
Shen S.C.
Tung S.Y.
Wu F.H.
Yang C.H.
楊長賢
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Orchids comprise one of the largest families of flowering plants and generate commercially important flowers. However, model plants, such as Arabidopsis thaliana, do not contain all plant genes, and agronomic and horticulturally important genera and species must be individually studied.

Results: Several molecular biology tools were used to isolate flower-specific gene promoters from Oncidium 'Gower Ramsey' (Onc. GR). A cDNA library of reproductive tissues was used to construct a microarray in order to compare gene expression in flowers and leaves. Five genes were highly expressed in flower tissues, and the subcellular locations of the corresponding proteins were identified using lip transient transformation with fluorescent protein-fusion constructs. BAC clones of the 5 genes, together with 7 previously published flower- and reproductive growth-specific genes in Onc. GR, were identified for cloning of their promoter regions. Interestingly, 3 of the 5 novel flower-abundant genes were putative trypsin inhibitor (TI) genes (OnTI1, OnTI2 and OnTI3), which were tandemly duplicated in the same BAC clone. Their promoters were identified using transient GUS reporter gene transformation and stable A. thaliana transformation analyses.

Conclusions: By combining cDNA microarray, BAC library, and bombardment assay techniques, we successfully identified flower-directed orchid genes and promoters.

Springer - Publisher Connector

Directory of Open Access Journals

National Chung Hsing University Institutional Repository

PubMed Central

Imbalanced Multi-Modal Multi-Label Learning for Subcellular Localization Prediction of Human Proteins with Both Single and Multiple Sites

Author: A Hoglund
B Liao
CE Rasmussen
DN Georgiou
FM Li
Franca Fraternali
G Tsoumakas
GP Zhou
H Mohabatkar
H Nakashima
HB Shen
HN Lin
Hong Gu
J Ma
J Tian
J Yin
Jianjun He
JY Shi
K Imai
KC Chou
KY Lee
L Chen
L Hu
LJ Foster
LL Hu
M Esmaeili
MS Scott
O Emanuelsson
P Wang
RE Schapire
S Briesemeister
S Hua
S Mei
S Zhang
T Huang
T Liu
Wenqi Liu
WZ Lin
X Jiang
X Xiao
YH Zeng
YL Chen
Z He
Z Lu
ZC Wu
Publication venue: Public Library of Science
Publication date: 08/06/2012

It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them have focused on proteins with only one location. In recent years, researchers have begun to pay attention to subcellular localization prediction for proteins with multiple sites. However, almost all the existing approaches fail to take into account the correlations among locations that arise from proteins with multiple sites, which may be important information for improving prediction accuracy on such proteins. In this paper, a new algorithm that can effectively exploit the correlations among the locations is proposed, using a Gaussian process model. In addition, the algorithm can realize an optimal linear combination of various feature extraction technologies and could be robust to imbalanced data sets. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Compartmentation of Redox Metabolism in Malaria Parasites

Author: A Krogh
A Kumar
AV Kochetov
BJ Foth
BS Crabb
C Nickel
CJ Tonkin
DT Trang
E Balconi
F Missirlis
GN Sarma
H Sztajer
IW Boucher
J Nyalwidhe
J Riemer
JF Turrens
Jude M. Przyborski
K Becker
K Fritz-Wolf
Katja Becker
Leann Tilley
M Akoachere
M Deponte
M Rossner
M Urscher
MD Cappellini
MJ Gardner
N Sienkiewicz
NH Hunt
Nicole Sturm
P Becuwe
P Pino
P Porras
PJ McMillan
PM Farber
RF Waller
S Briesemeister
S Kawazu
S Koncarevic
S Muller
S Rahlfs
S Spork
SA Ralph
Sebastian Kehr
Stefan Rahlfs
T Fleige
TF de Koning-Ward
Publication venue: Public Library of Science
Publication date: 01/12/2010

Malaria, caused by the apicomplexan parasite Plasmodium, still represents a major threat to human health and welfare and leads to about one million human deaths annually. Plasmodium is a rapidly multiplying unicellular organism undergoing a complex developmental cycle in man and mosquito – a life style that requires rapid adaptation to various environments. In order to deal with high fluxes of reactive oxygen species and maintain redox regulatory processes and pathogenicity, Plasmodium depends upon an adequate redox balance. By systematically studying the subcellular localization of the major antioxidant and redox regulatory proteins, we obtained the first complete map of redox compartmentation in Plasmodium falciparum. We demonstrate the targeting of two plasmodial peroxiredoxins and a putative glyoxalase system to the apicoplast, a non-photosynthetic plastid. We furthermore obtained a complete picture of the compartmentation of thioredoxin- and glutaredoxin-like proteins. Notably, for the two major antioxidant redox-enzymes – glutathione reductase and thioredoxin reductase – Plasmodium makes use of alternative-translation-initiation (ATI) to achieve differential targeting. Dual localization of proteins effected by ATI is likely to occur also in other Apicomplexa and might open new avenues for therapeutic intervention

Public Library of Science (PLOS)

Crossref

PubMed Central

University of Melbourne Institutional Repository