Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
Abstract. Background: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches are highly desirable. Most PSL prediction systems are established for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized into multiple subcellular organelles. Many studies have shown that proteins may simultaneously locate or move between different cellular compartments and be involved in different biological processes with different roles. Results: In this study, we propose a knowledge-based method, called KnowPredsite, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPredsite's performance. The experimental results show that KnowPredsite achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPredsite is 91.7%. For multi-localized proteins, the overall accuracy of KnowPredsite is 72.1%, which is significantly higher than that of ngLOC, by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPredsite. Conclusion: KnowPredsite demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when the overall sequence similarity is low, local similarity is effective for prediction, and that KnowPredsite is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPredsite is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPredsite prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.
Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework
Knowing the location of a protein within the cell is important for
understanding its function, role in biological processes, and potential use as
a drug target. Much progress has been made in developing computational methods
that predict single locations for proteins, assuming that proteins localize to
a single location. However, it has been shown that proteins localize to
multiple locations. While a few recent systems have attempted to predict
multiple locations of proteins, they typically treat locations as independent
or capture inter-dependencies by treating each location-combination present in
the training set as an individual location-class. We present a new method and a
preliminary system we have developed that directly incorporates
inter-dependencies among locations into the multiple-location-prediction
process, using a collection of Bayesian network classifiers. We evaluate our
system on a dataset of single- and multi-localized proteins. Our results,
obtained by incorporating inter-dependencies, are significantly higher than
those obtained by classifiers that do not use inter-dependencies. The
performance of our system on multi-localized proteins is comparable to a top
performing system (YLoc+), without restricting predictions to be based only on
location-combinations present in the training set. Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
TESTLoc: protein subcellular localization prediction from EST data
Abstract. Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes. Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%). Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html.
cis-acting sequences and trans-acting factors in the localization of mRNA for mitochondrial ribosomal proteins
mRNA localization is a conserved post-transcriptional process crucial for a variety of systems. Although several mechanisms have been identified, emerging evidence suggests that most transcripts reach the protein functional site by moving along cytoskeleton elements. We demonstrated previously that mRNAs for mitochondrial ribosomal proteins are asymmetrically distributed in the cytoplasm, and that localization in the proximity of mitochondria is mediated by the 3′-UTR. Here we show by biochemical analysis that these mRNA transcripts are associated with the cytoskeleton through the microtubule network. Cytoskeleton association is functional for their intracellular localization near the mitochondrion, and the 3′-UTR is involved in this cytoskeleton-dependent localization. To identify the minimal elements required for localization, we generated DNA constructs containing, downstream from the GFP gene, deletion mutants of the mitochondrial ribosomal protein S12 3′-UTR, and expressed them in HeLa cells. RT-PCR analysis showed that the signals responsible for mRNA localization are located in the first 154 nucleotides. RNA pulldown assays, mass spectrometry, and RNP immunoprecipitation assays demonstrated that the mitochondrial ribosomal protein S12 3′-UTR interacts specifically with TRAP1 (tumor necrosis factor receptor-associated protein 1), hnRNPM4 (heterogeneous nuclear ribonucleoprotein M4), Hsp70 and Hsp60 (heat shock proteins 70 and 60), and α-tubulin in vitro and in vivo.
eSLDB: eukaryotic subcellular localization database
Eukaryotic Subcellular Localization DataBase collects the annotations of subcellular localization of eukaryotic proteomes. So far five proteomes have been processed and stored: Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana. For each sequence, the database lists localization obtained adopting three different approaches: (i) experimentally determined (when available); (ii) homology-based (when possible); and (iii) predicted. The latter is computed with a suite of machine learning based methods, developed in house. All the data are available at our website and can be searched by sequence, by protein code and/or by protein description. Furthermore, a more complex search can be performed combining different search fields and keys. All the data contained in the database can be freely downloaded in flat file format. The database is available at
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but using also the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal
Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase
BACKGROUND: Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made in determining orthology relations among a set of homologous proteins. However, these methods depend on the comparison of individual sequences and do not take into account divergent orthologs. RESULTS: We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase. CONCLUSIONS: Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.
Analyses and web interfaces for protein subcellular localization and gene expression data
In order to benefit maximally from large scale molecular biology data generated
by recent developments, it is important to proceed in an organized manner
by developing databases, interfaces, data visualization and data interpretation
tools. Protein subcellular localization and microarray gene expression are two
such fields that require immense computational effort before being used as
a roadmap for the experimental biologist. Protein subcellular localization is important
for elucidating protein function. We developed an automatically updated
searchable and downloadable system called model organisms proteome subcellular
localization database (MEP2SL) that hosts predicted localizations and known
experimental localizations for nine eukaryotes. MEP2SL localizations highly correlated
with high throughput localization experiments in yeast and were shown
to have superior accuracies when compared with four other localization prediction
tools based on two different datasets. Hence, MEP2SL system may serve as
a reference source for protein subcellular localization information with its interface
that provides various search and download options together with links and
utilities for further annotations. Microarray gene expression technology enables
monitoring of the whole genome simultaneously. We developed an online installable
searchable open source system called differentially expressed genes (DEG) that
includes analysis and retrieval interfaces for Affymetrix HG-U133 Plus 2.0 arrays.
DEG provides permanent data storage capabilities with its integration into
a database and being an installable online tool and is valuable for groups who
are not willing to submit their data on public servers. Bilen, Biter, M.S.