
    Protein subcellular localization prediction of eukaryotes using a knowledge-based approach

    Background: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive, so computational approaches are highly desirable. Most PSL prediction systems are established for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized into multiple subcellular organelles. Many studies have shown that proteins may simultaneously locate or move between different cellular compartments and be involved in different biological processes with different roles. Results: In this study, we propose a knowledge-based method, called KnowPredsite, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPredsite's performance. The experimental results show that KnowPredsite achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPredsite is 91.7%. For multi-localized proteins, the overall accuracy of KnowPredsite is 72.1%, which is significantly higher than that of ngLOC by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPredsite. Conclusion: KnowPredsite demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when overall sequence similarity is low, local similarity is effective for prediction, and that KnowPredsite is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPredsite is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPredsite prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.

    Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework

    Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins, assuming that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically treat locations as independent or capture inter-dependencies by treating each location combination present in the training set as an individual location class. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the multiple-location-prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top-performing system (YLoc+), without restricting predictions to be based only on location combinations present in the training set. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

    Signal peptides and protein localization prediction


    TESTLoc: protein subcellular localization prediction from EST data

    Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes. Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%). Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html

    cis-acting sequences and trans-acting factors in the localization of mRNA for mitochondrial ribosomal proteins

    mRNA localization is a conserved post-transcriptional process crucial for a variety of systems. Although several mechanisms have been identified, emerging evidence suggests that most transcripts reach the protein functional site by moving along cytoskeleton elements. We demonstrated previously that mRNAs for mitochondrial ribosomal proteins are asymmetrically distributed in the cytoplasm, and that localization in the proximity of mitochondria is mediated by the 3′-UTR. Here we show by biochemical analysis that these mRNA transcripts are associated with the cytoskeleton through the microtubule network. Cytoskeleton association is functional for their intracellular localization near the mitochondrion, and the 3′-UTR is involved in this cytoskeleton-dependent localization. To identify the minimal elements required for localization, we generated DNA constructs containing, downstream from the GFP gene, deletion mutants of the mitochondrial ribosomal protein S12 3′-UTR, and expressed them in HeLa cells. RT-PCR analysis showed that the localization signals responsible for mRNA localization are located in the first 154 nucleotides. RNA pulldown assays, mass spectrometry, and RNP immunoprecipitation assay experiments demonstrated that the mitochondrial ribosomal protein S12 3′-UTR interacts specifically with TRAP1 (tumor necrosis factor receptor-associated protein 1), hnRNPM4 (heterogeneous nuclear ribonucleoprotein M4), Hsp70 and Hsp60 (heat shock proteins 70 and 60), and α-tubulin in vitro and in vivo.

    eSLDB: eukaryotic subcellular localization database

    Eukaryotic Subcellular Localization DataBase collects the annotations of subcellular localization of eukaryotic proteomes. So far five proteomes have been processed and stored: Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana. For each sequence, the database lists localization obtained adopting three different approaches: (i) experimentally determined (when available); (ii) homology-based (when possible); and (iii) predicted. The latter is computed with a suite of machine learning based methods, developed in house. All the data are available at our website and can be searched by sequence, by protein code and/or by protein description. Furthermore, a more complex search can be performed combining different search fields and keys. All the data contained in the database can be freely downloaded in flat file format. The database is available at

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments, and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase

    BACKGROUND: Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made to determine orthology relations among a set of homologous proteins. However, they depend on the comparison of individual sequences and do not take into account divergent orthologs. RESULTS: We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase. CONCLUSIONS: Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.

    Analyses and web interfaces for protein subcellular localization and gene expression data

    In order to benefit maximally from the large-scale molecular biology data generated by recent developments, it is important to proceed in an organized manner by developing databases, interfaces, data visualization and data interpretation tools. Protein subcellular localization and microarray gene expression are two such fields that require immense computational effort before they can be used as a roadmap for the experimental biologist. Protein subcellular localization is important for elucidating protein function. We developed an automatically updated, searchable and downloadable system called model organisms proteome subcellular localization database (MEP2SL) that hosts predicted localizations and known experimental localizations for nine eukaryotes. MEP2SL localizations correlated highly with high-throughput localization experiments in yeast and were shown to have superior accuracies when compared with four other localization prediction tools on two different datasets. Hence, the MEP2SL system may serve as a reference source for protein subcellular localization information, with an interface that provides various search and download options together with links and utilities for further annotations. Microarray gene expression technology enables monitoring of the whole genome simultaneously. We developed an online, installable, searchable open source system called differentially expressed genes (DEG) that includes analysis and retrieval interfaces for Affymetrix HG-U133 Plus 2.0 arrays. DEG provides permanent data storage through its integration into a database and, being an installable online tool, is valuable for groups who are not willing to submit their data to public servers. Bilen, Biter, M.S.