3,770 research outputs found

    Predicting the outer membrane proteome of Pasteurella multocida based on consensus prediction enhanced by results integration and manual confirmation

    Get PDF
    Background Outer membrane proteins (OMPs) of Pasteurella multocida have various functions related to virulence and pathogenesis and represent important targets for vaccine development. Various bioinformatic algorithms can predict outer membrane localization and discriminate OMPs by structure or function. The designation of a confident prediction framework by integrating different predictors followed by consensus prediction, results integration and manual confirmation will improve the prediction of the outer membrane proteome. Results In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: those of avian strain Pm70 and porcine non-toxigenic strain 3480. Predicted proteins in each group were filtered by optimized criteria for consensus prediction: at least two positive predictions for the subcellular localization predictors, three for the transmembrane β-barrel protein predictors and one for the lipoprotein predictors. The consensus predicted proteins were integrated from each group into a single list of proteins. We further incorporated a manual confirmation step including a public database search against PubMed and sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interactions to enhance the confidence of prediction. As a result, we were able to confidently predict 98 putative OMPs from the avian strain genome and 107 OMPs from the porcine strain genome with 83% overlap between the two genomes. Conclusions The bioinformatic framework developed in this study has increased the number of putative OMPs identified in P. multocida and allowed these OMPs to be identified with a higher degree of confidence. Our approach can be applied to investigate the outer membrane proteomes of other Gram-negative bacteria

    A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine

    Get PDF
    AbstractApoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. Based on the idea of coarse-grained description and grouping in physics, a new feature extraction method with grouped weight for protein sequence is presented, and applied to apoptosis protein subcellular localization prediction associated with support vector machine. For the same training dataset and the same predictive algorithm, the overall prediction accuracy of our method in Jackknife test is 13.2% and 15.3% higher than the accuracy based on the amino acid composition and instability index. Especially for the else class apoptosis proteins, the increment of prediction accuracy is 41.7 and 33.3 percentile, respectively. The experiment results show that the new feature extraction method is efficient to extract the structure information implicated in protein sequence and the method has reached a satisfied performance despite its simplicity. The overall prediction accuracy of EBGW_SVM model on dataset ZD98 reach 92.9% in Jackknife test, which is 8.2–20.4 percentile higher than other existing models. For a new dataset ZW225, the overall prediction accuracy of EBGW_SVM achieves 83.1%. Those implied that EBGW_SVM model is a simple but efficient prediction model for apoptosis protein subcellular location prediction

    Minimalist Ensemble Algorithms for Genome-Wide Protein Localization Prediction

    Get PDF
    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi

    Multiple marker abundance profiling:combining selected reaction monitoring and data-dependent acquisition for rapid estimation of organelle abundance in subcellular samples

    Get PDF
    Measuring changes in protein or organelle abundance in the cell is an essential, but challenging aspect of cell biology. Frequently-used methods for determining organelle abundance typically rely on detection of a very few marker proteins, so are unsatisfactory. In silico estimates of protein abundances from publicly available protein spectra can provide useful standard abundance values but contain only data from tissue proteomes, and are not coupled to organelle localization data. A new protein abundance score, the normalized protein abundance scale (NPAS), expands on the number of scored proteins and the scoring accuracy of lower-abundance proteins in Arabidopsis. NPAS was combined with subcellular protein localization data, facilitating quantitative estimations of organelle abundance during routine experimental procedures. A suite of targeted proteomics markers for subcellular compartment markers was developed, enabling independent verification of in silico estimates for relative organelle abundance. Estimation of relative organelle abundance was found to be reproducible and consistent over a range of tissues and growth conditions. In silico abundance estimations and localization data have been combined into an online tool, multiple marker abundance profiling, available in the SUBA4 toolbox (http://suba.live)

    Protein Domain Boundary Predictions: A Structural Biology Perspective

    Get PDF
    One of the important fields to apply computational tools for domain boundaries prediction is structural biology. They can be used to design protein constructs that must be expressed in a stable and functional form and must produce diffraction-quality crystals. However, prediction of protein domain boundaries on the basis of amino acid sequences is still very problematical. In present study the performance of several computational approaches are compared. It is observed that the statistical significance of most of the predictions is rather poor. Nevertheless, when the right number of domains is correctly predicted, domain boundaries are predicted within very few residues from their real location. It can be concluded that prediction methods cannot be used yet as routine tools in structural biology, though some of them are rather promising

    Predicting the targeting of tail-anchored proteins to subcellular compartments in mammalian cells

    Get PDF
    This is the author accepted manuscript. The final version is available from Company of Biologists via the DOI in this record.Tail-anchored (TA) proteins contain a single transmembrane domain (TMD) at the Cterminus, anchoring them to organelle membranes where they mediate a variety of critical cellular processes. Mutations in individual TA proteins cause a number of severe inherited disorders. However, the molecular mechanisms and signals facilitating proper TA protein targeting are not fully understood, in particular in mammals. Here, we identify additional TA proteins at peroxisomes or shared by multiple organelles in mammals and reveal that a combination of TMD hydrophobicity and tail charge determines targeting to distinct organelles. Specifically, an increase in tail charge can override a hydrophobic TMD signal and re-direct a protein from the ER to peroxisomes or mitochondria and vice versa. We demonstrate that subtle alterations in those physicochemical parameters can shift TA protein targeting between organelles, explaining why peroxisomes and mitochondria share many TA proteins. Our analyses enabled us to allocate characteristic physicochemical parameters to different organelle groups. This classification allows for the first time, successful prediction of the location of uncharacterized TA proteins.We thank colleagues who provided materials (see Tables S1-S4) and acknowledge support from A. C. Magalhães, M. Almeida, D. Tuerker, S. Kuehl and C. Davies. This work was supported by the Biotechnology and Biological Sciences Research Council (BB/K006231/1 to M.S.), a Wellcome Trust Institutional Strategic Support Award (WT097835MF, WT105618MA to M.S.), the Portuguese Foundation for Science and Technology and FEDER/COMPETE (PTDC/BIA-BCM/118605/2010 to M.S.; SFRH/BD/37647/2007 to N.B.; SFRH/BPD/77619/2011 and UID/BIM/04501/2013 to D.R.). M.W., E.A.G., and M.S. are supported by Marie Curie Initial Training Network (ITN) action PerFuMe (316723)

    The role of type 4 phosphodiesterases in generating microdomains of cAMP: Large scale stochastic simulations

    Get PDF
    Cyclic AMP (cAMP) and its main effector Protein Kinase A (PKA) are critical for several aspects of neuronal function including synaptic plasticity. Specificity of synaptic plasticity requires that cAMP activates PKA in a highly localized manner despite the speed with which cAMP diffuses. Two mechanisms have been proposed to produce localized elevations in cAMP, known as microdomains: impeded diffusion, and high phosphodiesterase (PDE) activity. This paper investigates the mechanism of localized cAMP signaling using a computational model of the biochemical network in the HEK293 cell, which is a subset of pathways involved in PKA-dependent synaptic plasticity. This biochemical network includes cAMP production, PKA activation, and cAMP degradation by PDE activity. The model is implemented in NeuroRD: novel, computationally efficient, stochastic reaction-diffusion software, and is constrained by intracellular cAMP dynamics that were determined experimentally by real-time imaging using an Epac-based FRET sensor (H30). The model reproduces the high concentration cAMP microdomain in the submembrane region, distinct from the lower concentration of cAMP in the cytosol. Simulations further demonstrate that generation of the cAMP microdomain requires a pool of PDE4D anchored in the cytosol and also requires PKA-mediated phosphorylation of PDE4D which increases its activity. The microdomain does not require impeded diffusion of cAMP, confirming that barriers are not required for microdomains. The simulations reported here further demonstrate the utility of the new stochastic reaction-diffusion algorithm for exploring signaling pathways in spatially complex structures such as neurons
    corecore