
    An SVM-based system for predicting protein subnuclear localizations

    BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for predicting subnuclear and subcellular localizations that is generally applicable to protein sequences. Localization information may reveal the molecular function of novel proteins, in addition to providing insight into the biological pathways in which they function. The bulk of past work has focused on protein subcellular localization, and no specific tool has been dedicated to prediction at the subnuclear level, despite its high importance. To design a suitable predictive system, the key is to extract subtle sequence signals that can discriminate among proteins with different subnuclear localizations. RESULTS: New kernel functions for measuring sequence similarity are introduced into a support vector machine (SVM) learning model. The k-peptide vectors are first mapped by a matrix of high-scoring k-peptide pairs, where pairs are scored with BLOSUM62. The kernels, which measure the similarity of sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for the prediction of protein subnuclear localizations is established for the first time. The performance of the system is evaluated with a set of proteins collected in the Nuclear Protein Database (NPD). The overall prediction accuracy for 6 localizations is about 50% (vs. 16.7% for random prediction) for single-localization proteins in leave-one-out cross-validation, and 65% for an independent set of multi-localization proteins. This integrated system can be accessed at . CONCLUSION: The integrated system benefits from combining the predictions of several SVMs based on selected encoding methods. 
Finally, the predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available.
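    The abstract describes mapping k-peptide count vectors through a matrix of high-scoring k-peptide pairs and then defining a kernel on the mapped vectors. A minimal sketch of that idea follows, under stated assumptions: the toy substitution score is a hypothetical stand-in for BLOSUM62 (real BLOSUM62 scores could be loaded, for example, with Biopython's substitution_matrices module), and the score threshold, k = 2 and cosine normalization are illustrative choices, not the paper's settings.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL stand-in for BLOSUM62 so the sketch is self-contained:
# +4 for identical residues, +1 within an illustrative similarity group, -1 otherwise.
SIMILARITY_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def residue_score(a: str, b: str) -> int:
    if a == b:
        return 4
    return 1 if GROUP_OF[a] == GROUP_OF[b] else -1


def kpeptide_score(u: str, v: str) -> int:
    """BLOSUM-style score of two k-peptides: sum of position-wise residue scores."""
    return sum(residue_score(a, b) for a, b in zip(u, v))


def kpeptide_counts(seq: str, k: int) -> dict[str, int]:
    """Count every overlapping k-peptide in the sequence."""
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def mapped_vector(seq: str, k: int, threshold: int, all_kmers: list[str]) -> list[int]:
    """phi(seq)[v] = total count of k-peptides u in seq whose pair (u, v) scores >= threshold."""
    counts = kpeptide_counts(seq, k)
    return [
        sum(c for u, c in counts.items() if kpeptide_score(u, v) >= threshold)
        for v in all_kmers
    ]


def mapped_kernel(s: str, t: str, k: int = 2, threshold: int = 5) -> float:
    """Cosine-normalized dot product of the mapped k-peptide vectors of s and t."""
    all_kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    x = mapped_vector(s, k, threshold, all_kmers)
    y = mapped_vector(t, k, threshold, all_kmers)
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

    Because the kernel is an inner product of explicit feature vectors, a Gram matrix built from it could be passed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's SVC(kernel="precomputed").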

    Amino acid classification based spectrum kernel fusion for protein subnuclear localization

    BACKGROUND: Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization. To the best of our knowledge, there are only three computational models for protein subnuclear localization thus far. Two models were based on protein primary sequence only. The first model assumed a homogeneous amino acid substitution pattern across all residue sites of a protein sequence and used BLOSUM62 to encode the k-mers of the sequence. An ensemble of SVMs based on different k-mers produced the final prediction, achieving 50% overall accuracy. This simplifying assumption did not exploit the protein sequence profile and ignored the heterogeneity of amino acid substitution patterns across sites. The second model derived the PsePSSM feature representation from the protein sequence by simply averaging the profile PSSM, and combined it with the PseAA feature representation to construct a kNN ensemble classifier, Nuc-PLoc, achieving 67.4% overall accuracy. Both models based only on primary sequence achieved relatively poor predictive performance. The third model required GO annotations to be available, which restricts its applicability. METHODS: In this paper, we use only the amino acid information of the protein sequence, without any other information, to design a widely applicable model for protein subnuclear localization. We use the K-spectrum kernel to exploit the contextual information around an amino acid and conserved motif information. Besides expanding the window size, we adopt various amino acid classification approaches to capture diverse aspects of amino acid physicochemical properties. Each amino acid classification generates a series of spectrum kernels based on different window sizes. 
Thus, (I) window expansion can capture more contextual information and cover size-varying motifs, and (II) various amino acid classifications can exploit multi-aspect biological information from the protein sequence. Finally, we combine all the spectrum kernels by simple addition into a single kernel, called SpectrumKernel+, for protein subnuclear localization. RESULTS: We conduct performance evaluation experiments on two benchmark datasets: Lei and Nuc-PLoc. Experimental results show that SpectrumKernel+ achieves a substantial performance improvement over the previous model Nuc-PLoc, with an overall accuracy of 83.47% against 67.4%; and 71.23% against 50% for the Lei SVM Ensemble and 66.50% for the Lei GO SVM Ensemble. CONCLUSION: SpectrumKernel+ can exploit rich amino acid information from the protein sequence by embedding the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches into implicit size-varying motifs. The kernels derived from diverse amino acid classification approaches and different k-mer sizes are summed together for data integration. Experiments show that SpectrumKernel+ significantly outperforms the existing models for protein subnuclear localization.
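    The construction above (spectrum kernels over several amino acid classifications and window sizes, combined by simple addition) can be sketched as follows. The two classifications used here, the identity alphabet and a coarse hydrophobic/polar/charged grouping, are hypothetical examples rather than the schemes used in the paper, and no kernel normalization is applied.

```python
from collections import Counter

# ILLUSTRATIVE amino acid classifications (hypothetical, not the paper's schemes).
IDENTITY = {aa: aa for aa in "ACDEFGHIKLMNPQRSTVWY"}
HYDROPHOBIC_POLAR_CHARGED = {
    **{aa: "h" for aa in "AVLIMFWC"},
    **{aa: "p" for aa in "GSTYNQP"},
    **{aa: "c" for aa in "DEKRH"},
}
CLASSIFICATIONS = [IDENTITY, HYDROPHOBIC_POLAR_CHARGED]


def spectrum(seq: str, k: int, mapping: dict[str, str]) -> Counter:
    """Count k-mers of the sequence after rewriting it in a reduced alphabet."""
    reduced = "".join(mapping[aa] for aa in seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def spectrum_kernel(s: str, t: str, k: int, mapping: dict[str, str]) -> int:
    """K-spectrum kernel: inner product of the two k-mer count vectors."""
    a, b = spectrum(s, k, mapping), spectrum(t, k, mapping)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(count * b[kmer] for kmer, count in a.items())


def spectrum_kernel_plus(s: str, t: str, max_k: int = 3,
                         classifications: list[dict[str, str]] = CLASSIFICATIONS) -> int:
    """Unweighted sum of spectrum kernels over every classification and window size 1..max_k."""
    return sum(
        spectrum_kernel(s, t, k, mapping)
        for mapping in classifications
        for k in range(1, max_k + 1)
    )
```

    Simple addition is a safe way to integrate the kernels: a sum of positive semidefinite kernels is itself positive semidefinite, so the combined kernel remains valid for an SVM.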

    Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

    BACKGROUND: The completion of the various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest-neighbor classifier module that uses a similarity measure derived from the GO annotation terms of protein sequences. RESULTS: The performance of the new system was compared with our previous system using a set of proteins residing in 6 localizations, collected from the Nuclear Protein Database (NPD). The overall MCC (accuracy) rises from 0.284 (50.0%) to 0.519 (66.5%) for single-localization proteins in leave-one-out cross-validation, and from 0.420 (65.2%) to 0.541 (65.2%) for an independent set of multi-localization proteins. The new system is available at . CONCLUSION: The prediction of protein subnuclear localizations can be strongly influenced by how the similarity between a pair of proteins is defined, that is, by which similarity measure for GO terms is used. Using the sum of similarity scores over the matched GO term pairs of two proteins as the similarity definition produced the best predictive outcome. Combining Gene Ontology with sequence information substantially improves the prediction of protein subnuclear localizations.
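    The GO-based module compares proteins by their annotation terms and assigns the localization of the most similar training protein. A minimal sketch follows, under several assumptions: the tiny ontology and its term names are hypothetical, term similarity is taken as the Jaccard overlap of ancestor sets (one of many possible GO term measures), and "matched" term pairs are interpreted as each query term paired with its best-scoring term in the other protein.

```python
# HYPOTHETICAL toy ontology (child -> parents); a real system would load the GO DAG.
PARENTS = {
    "root": [],
    "nucleus": ["root"],
    "nucleolus": ["nucleus"],
    "nuclear_speck": ["nucleus"],
    "binding": ["root"],
    "rna_binding": ["binding"],
}


def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a: str, b: str) -> float:
    """Assumed GO term measure: Jaccard overlap of the two ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def protein_similarity(terms_a: list[str], terms_b: list[str]) -> float:
    """Sum over matched term pairs; each term of terms_a is matched to its best term in terms_b."""
    if not terms_a or not terms_b:
        return 0.0
    return sum(max(term_similarity(a, b) for b in terms_b) for a in terms_a)


def nearest_neighbor_label(query: list[str], training: list[tuple[list[str], str]]) -> str:
    """1-NN: return the localization label of the most GO-similar training protein."""
    _, label = max(training, key=lambda item: protein_similarity(query, item[0]))
    return label
```

    For example, with training = [(["nucleolus", "rna_binding"], "nucleolus"), (["nuclear_speck"], "speckle")], a query annotated only with "nucleolus" scores 1.0 against the first protein and 0.5 against the second, so it is labeled "nucleolus". Note that this best-match sum is not symmetric in its two arguments; that is acceptable for nearest-neighbor lookup, where the query is fixed.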

    Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence

    BACKGROUND: Knowing the submitochondrial localization of a mitochondrial protein is an important step toward understanding its function. We develop a method, based on an extended version of pseudo-amino acid composition, to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins. RESULTS: Using leave-one-out cross-validation, the prediction accuracy is 85.5% for the inner membrane, 94.5% for the matrix and 51.2% for the outer membrane. The overall accuracy of submitochondrial location prediction is 85.2%. For proteins predicted to localize to the inner membrane, the accuracy of membrane protein type prediction is 94.6%. CONCLUSION: Our method is effective for predicting protein submitochondrial location. However, even with our method or the methods at the subcellular level, the prediction of protein submitochondrial location remains a challenging problem. The online service SubMito is now available at
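    Pseudo-amino acid composition augments the 20 amino acid frequencies with sequence-order correlation factors. The sketch below implements only the basic (type 1) form of Chou's PseAAC, not the extended, segment-based version described in the abstract; using the Kyte-Doolittle hydropathy scale as the single physicochemical property, lam = 3 and the weight w = 0.05 are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as the single (assumed) property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop: dict[str, float]) -> dict[str, float]:
    """Rescale a property to zero mean and unit standard deviation over the 20 residues."""
    mean = sum(prop.values()) / len(prop)
    std = math.sqrt(sum((v - mean) ** 2 for v in prop.values()) / len(prop))
    return {aa: (v - mean) / std for aa, v in prop.items()}


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> list[float]:
    """Basic PseAAC vector of length 20 + lam for a sequence of the 20 standard residues."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(KYTE_DOOLITTLE)]
    n = len(seq)
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]

    def correlation(a: str, b: str) -> float:
        # Mean squared property difference between residues a and b.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # theta_j: average correlation between residues j positions apart (j = 1..lam).
    thetas = [
        sum(correlation(seq[i], seq[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    # The first 20 components carry composition, the last lam carry sequence order;
    # dividing by the common denominator makes all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

    For a homopolymer such as "AAAAAA" every correlation factor is zero, so the vector reduces to plain composition (1.0 for alanine, 0 elsewhere); sequence-order information only enters when neighboring residues differ in the chosen property.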

    Search for the Nuclear Localization Signal of Ime4

    Ime4 is the catalytic subunit of a conserved methyltransferase (MTase) complex found in the yeast S. cerevisiae. This complex is responsible for creating the RNA modification N6-methyladenosine (m6A), the most common post-transcriptional modification in higher eukaryotes. There is evidence that m6A is an important mediator of gene expression control within the cell, and it has been associated with a diverse array of phenotypic effects, notably as a conserved determinant of cell fate. The MTase complex is known to be nuclear, and the nucleus is the compartment where it is believed to carry out most of its methylation activity. Recently, the nuclear localization signals (NLSs) of the subunits of the human MTase complex were experimentally identified, whereas the NLSs of the yeast MTase complex remain unknown. Here, we have experimentally identified the amino acid sequence RKYQEFMKSKTGTSHTGTKKIDKK (residues 517-540), located within the C-terminal region, as a putative bipartite NLS for Ime4.

    Phosphorylation of phytochrome B inhibits light-induced signaling via accelerated dark reversion in Arabidopsis

    The photoreceptor phytochrome B (phyB) interconverts between the biologically active Pfr (λmax = 730 nm) and inactive Pr (λmax = 660 nm) forms in a red/far-red–dependent fashion and, acting as a molecular switch, regulates many aspects of light-dependent development in Arabidopsis thaliana. phyB signaling is launched by the biologically active Pfr conformer and mediated by specific protein–protein interactions between phyB Pfr and its downstream regulatory partners, whereas conversion of Pfr to Pr terminates signaling. Here, we provide evidence that phyB is phosphorylated in planta at Ser-86, located in the N-terminal domain of the photoreceptor. Analysis of phyB-9 transgenic plants expressing phospho-mimic and nonphosphorylatable phyB–yellow fluorescent protein (YFP) fusions demonstrated that phosphorylation of Ser-86 negatively regulates all physiological responses tested. The Ser86Asp and Ser86Ala substitutions do not affect the stability, photoconversion, or spectral properties of the photoreceptor, but light-independent relaxation of phyBSer86Asp Pfr into Pr, also termed dark reversion, is strongly enhanced both in vivo and in vitro. Faster dark reversion attenuates red light–induced nuclear import of phyBSer86Asp-YFP Pfr and its interaction with the negative regulator PHYTOCHROME INTERACTING FACTOR3, compared with phyB–green fluorescent protein. These data suggest that accelerated inactivation of the photoreceptor phyB via phosphorylation of Ser-86 represents a new paradigm for modulating phytochrome-controlled signaling.

    Expression and Functional Studies on the Noncoding RNA, PRINS.

    PRINS, a noncoding RNA previously identified by our research group, contributes to psoriasis susceptibility and the cellular stress response. We have now studied the cellular and histological distribution of PRINS using in situ hybridization and demonstrated variable expression in different human tissues and a consistent staining pattern in epidermal keratinocytes and in vitro cultured keratinocytes. To identify the cellular function(s) of PRINS, we searched for direct interacting partner(s) of this stress-induced molecule. In HaCaT and NHEK cell lysates, nucleophosmin (NPM) was identified as a potential physical interactor of PRINS. Immunohistochemical experiments revealed elevated expression of NPM in the dividing cells of the basal layers of psoriatic involved skin samples compared with healthy and psoriatic uninvolved samples. Others have previously shown that NPM is a ubiquitously expressed nucleolar phosphoprotein that shuttles to the nucleoplasm after UV-B irradiation in fibroblasts and cancer cells. We detected a similar translocation of NPM in UV-B-irradiated cultured keratinocytes. Gene-specific silencing of PRINS resulted in the retention of NPM in the nucleolus of UV-B-irradiated keratinocytes, suggesting that PRINS may play a role in the NPM-mediated cellular stress response in the skin.