4 research outputs found

    NestedMICA as an ab initio protein motif discovery tool.

    BACKGROUND: Discovering overrepresented patterns in amino acid sequences is an important step in protein functional element identification. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription binding site motifs, to find short protein signals, and compared its performance with another popular protein motif finder, MEME. NestedMICA, an open source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In order to assess NestedMICA as a protein motif finder, we have tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested using a biologically-authentic test set, where we evaluated its performance with respect to varying sequence length.
    RESULTS: Generally NestedMICA recovered most of the short (3-9 amino acid long) test protein motifs spiked into a test set of sequences at different frequencies. We showed that it can be used to find multiple motifs at the same time, too. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME.
    CONCLUSION: NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences.
    AVAILABILITY: NestedMICA is available under the Lesser GPL open-source license from: http://www.sanger.ac.uk/Software/analysis/nmica/
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
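The synthetic test sets mentioned in the abstract above (motif instances spiked into randomly selected protein sequences at a chosen frequency) can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `spike_motif`, the fixed-string motif, the uniform random background, and the choice to overwrite residues rather than insert them are all assumptions made for illustration.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def spike_motif(sequences, motif, fraction, rng):
    """Overwrite a random window with `motif` in a given fraction of sequences.

    Hypothetical helper for illustration only. Returns the new sequence list
    and the sorted indices of the sequences that were selected for spiking.
    """
    n_spiked = round(len(sequences) * fraction)
    chosen = set(rng.sample(range(len(sequences)), n_spiked))
    spiked = []
    for i, seq in enumerate(sequences):
        if i in chosen and len(seq) >= len(motif):
            # Pick a start position so the motif fits entirely inside the sequence.
            pos = rng.randrange(len(seq) - len(motif) + 1)
            seq = seq[:pos] + motif + seq[pos + len(motif):]
        spiked.append(seq)
    return spiked, sorted(chosen)


# Assumed toy data: 20 uniform random sequences of length 50.
rng = random.Random(0)
background = [
    "".join(rng.choice(AMINO_ACIDS) for _ in range(50)) for _ in range(20)
]
# "KDEL" is used here only as an arbitrary 4-residue example string.
spiked, chosen = spike_motif(background, "KDEL", 0.25, rng)
```

Varying `fraction` corresponds to spiking the motif at different frequencies; a motif finder would then be run on `spiked` and scored on whether it recovers the motif.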

    Computational prediction of protein subcellular localization and function

    In this study, we present a computational approach that directly predicts protein functional categories from sequence and identifies protein subcellular localization, which in turn helps functional classification. Subcellular protein locations and functions have been predicted mainly from amino acid composition using a machine learning approach. Expert systems based on Support Vector Machines have been designed to predict subcellular locations for both plant and nonplant proteins, and function for nonplant proteins in particular. Four subcellular localization categories for plant and nonplant proteins have been identified with prediction accuracies of 95.4% and 99.7%, respectively. In addition to the three common categories (mitochondrial, extracellular/secretory, and nuclear), the class cytosolic is included for nonplants and chloroplast for plants. Functional categories related to the subcellular compartments are predicted using an approach similar to that applied for localization prediction. Of the 2321 protein sequences, 92.9% were correctly assigned to the 10 selected functional categories. Finally, the contribution of data mining of MEDLINE papers to function prediction is tested on another protein data set.
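The abstract above states that predictions are made from amino acid composition. As a rough illustration of what such a feature vector looks like, the sketch below computes the relative frequency of each of the 20 standard residues in a sequence. The function name `amino_acid_composition` and the handling of non-standard characters (ignored) are assumptions; the Support Vector Machine that would consume these features is not shown, and the authors' actual encoding may differ.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes, in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies for one sequence.

    Hypothetical helper for illustration only. Characters outside the
    20 standard residues are ignored when computing the total.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


# Toy input, not taken from any data set in the study.
features = amino_acid_composition("MKKLLA")
```

Each protein becomes a fixed-length vector regardless of its sequence length, which is what makes composition convenient as input to a standard classifier.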