
    Classification of nuclear receptors based on amino acid composition and dipeptide composition

    Nuclear receptors are key transcription factors that regulate crucial gene networks responsible for cell growth, differentiation, and homeostasis. Nuclear receptors form a superfamily of phylogenetically related proteins and control functions associated with major diseases (e.g. diabetes, osteoporosis, and cancer). In this study, a novel method has been developed for classifying the subfamilies of nuclear receptors. The classification was achieved on the basis of the amino acid and dipeptide composition of receptor sequences using support vector machines. Training and testing were done on a non-redundant data set of 282 proteins obtained from the NucleaRDB database (1). The performance of all classifiers was evaluated using 5-fold cross-validation: the data set was randomly partitioned into five nearly equal sets, and each set was used once for testing while the remaining four sets were used for training. It was found that different subfamilies of nuclear receptors were quite closely correlated in terms of amino acid composition as well as dipeptide composition. The overall accuracies of the amino acid composition-based and dipeptide composition-based classifiers were 82.6 and 97.5%, respectively. Therefore, our results show that different subfamilies of nuclear receptors are predictable with considerable accuracy using amino acid or dipeptide composition. Furthermore, based on the above approach, an online web service, NRpred, was developed, which is available at www.imtech.res.in/raghava/nrpred
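
The two feature sets named above can be made concrete with a short Python sketch. It assumes the usual definitions: amino acid composition is the percentage of each of the 20 standard residues, and dipeptide composition is the percentage of each of the 400 ordered pairs of consecutive residues. The function names are hypothetical, non-standard residues are simply skipped (an assumption), and the resulting 20- and 400-value vectors are the kind of input an SVM classifier would take; this is not NRpred's own code.

```python
from itertools import product

# The 20 standard amino acids in one-letter code; this fixed order defines the feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Percentage of each standard residue (20 values); other letters are skipped (assumption)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [100.0 * counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Percentage of each pair of consecutive residues (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [100.0 * counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


if __name__ == "__main__":
    aac = amino_acid_composition("MKKLLA")
    dpc = dipeptide_composition("MKKLLA")
    print(len(aac), len(dpc))  # 20 400
    print(dpc[DIPEPTIDES.index("KK")])  # 20.0 (KK is 1 of the 5 dipeptides)
```

Normalizing by the number of counted residues (or pairs) keeps proteins of different lengths comparable, which is the point of using composition rather than raw counts.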

    Prediction of neurotoxins based on their function and source

    We have developed a method, NTXpred, for predicting neurotoxins and classifying them based on their function and origin. The dataset used in this study consists of 582 non-redundant, experimentally annotated neurotoxins obtained from Swiss-Prot. A number of modules have been developed for predicting neurotoxins from residue composition using a feed-forward neural network (FNN), a recurrent neural network (RNN) and a support vector machine (SVM), which achieved maximum accuracies of 84.19%, 92.75% and 97.72%, respectively. In addition, SVM modules have been developed for classifying neurotoxins based on their source (e.g., eubacteria, cnidarians, molluscs, arthropods and chordates) using amino acid composition and dipeptide composition, and achieved maximum overall accuracies of 78.94% and 88.07%, respectively. The overall accuracy increased to 92.10% when the evolutionary information obtained from PSI-BLAST was combined with the SVM module for source classification. We have also developed SVM modules for classifying neurotoxins based on function using amino acid and dipeptide composition, which achieved overall accuracies of 83.11% and 91.10%, respectively. The overall accuracy of function classification improved to 95.11% when the PSI-BLAST output was combined with the SVM module. All the modules developed in this study were evaluated using five-fold cross-validation. NTXpred is available at www.imtech.res.in/raghava/ntxpred/ and at a mirror site, http://bioinformatics.uams.edu/mirror/ntxpred
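
NTXpred's modules were evaluated by five-fold cross-validation. A minimal Python sketch of that protocol follows, assuming a random partition into five near-equal folds and an overall accuracy pooled across the held-out folds (the abstract does not say whether accuracies were pooled or averaged per fold). The classifier is a pluggable `fit`/`predict` pair; the majority-class stand-in and all function names are hypothetical and only let the sketch run without an SVM library.

```python
import random
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices); every example is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # random partition of the data set
    folds = [indices[i::k] for i in range(k)]  # k folds whose sizes differ by at most 1
    for i in range(k):
        # Train on the other k - 1 folds, test on fold i.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cross_validated_accuracy(X: Sequence, y: Sequence, fit: Callable, predict: Callable,
                             k: int = 5, seed: int = 0) -> float:
    """Overall accuracy (%) pooled over the k held-out folds."""
    correct = 0
    for train, test in k_fold_splits(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(predictions, test))
    return 100.0 * correct / len(y)


# Hypothetical stand-in classifier (always predicts the majority training label).
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model] * len(X_test)
```

With n = 282, the size of the NucleaRDB set described earlier, the folds hold 57, 57, 56, 56 and 56 examples. Fixing the seed makes the partition reproducible, so different feature sets can be compared on identical folds.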

    BTXpred: prediction of bacterial toxins

    This paper describes a method for predicting bacterial toxins from their amino acid sequences. All the modules developed in this study were trained and tested on a non-redundant dataset of 150 bacterial toxins comprising 77 exotoxins and 73 endotoxins. First, support vector machine (SVM)-based modules were developed for predicting bacterial toxins using amino acid and dipeptide composition, and achieved accuracies of 96.07% and 92.50%, respectively. Second, SVM-based modules were developed for discriminating endotoxins from exotoxins using amino acid and dipeptide composition, and achieved accuracies of 95.71% and 92.86%, respectively. In addition, modules have been developed for classifying exotoxins (e.g. toxins that activate adenylate cyclase, toxins that activate guanylate cyclase, and neurotoxins) using hidden Markov models (HMM), PSI-BLAST and a combination of the two, and achieved overall accuracies of 95.75%, 97.87% and 100%, respectively. Based on this study, a web server called 'BTXpred' has been developed, which is available at http://www.imtech.res.in/raghava/btxpred/. Supplementary information is available at http://www.imtech.res.in/raghava/btxpred/supplementary.html

    PRRDB: A comprehensive database of Pattern-Recognition Receptors and their ligands

    Background: Recently, a number of studies have demonstrated that the innate immune system does not merely act as the first line of defense but also provides critical signals for the development of a specific adaptive immune response. The innate immune system employs a set of receptors called pattern-recognition receptors (PRRs) that recognize evolutionarily conserved patterns from pathogens, called pathogen-associated molecular patterns (PAMPs). To assist the scientific community, a database, PRRDB, has been developed that provides extensive information about pattern-recognition receptors and their ligands.
    Results: The current version of the database contains around 500 pattern-recognition receptors from 77 distinct organisms ranging from insects to humans. These include 177 Toll-like receptors, 124 scavenger receptors and 67 nucleotide-binding site leucine-rich repeat (NBS-LRR) receptors. The database also provides information about 266 ligands, including carbohydrates, proteins, nucleic acids, glycolipids, glycoproteins and lipopeptides. A number of web tools have been integrated into PRRDB to provide the following services: i) searching on any field; ii) database browsing; and iii) BLAST search against the pattern-recognition receptors. PRRDB also provides external links to standard databases such as Swiss-Prot and PubMed.
    Conclusion: PRRDB is a unique database of its kind, providing comprehensive information about innate immunity. This database will be very useful in designing effective adjuvants for subunit vaccines and in understanding the role of innate immunity. The database is available from the URLs in the Availability and requirements section.

    RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information

    The attainment of the complete map-based sequence of rice (Oryza sativa) is clearly a major milestone for the research community. Identifying the localization of encoded proteins is key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular localization prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid composition, dipeptide (i+1) composition and four-part amino acid composition, and achieved overall accuracies of 81.43, 80.88 and 81.10%, respectively. Second, a similarity search-based module was developed using PSI-BLAST (position-specific iterated BLAST) and achieved 68.35% accuracy. Another module, developed using the evolutionary information of a protein sequence extracted from its position-specific scoring matrix, achieved an accuracy of 87.10%. In this study, a large number of modules were developed using various encoding schemes such as higher-order dipeptide composition, N- and C-terminal, split amino acid composition and hybrid information. To benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool, 'RSLpred', has been developed for subcellular localization prediction of query rice proteins; it is freely accessible at http://www.imtech.res.in/raghava/rslpred
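
Two of the encodings named above can be sketched in Python under stated assumptions. Four-part composition is taken here to mean splitting the sequence into four contiguous segments of near-equal length and concatenating each segment's 20-value amino acid composition (80 values). Higher-order dipeptide composition is taken to mean counting residue pairs at positions i and i+gap, so gap = 1 gives the ordinary (i+1) dipeptide composition. RSLpred's exact segmentation and normalization may differ, and all function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered residue pairs


def composition(seq: str) -> list[float]:
    """Percentage of each standard residue in seq (20 values); other letters skipped."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [100.0 * counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def four_part_composition(seq: str, parts: int = 4) -> list[float]:
    """Composition of each of `parts` contiguous segments, concatenated (20 * parts values)."""
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]  # assumed near-equal split
    features: list[float] = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(composition(seq[start:end]))
    return features


def gapped_dipeptide_composition(seq: str, gap: int = 1) -> list[float]:
    """Percentage of each residue pair (i, i + gap); gap = 1 is the ordinary dipeptide."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [100.0 * counts[p] / total if total else 0.0 for p in PAIRS]
```

Splitting into segments keeps some positional information (for example, an N-terminal targeting signal shifts the first segment's composition), which a single whole-sequence composition averages away.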

    A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search

    Most prediction methods for secretory proteins require the presence of a correct N-terminal end of the pre-protein for correct classification. Because large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect predictions. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (classical and non-classical secreted proteins, respectively) using two machine-learning techniques: artificial neural networks (ANN) and support vector machines (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition, and achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition-based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. Using the hybrid module, we also achieved a high sensitivity of 60.4% with only 5% false positive predictions. A web server, SRTpred, has been developed based on the above study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/
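
The sensitivity figure above is quoted at a fixed false positive level. The abstract does not say how the operating threshold was chosen; one common approach, sketched below in Python as an assumption, is to take the lowest score threshold whose false positive rate on non-secretory proteins stays at or below 5% and report the sensitivity on secretory proteins at that threshold. The scores would come from a classifier such as the hybrid module; the function names and toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def metrics_at_threshold(pos_scores: Sequence[float], neg_scores: Sequence[float],
                         threshold: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy (%) when score >= threshold means 'secretory'."""
    tp = sum(s >= threshold for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= threshold for s in neg_scores)
    tn = len(neg_scores) - fp
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def sensitivity_at_max_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float],
                           max_fpr: float = 0.05) -> Tuple[float, float]:
    """Lowest threshold with false positive rate <= max_fpr (>= 0), and the sensitivity (%) there."""
    # Both rates only change at observed scores, so those scores (plus +inf, where
    # nothing is called positive and the false positive rate is 0) suffice as candidates.
    candidates = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    for t in candidates:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= max_fpr:
            sens = 100.0 * sum(s >= t for s in pos_scores) / len(pos_scores)
            return t, sens
    raise ValueError("max_fpr must be non-negative")
```

Scanning thresholds from low to high and stopping at the first admissible one maximizes sensitivity, because raising the threshold can only lower both the true and the false positive counts.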

    Designing of highly effective complementary and mismatch siRNAs for silencing a gene

    In the past, numerous methods have been developed for predicting the efficacy of short interfering RNAs (siRNAs). However, these methods were developed only for predicting the efficacy of siRNAs fully complementary to a gene. To the best of the authors' knowledge, no method has been developed for predicting the efficacy of mismatch siRNAs against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene. Support vector machine (SVM)-based models have been developed for predicting the efficacy of siRNAs using composition, binary and hybrid patterns of siRNAs. Using the hybrid model, we achieved a maximum correlation of 0.67 between predicted and actual efficacy of siRNAs. All models were trained and tested on a dataset of 2182 siRNAs, and performance was evaluated using five-fold cross-validation. The performance of our method, desiRm, is comparable to that of other well-known methods. In this study, an attempt has also been made for the first time to design mutant siRNAs (mismatch siRNAs). In this approach, we mutated a given siRNA at all possible positions with all possible nucleotides, and the efficacy of each mutated siRNA was predicted using desiRm. It is well known from the literature that mismatches between an siRNA and its target affect silencing efficacy. We therefore incorporated rules derived from experimental base-mismatch data to estimate the overall efficacy of mutated (mismatch) siRNAs. Finally, we developed a web server, desiRm (http://www.imtech.res.in/raghava/desirm/), for designing highly effective siRNAs for silencing a gene. This tool should help in designing siRNAs that degrade the disease isoform of a gene heterozygous for a single nucleotide polymorphism without depleting the wild-type protein.
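
The mutation step described above can be sketched in Python. The abstract does not say whether multiple simultaneous mismatches were considered, so this sketch assumes single-nucleotide substitutions over the RNA alphabet (A, C, G, U). That gives 3 × length variants per siRNA (63 for a 21-mer), each of which would then be scored by an efficacy predictor such as desiRm (not reproduced here). A Pearson correlation helper is included because the reported 0.67 is a correlation between predicted and actual efficacy; the abstract does not name the coefficient, so Pearson is an assumption. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NUCLEOTIDES = "ACGU"  # RNA alphabet (assumption: siRNA strand given as RNA)


def single_mismatch_variants(sirna: str) -> List[Tuple[int, str, str, str]]:
    """All single-substitution variants as (0-based position, original, substitute, variant)."""
    sirna = sirna.upper()
    variants = []
    for pos, base in enumerate(sirna):
        for sub in NUCLEOTIDES:
            if sub != base:
                variants.append((pos, base, sub, sirna[:pos] + sub + sirna[pos + 1:]))
    return variants


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between, e.g., predicted and measured efficacies."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

Keeping the position and the substituted base alongside each variant makes it straightforward to apply position-dependent mismatch rules afterwards, as the abstract describes.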