308 research outputs found

    Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences

    This manuscript describes a support vector machine based method for the prediction of constitutive proteasome as well as immunoproteasome cleavage sites in antigenic sequences. The method achieved Matthews correlation coefficients of 0.54 and 0.43 on in vitro and major histocompatibility complex ligand data, respectively. This shows that the performance of our method is comparable to that of the NetChop method, which is currently considered to be the best method for proteasome cleavage site prediction. Based on the method, a web server, Pcleavage, has also been developed. This server accepts protein sequences in any standard format and presents results in a user-friendly format. The server is available for free use by all academic users at the URL or
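    The Matthews correlation coefficient quoted above can be computed from the four confusion-matrix counts. The following is a minimal sketch of the standard formula, not the authors' code; the function name and the zero-denominator convention are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives (e.g. cleavage vs. non-cleavage sites).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than plain accuracy on unbalanced data.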

    SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence

    Summary: Prediction of peptides binding to the MHC class II allele HLA-DRB1*0401 can effectively reduce the number of experiments required for identifying helper T cell epitopes. This paper describes a support vector machine (SVM) based method developed for identifying HLA-DRB1*0401 binding peptides in an antigenic sequence. The SVM was trained and tested on a large and clean data set consisting of 567 binders and an equal number of non-binders. The accuracy of the method was 86% when evaluated through 5-fold cross-validation. Availability: A web server, HLA-DR4Pred, based on the above approach is available at http://www.imtech.res.in/raghava/hladr4pred/ and http://bioinformatics.uams.edu/mirror/ ladr4pred/ (Mirror Site)

    Classification of nuclear receptors based on amino acid composition and dipeptide composition

    Nuclear receptors are key transcription factors that regulate crucial gene networks responsible for cell growth, differentiation, and homeostasis. Nuclear receptors form a superfamily of phylogenetically related proteins and control functions associated with major diseases (e.g. diabetes, osteoporosis, and cancer). In this study, a novel method has been developed for classifying the subfamilies of nuclear receptors. The classification was achieved on the basis of the amino acid and dipeptide composition of the receptor sequences using support vector machines. Training and testing were done on a non-redundant data set of 282 proteins obtained from the NucleaRDB database (1). The performance of all classifiers was evaluated using a 5-fold cross-validation test. In the 5-fold cross-validation, the data set was randomly partitioned into five equal sets, and each set in turn was used for testing while the remaining four sets were used for training. It was found that different subfamilies of nuclear receptors were quite closely correlated in terms of amino acid composition as well as dipeptide composition. The overall accuracies of the amino acid composition-based and dipeptide composition-based classifiers were 82.6 and 97.5%, respectively. Therefore, our results prove that different subfamilies of nuclear receptors are predictable with considerable accuracy using amino acid or dipeptide composition. Furthermore, based on the above approach, an online web service, NRpred, was developed, which is available at www.imtech.res.in/raghava/nrpred
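    The two feature encodings and the 5-fold partition described in this abstract can be sketched as follows. This is an illustrative sketch only, not the NRpred implementation: the function names, the fixed random seed, and the handling of non-standard residues (they are simply not counted) are assumptions. Sequences are assumed to contain at least two residues.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-dimensional vector: fraction of each residue in seq."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector: fraction of each adjacent residue pair."""
    seq = seq.upper()
    total = len(seq) - 1  # number of overlapping pairs
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def five_fold_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) for a random 5-fold partition."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::5] for k in range(5)]  # five near-equal sets
    for k in range(5):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]
```

    Each protein is tested exactly once, and the reported accuracy is aggregated over the five held-out sets. The resulting vectors would then be passed to an SVM library of choice, which is outside the scope of this sketch.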

    GPCRsclass: a web tool for the classification of amine type of G-protein-coupled receptors

    The receptors of the amine subfamily are major drug targets for the therapy of nervous disorders and psychiatric diseases. The recognition of novel amine-type receptors and their cognate ligands is of paramount interest for pharmaceutical companies. In the past, Chou and co-workers have shown that different types of amine receptors are correlated with their amino acid composition and are predictable on its basis with considerable accuracy [Elrod and Chou (2002) Protein Eng., 15, 713–715]. This motivated us to develop a better method for the recognition of novel amine receptors and for their further classification. The method was developed on the basis of the amino acid composition and dipeptide composition of proteins using a support vector machine. The method was trained and tested on 167 proteins of the amine subfamily of G-protein-coupled receptors (GPCRs). The method discriminated the amine subfamily of GPCRs from globular proteins with Matthews correlation coefficients of 0.98 and 0.99 using amino acid composition and dipeptide composition, respectively. In classifying different types of amine receptors using amino acid composition and dipeptide composition, the method achieved accuracies of 89.8 and 96.4%, respectively. The performance of the method was evaluated using 5-fold cross-validation. The dipeptide composition based method predicted 67.6% of protein sequences with an accuracy of 100% at a reliability index ≥5. A web server, GPCRsclass, has been developed for predicting amine-binding receptors from their amino acid sequences [ and (mirror site)]

    A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes

    In the present study, a systematic attempt has been made to develop an accurate method for predicting MHC class I restricted T cell epitopes for a large number of MHC class I alleles. Initially, a quantitative matrix (QM)-based method was developed for 47 MHC class I alleles having at least 15 binders. A secondary artificial neural network (ANN)-based method was developed for 30 out of the 47 MHC alleles having a minimum of 40 binders. Combining these ANN- and QM-based prediction methods for the 30 alleles improved the accuracy of prediction by 6% compared to each individual method. The average accuracy of the hybrid method for the 30 MHC alleles is 92.8%. This method also allows prediction of binders for 20 additional alleles using QMs that have been reported in the literature, thus allowing prediction for 67 MHC class I alleles. The performance of the method was evaluated using a jack-knife validation test. The performance of the methods was also evaluated on blind or independent data. Comparison of our method with existing MHC binder prediction methods, for alleles covered by both, shows that our method is superior to the other existing methods. This method also identifies proteasomal cleavage sites in antigen sequences by implementing the matrices described earlier. Thus, the method that we describe allows the identification of promiscuous MHC class I binders (peptides binding with many MHC alleles) having a proteasomal cleavage site at the C-terminus. The user-friendly result display format (HTML-II) can assist in locating the promiscuous MHC binding regions in an antigen sequence. The method is available on the web at www.imtech.res.in/raghava/nhlapred and its mirror site is available at http://bioinformatics.uams.edu/mirror/nhlapred/
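    The abstract does not spell out how a quantitative matrix is applied, so the following is only a generic sketch of additive position-specific scoring, one common way such matrices are used. The matrix values, the 3-residue width, the threshold, and the function names are all hypothetical and chosen purely for illustration; real MHC class I matrices typically cover 9-residue peptides.

```python
def score_peptide(peptide, matrix):
    """Sum the position-specific weights of each residue.

    matrix[i] maps a residue letter to its weight at position i;
    residues missing from the matrix contribute 0 (an assumption).
    """
    return sum(matrix[i].get(res, 0.0) for i, res in enumerate(peptide))


def scan_sequence(sequence, matrix, threshold):
    """Slide a window over the antigen and keep windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_peptide(window, matrix)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical toy matrix with three positions (not taken from any real allele).
TOY_MATRIX = [
    {"A": 1.0, "L": 0.5},
    {"K": 2.0},
    {"V": 1.5, "L": 1.0},
]
```

    Under this scheme a promiscuous binder would be a window that passes the threshold for several allele-specific matrices, and a hybrid predictor would combine the matrix score with a second model's output for the same window.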

    Prediction of CTL epitopes using QM, SVM and ANN techniques

    Cytotoxic T lymphocyte (CTL) epitopes are potential candidates for subunit vaccine design for various diseases. Most of the existing T cell epitope prediction methods are indirect methods that predict MHC class I binders instead of CTL epitopes. In this study, a systematic attempt has been made to develop a direct method for predicting CTL epitopes from an antigenic sequence. This method is based on a quantitative matrix (QM) and machine learning techniques such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). This method has been trained and tested on a non-redundant dataset of T cell epitopes and non-epitopes that includes 1137 experimentally proven MHC class I restricted T cell epitopes. The accuracies of the QM-, ANN- and SVM-based methods were 70.0, 72.2 and 75.2%, respectively. The performance of these methods has been evaluated through Leave One Out Cross-Validation (LOOCV) at a cutoff score where sensitivity and specificity were nearly equal. Finally, both machine-learning methods were used for consensus and combined prediction of CTL epitopes. The performance of these methods was evaluated on a blind dataset, where the machine learning-based methods performed better than the QM-based method. We also demonstrated through subgroup analysis that our methods can discriminate between T-cell epitopes and MHC binders (non-epitopes). In brief, this method allows prediction of CTL epitopes using QM, SVM and ANN approaches. The method also facilitates prediction of MHC restriction in predicted T cell epitopes. The method is available at http://www.imtech.res.in/raghava/ctlpred/
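    Reporting accuracy at a cutoff where sensitivity and specificity are nearly equal can be sketched as below. This is a minimal illustration under assumptions, not the CTLPred procedure: the function names are hypothetical, labels are assumed to be 1 for epitopes and 0 for non-epitopes, candidate cutoffs are taken from the observed scores, and both classes must be present.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(scores, labels):
    """Pick the observed score that minimises |sensitivity - specificity|."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return abs(sens - spec)

    return min(sorted(set(scores)), key=gap)
```

    In a leave-one-out setting, each score passed to these functions would come from a model trained on all other examples, so the cutoff is chosen on held-out predictions rather than on training fits.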

    MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes

    Background: Many databases housing information about MHC binders and non-binders have been developed in the past to help the scientific community working in the fields of immunology, immune-informatics or vaccine design. As the information about these MHC binding and non-binding peptides continues to grow with time, there is a need to keep the databases updated. To provide the immunological fraternity with the most recent information, we need to maintain and update our database regularly. In this paper, we describe the updated version 4.0 of the MHCBN database. Findings: MHCBN is a comprehensive database comprising over 25,857 peptide sequences (1053 TAP binding peptides) whose binding affinity with either MHC or TAP molecules has been assayed experimentally. It is a manually curated database whose entries are collected and compiled from published literature and existing immunological public databases. MHCBN has a number of web-based tools for the analysis and retrieval of information, such as mapping of antigenic regions, creation of allele-specific datasets, BLAST search, and listing of diseases associated with MHC alleles. Further, all entries are hyperlinked to major databases such as SWISS-PROT and PDB to provide information beyond the scope of MHCBN. The latest version 4.0 of MHCBN has 6080 more entries than the previously published version 1.1. Conclusion: Updating the MHCBN database is meant to help immunologists understand the immune system and to provide them with the latest information. We feel that our database will complement the existing databases in serving the scientific community.

    Bcipep: A database of B-cell epitopes

    BACKGROUND: Bcipep is a database of experimentally determined linear B-cell epitopes of varying immunogenicity collected from literature and other publicly available databases. RESULTS: The current version of the Bcipep database contains 3031 entries that include 763 immunodominant, 1797 immunogenic and 471 null-immunogenic epitopes. It covers a wide range of pathogenic organisms such as viruses, bacteria, protozoa, and fungi. The database provides a set of tools for the analysis and extraction of data that includes keyword search, peptide mapping and BLAST search. It also provides hyperlinks to various databases such as GenBank, PDB, SWISS-PROT and MHCBN. CONCLUSION: A comprehensive database of B-cell epitopes called Bcipep has been developed that covers information on epitopes from a wide range of pathogens. Bcipep will be a source of information for investigators involved in peptide-based vaccine design, disease diagnosis and research in allergy. It should also be a promising data source for the development and evaluation of methods for prediction of B-cell epitopes. The database is available at

    Recognition and classification of histones using support vector machine

    Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies each of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA, forming the core of the nucleosome (the repeating structure of chromatin), and H1/H5 binds to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied a support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy of around 98%. Similarly, we obtained an overall accuracy of >95% in discriminating the five classes of histones through the application of 1-versus-rest (1-v-r) SVM. Finally, we have applied this SVM-based method to the detection of histones in whole proteomes and found a sensitivity comparable to that achieved by hidden Markov model (HMM) profiles.

    MHCBN: a comprehensive database of MHC binding and non-binding peptides

    MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptides on a query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED. Availability: MHCBN is available at http://www.imtech.res.in/raghava/mhcbn/. Its SRS version is available from http://srs.ebi.ac.uk/