501 research outputs found

    MetaMHC: a meta approach to predict peptides binding to MHC molecules

    As the binding of antigenic peptides to major histocompatibility complex (MHC) molecules is a prerequisite of cellular immune responses, an accurate computational predictor will be of great benefit to biologists and immunologists for understanding the underlying mechanism of immune recognition, as well as for facilitating epitope mapping and vaccine design. Although various computational approaches have been developed, recent experimental results on benchmark data sets show that improved predictors are still needed, especially for MHC Class II peptide binding. To make the most of current methods and achieve higher predictive performance, we developed a new web server, MetaMHC, which integrates the outputs of leading predictors using several popular ensemble strategies. MetaMHC consists of two components: MetaMHCI and MetaMHCII, for MHC Class I and MHC Class II peptide binding predictions, respectively. Experimental results from both cross-validation and an independent data set show that the ensemble approaches outperform the individual predictors by a statistically significant margin. MetaMHC is freely available at http://www.biokdd.fudan.edu.cn/Service/MetaMHC.html
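
    The abstract above does not say which ensemble strategies MetaMHC uses, so the following is only a minimal sketch of the general idea of combining several predictors' outputs. The predictor names, scores, and weights are hypothetical, and the scores are assumed to be already normalized to a common [0, 1] scale.

```python
from statistics import mean


def average_ensemble(scores):
    """Unweighted mean of the per-predictor scores for one peptide."""
    return mean(scores.values())


def weighted_ensemble(scores, weights):
    """Weighted mean of the scores; the weights are assumed to reflect
    each predictor's reliability (for example, from cross-validation)."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical normalized binding scores from three made-up predictors.
scores = {"predictor_a": 0.80, "predictor_b": 0.60, "predictor_c": 0.70}
# Hypothetical weights; predictor_a is trusted twice as much as the others.
weights = {"predictor_a": 2.0, "predictor_b": 1.0, "predictor_c": 1.0}

avg = average_ensemble(scores)
wavg = weighted_ensemble(scores, weights)
print(f"mean score: {avg:.3f}, weighted score: {wavg:.3f}")
```

    Averaging is only one possible combination rule; voting or a trained meta-classifier would follow the same pattern of taking per-predictor outputs and returning a single consensus score.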

    Discovering sequence motifs in quantitative and qualitative peptide data


    A genetic approach for building different alphabets for peptide and protein classification

    BACKGROUND: This paper proposes an optimization approach for producing reduced alphabets for peptide classification using a Genetic Algorithm. The classification task is performed by a multi-classifier system in which each classifier (a Linear or Radial Basis Function Support Vector Machine) is trained using features extracted with a different reduced alphabet. Each alphabet is constructed by a Genetic Algorithm whose objective function is the maximization of the area under the ROC curve obtained in several classification problems. RESULTS: The new approach has been tested on three peptide classification problems: HIV-protease, recognition of T-cell epitopes, and prediction of peptides that bind human leukocyte antigens. The tests demonstrate that training a pool of classifiers on reduced alphabets created by a Genetic Algorithm yields an improvement over other state-of-the-art feature extraction methods. CONCLUSION: The validity of the novel strategy for creating reduced alphabets is demonstrated by the performance improvement obtained by the proposed approach over other reduced-alphabet-based methods in the tested problems.
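
    To make the notion of a reduced alphabet concrete, here is a minimal sketch of mapping the 20 amino acids onto a handful of groups and deriving composition features from the result. The grouping below is an illustrative assumption chosen by hand; it is not one of the alphabets found by the Genetic Algorithm in the paper, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative four-letter reduced alphabet (an assumption, not from the paper).
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic residues
    "p": "STNQYGP",   # polar or small residues
    "+": "KRH",       # positively charged residues
    "-": "DE",        # negatively charged residues
}

# Invert the grouping into a residue -> group-symbol lookup table.
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_peptide(peptide):
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in peptide.upper())


def composition_features(peptide):
    """Fraction of residues falling in each group, in GROUPS order."""
    reduced = reduce_peptide(peptide)
    counts = Counter(reduced)
    return [counts[g] / len(reduced) for g in GROUPS]


print(reduce_peptide("KLDE"))
print(composition_features("KLDE"))
```

    A feature vector like this could be fed to any classifier; in the approach described above, a search procedure would vary the grouping itself and keep the alphabets that maximize classification performance.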


    Human health, one of the major topics in Life Science, is facing intensified challenges, including cancer, pandemic outbreaks, and antimicrobial resistance. Thus, new medicines with unique advantages, including peptide-based vaccines and permeable small molecule antimicrobials, are in urgent need. However, the drug development process is long, complex, and risky with no guarantee of success. Also, the improvements in techniques applied in genomics, proteomics, computational biology, and clinical trials significantly increase the data complexity and volume, which imposes higher requirements on the drug development pipeline. In recent years, machine learning (ML) methods were employed to support drug development in various aspects and were shown to be highly effective. Here, we explored the application of advanced ML approaches to empower the development of peptide-based vaccines and permeable antimicrobials. First, the peptide-based vaccines targeting pancreatic cancer and COVID-19 were predicted and screened via multiple approaches. Next, novel structure-based methods to improve the performance of peptide: MHC binding affinity prediction were developed, including an HLA modeling pipeline that provides structures for docking-based peptide binder validation, and hierarchical clustering of HLA I into supertypes and subtypes that have similar peptide binding specificity. Finally, the physicochemical properties governing the permeability of small molecules into multidrug-resistant Pseudomonas aeruginosa cells were selected using a random forest model. In conclusion, the use of machine learning methods could accelerate the drug development process at a lower cost and promote data-based decision-making if used properly

    A novel ensemble fuzzy classification model in SARS-CoV-2 B-cell epitope identification for development of protein-based vaccine

    B-cell epitope prediction research has received growing interest since the development of the first method. B-cell epitope identification with the aid of an accurate prediction method is one of the most important steps in epitope-based vaccine development, immunodiagnostic testing, antibody production, disease diagnosis, and treatment. Nevertheless, using experimental methods in epitope mapping is very time-consuming, costly, and labor-intensive. Therefore, although successful predictions with in silico methods are very important in epitope prediction, there are limited studies in this area. The aim of this study is to propose a new approach for successfully predicting B-cell epitopes for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this study, the SARS-CoV B-cell epitope prediction performances of different fuzzy learning classification models were compared: genetic cooperative competitive learning (GCCL), fuzzy genetics-based machine learning (GBML), Chi's method (CHI), Ishibuchi's method with weight factor (W), structural learning algorithm on vague environment (SLAVE), and the state-of-the-art ensemble fuzzy classification model. The results showed that the proposed ensemble approach has the lowest error in SARS-CoV B-cell epitope estimation compared to the base fuzzy learners (average error rates: ensemble fuzzy=8.33, GCCL=30.42, GBML=23.82, CHI=29.17, W=46.25, and SLAVE=20.42). SARS-CoV and SARS-CoV-2 have high genome similarity. Therefore, the most successful method determined for SARS-CoV B-cell epitope prediction was used in SARS-CoV-2 B-cell epitope prediction. Finally, the B-cell epitope prediction results obtained for SARS-CoV-2 with the ensemble fuzzy classification model were compared, according to VaxiJen 2.0 scores, with the epitope sequences predicted by the BepiPred server and by immunoinformatics studies in the literature for the same protein sequences. We hope that the developed epitope prediction method will help design effective vaccines and drugs against future outbreaks of the coronavirus family, especially SARS-CoV-2 and its possible mutations. © 2021 Elsevier B.V. This study was supported by The Scientific and Technological Research Council of Turkey-TÜBİTAK (Project Number: 121E326).

    A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity

    MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions. RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data
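
    The PxxP consensus mentioned above (a proline, any two residues, then another proline) can be illustrated with a short scan for candidate motifs in a sequence. This is only a sketch of consensus matching, not the structure-based encoding proposed in the paper; the function name and the example sequence are hypothetical.

```python
import re

# A lookahead is used so that overlapping PxxP occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (start_index, motif) pairs for every PxxP match, 0-based."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


# Made-up proline-rich fragment used purely for illustration.
hits = find_pxxp("APPLPPRP")
for start, motif in hits:
    print(start, motif)
```

    Matching the consensus only flags candidate binding sites; deciding which SH3 domain, if any, binds a given candidate is the specificity problem that the machine-learning approach addresses.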

    Computer aided selection of candidate vaccine antigens

    Immunoinformatics is an emergent branch of informatics science that long ago pullulated from the tree of knowledge that is bioinformatics. It is a discipline which applies informatic techniques to problems of the immune system. To a great extent, immunoinformatics is typified by epitope prediction methods. It has found disappointingly limited use in the design and discovery of new vaccines, which is an area where proper computational support is generally lacking. Most extant vaccines are not based around isolated epitopes but rather correspond to chemically treated or attenuated whole pathogens, to individual proteins extracted from whole pathogens, or to complex carbohydrates. In this chapter we attempt to review what progress there has been in an as-yet-underexplored area of immunoinformatics: the computational discovery of whole protein antigens. The effective development of antigen prediction methods would significantly reduce the laboratory resources required to identify pathogenic proteins as candidate subunit vaccines. We begin our review by placing antigen prediction firmly into context, exploring the role of reverse vaccinology in the design and discovery of vaccines. We also highlight several competing yet ultimately complementary methodological approaches: sub-cellular location prediction, identifying antigens using sequence similarity, and the use of sophisticated statistical approaches for predicting the probability of antigen characteristics. We end by exploring how a systems immunomics approach to the prediction of immunogenicity would prove helpful in the prediction of antigens.

    HLA Class II Specificity Assessed by High-Density Peptide Microarray Interactions

    The ability to predict and/or identify MHC binding peptides is an essential component of T cell epitope discovery, something that ultimately should benefit the development of vaccines and immunotherapies. In particular, MHC class I prediction tools have matured to a point where accurate selection of optimal peptide epitopes is possible for virtually all MHC class I allotypes; in comparison, current MHC class II (MHC-II) predictors are less mature. Because MHC-II restricted CD4+ T cells control and orchestrate most immune responses, this shortcoming severely hampers the development of effective immunotherapies. The ability to generate large panels of peptides, and subsequently large bodies of peptide-MHC-II interaction data, is key to the solution of this problem, a solution that will also support the improvement of bioinformatics predictors, which critically rely on the availability of large amounts of accurate, diverse, and representative data. In this study, we have used rHLA-DRB1*01:01 and HLA-DRB1*03:01 molecules to interrogate high-density peptide arrays, in casu containing 70,000 random peptides in triplicates. We demonstrate that the binding data acquired contain systematic and interpretable information reflecting the specificity of the HLA-DR molecules investigated, suitable for training predictors able to predict T cell epitopes and peptides eluted from human EBV-transformed B cells. Collectively, with a cost per peptide reduced to a few cents, combined with the flexibility of rHLA technology, this poses an attractive strategy to generate vast bodies of MHC-II binding data at an unprecedented speed and for the benefit of generating peptide-MHC-II binding data as well as improving MHC-II prediction tools.
    Author affiliations: Osterbye, Thomas (University of Copenhagen, Denmark); Nielsen, Morten (Technical University of Denmark, Denmark; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Investigaciones Biotecnológicas; Universidad Nacional de San Martín, Instituto de Investigaciones Biotecnológicas, Argentina); Dudek, Nadine L. (Monash University, Australia); Ramarathinam, Sri H. (Monash University, Australia); Purcell, Anthony W. (Monash University, Australia); Schafer-Nielsen, Claus (not specified); Buus, Soren (University of Copenhagen, Faculty of Health Sciences)

    NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence

    Binding of peptides to Major Histocompatibility Complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC class I system (HLA-I) is extremely polymorphic. The number of registered HLA-I molecules has now surpassed 1500. Characterizing the specificity of each separately would be a major undertaking. Here, we have drawn on a large database of known peptide-HLA-I interactions to develop a bioinformatics method, which takes both peptide and HLA sequence information into account, and generates quantitative predictions of the affinity of any peptide-HLA-I interaction. Prospective experimental validation of peptides predicted to bind to previously untested HLA-I molecules, cross-validation, and retrospective prediction of known HIV immune epitopes and endogenous presented peptides, all successfully validate this method. We further demonstrate that the method can be applied to perform a clustering analysis of MHC specificities and suggest using this clustering to select particularly informative novel MHC molecules for future biochemical and functional analysis. Encompassing all HLA molecules, this high-throughput computational method lends itself to epitope searches that are not only genome- and pathogen-wide, but also HLA-wide. Thus, it offers a truly global analysis of immune responses supporting rational development of vaccines and immunotherapy. It also promises to provide new basic insights into HLA structure-function relationships. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan