10,415 research outputs found

    PepDist: A New Framework for Protein-Peptide Binding Prediction based on Learning Peptide Distance Functions

    Get PDF
    BACKGROUND: Many different aspects of cellular signalling, trafficking and targeting mechanisms are mediated by interactions between proteins and peptides. Representative examples are MHC-peptide complexes in the immune system. Developing computational methods for protein-peptide binding prediction is therefore an important task with applications to vaccine and drug design. METHODS: Previous learning approaches address the binding prediction problem using traditional margin based binary classifiers. In this paper we propose PepDist: a novel approach for predicting binding affinity. Our approach is based on learning peptide-peptide distance functions. Moreover, we suggest to learn a single peptide-peptide distance function over an entire family of proteins (e.g. MHC class I). This distance function can be used to compute the affinity of a novel peptide to any of the proteins in the given family. In order to learn these peptide-peptide distance functions, we formalize the problem as a semi-supervised learning problem with partial information in the form of equivalence constraints. Specifically, we propose to use DistBoost [1,2], which is a semi-supervised distance learning algorithm. RESULTS: We compare our method to various state-of-the-art binding prediction algorithms on MHC class I and MHC class II datasets. In almost all cases, our method outperforms all of its competitors. One of the major advantages of our novel approach is that it can also learn an affinity function over proteins for which only small amounts of labeled peptides exist. In these cases, our method's performance gain, when compared to other computational methods, is even more pronounced. We have recently uploaded the PepDist webserver which provides binding prediction of peptides to 35 different MHC class I alleles. The webserver which can be found at is powered by a prediction engine which was trained using the framework presented in this paper. CONCLUSION: The results obtained suggest that learning a single distance function over an entire family of proteins achieves higher prediction accuracy than learning a set of binary classifiers for each of the proteins separately. We also show the importance of obtaining information on experimentally determined non-binders. Learning with real non-binders generalizes better than learning with randomly generated peptides that are assumed to be non-binders. This suggests that information about non-binding peptides should also be published and made publicly available

    Towards Universal Structure-Based Prediction of Class II MHC Epitopes for Diverse Allotypes

    Get PDF
    The binding of peptide fragments of antigens to class II MHC proteins is a crucial step in initiating a helper T cell immune response. The discovery of these peptide epitopes is important for understanding the normal immune response and its misregulation in autoimmunity and allergies and also for vaccine design. In spite of their biomedical importance, the high diversity of class II MHC proteins combined with the large number of possible peptide sequences make comprehensive experimental determination of epitopes for all MHC allotypes infeasible. Computational methods can address this need by predicting epitopes for a particular MHC allotype. We present a structure-based method for predicting class II epitopes that combines molecular mechanics docking of a fully flexible peptide into the MHC binding cleft followed by binding affinity prediction using a machine learning classifier trained on interaction energy components calculated from the docking solution. Although the primary advantage of structure-based prediction methods over the commonly employed sequence-based methods is their applicability to essentially any MHC allotype, this has not yet been convincingly demonstrated. In order to test the transferability of the prediction method to different MHC proteins, we trained the scoring method on binding data for DRB1*0101 and used it to make predictions for multiple MHC allotypes with distinct peptide binding specificities including representatives from the other human class II MHC loci, HLA-DP and HLA-DQ, as well as for two murine allotypes. The results showed that the prediction method was able to achieve significant discrimination between epitope and non-epitope peptides for all MHC allotypes examined, based on AUC values in the range 0.632–0.821. We also discuss how accounting for peptide binding in multiple registers to class II MHC largely explains the systematically worse performance of prediction methods for class II MHC compared with those for class I MHC based on quantitative prediction performance estimates for peptide binding to class II MHC in a fixed register

    BIOINFORMATICS Learning MHC I-peptide binding

    Get PDF
    ABSTRACT Motivation and results: Motivated by the ability of a simple threading approach to predict MHC I-peptide binding, we developed a new and improved structure-based model for which parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond just their sequence, including prediction for alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data is available also separates our approach from epitope prediction methods which treat MHC alleles as symbolic types, rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 sequences in HIV clade B from Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system

    Epitope prediction improved by multitask support vector machines

    Full text link
    Motivation: In silico methods for the prediction of antigenic peptides binding to MHC class I molecules play an increasingly important role in the identification of T-cell epitopes. Statistical and machine learning methods, in particular, are widely used to score candidate epitopes based on their similarity with known epitopes and non epitopes. The genes coding for the MHC molecules, however, are highly polymorphic, and statistical methods have difficulties to build models for alleles with few known epitopes. In this case, recent works have demonstrated the utility of leveraging information across alleles to improve the performance of the prediction. Results: We design a support vector machine algorithm that is able to learn epitope models for all alleles simultaneously, by sharing information across similar alleles. The sharing of information across alleles is controlled by a user-defined measure of similarity between alleles. We show that this similarity can be defined in terms of supertypes, or more directly by comparing key residues known to play a role in the peptide-MHC binding. We illustrate the potential of this approach on various benchmark experiments where it outperforms other state-of-the-art methods

    Enhancing in silico protein-based vaccine discovery for eukaryotic pathogens using predicted peptide-MHC binding and peptide conservation scores

    Get PDF
    © 2014 Goodswen et al. Given thousands of proteins constituting a eukaryotic pathogen, the principal objective for a high-throughput in silico vaccine discovery pipeline is to select those proteins worthy of laboratory validation. Accurate prediction of T-cell epitopes on protein antigens is one crucial piece of evidence that would aid in this selection. Prediction of peptides recognised by T-cell receptors have to date proved to be of insufficient accuracy. The in silico approach is consequently reliant on an indirect method, which involves the prediction of peptides binding to major histocompatibility complex (MHC) molecules. There is no guarantee nevertheless that predicted peptide-MHC complexes will be presented by antigen-presenting cells and/or recognised by cognate T-cell receptors. The aim of this study was to determine if predicted peptide-MHC binding scores could provide contributing evidence to establish a protein's potential as a vaccine. Using T-Cell MHC class I binding prediction tools provided by the Immune Epitope Database and Analysis Resource, peptide binding affinity to 76 common MHC I alleles were predicted for 160 Toxoplasma gondii proteins: 75 taken from published studies represented proteins known or expected to induce T-cell immune responses and 85 considered less likely vaccine candidates. The results show there is no universal set of rules that can be applied directly to binding scores to distinguish a vaccine from a non-vaccine candidate. We present, however, two proposed strategies exploiting binding scores that provide supporting evidence that a protein is likely to induce a T-cell immune response-one using random forest (a machine learning algorithm) with a 72% sensitivity and 82.4% specificity and the other, using amino acid conservation scores with a 74.6% sensitivity and 70.5% specificity when applied to the 160 benchmark proteins. More importantly, the binding score strategies are valuable evidence contributors to the overall in silico vaccine discovery pool of evidence

    NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data

    Get PDF
    Major histocompatibility complex (MHC) molecules are expressed on the cell surface, where they present peptides to T cells, which gives them a key role in the development of T-cell immune responses. MHC molecules come in two main variants: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I predominantly present peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins. In both cases, the binding between MHC and antigenic peptides is the most selective step in the antigen presentation pathway. Therefore, the prediction of peptide binding to MHC is a powerful utility to predict the possible specificity of a T-cell immune response. Commonly MHC binding prediction tools are trained on binding affinity or mass spectrometry-eluted ligands. Recent studies have however demonstrated how the integration of both data types can boost predictive performances. Inspired by this, we here present NetMHCpan-4.1 and NetMHCIIpan-4.0, two web servers created to predict binding between peptides and MHC-I and MHC-II, respectively. Both methods exploit tailored machine learning strategies to integrate different training data types, resulting in state-of-the-art performance and outperforming their competitors. The servers are available at http://www.cbs.dtu.dk/services/NetMHCpan-4.1/ and http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/.Fil: Reynisson, Birkir. Technical University of Denmark; DinamarcaFil: Alvarez, Bruno. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - La Plata. Instituto de Investigaciones BiotecnolĂłgicas. Universidad Nacional de San MartĂ­n. Instituto de Investigaciones BiotecnolĂłgicas; ArgentinaFil: Paul, Sinu. La Jolla Institute for Allergy and Immunology; Estados UnidosFil: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; Estados Unidos. University of California at San Diego; Estados UnidosFil: Nielsen, Morten. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - La Plata. Instituto de Investigaciones BiotecnolĂłgicas. Universidad Nacional de San MartĂ­n. Instituto de Investigaciones BiotecnolĂłgicas; Argentina. Technical University of Denmark; Dinamarc

    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    Get PDF
    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a spe-cific MHC allele and which will not, creating possibilities for controlling immune response and for the applications of immunotherapy. However a problem encountered in the computational binding prediction methods for MHC class I is the presence of bulges and loops in the peptides, changing the total length. Most machine learning methods in use to-day require the sequences to be of same length to success-fully mine the binding motifs. We propose the use of time-based data mining methods in motif mining to be able to mine motifs position-independently. Also, the information for both binding and non-binding peptides are used on the contrary to the other methods which only rely on binding peptides. The prediction results are between 70-80% for the tested alleles
    • …
    corecore