108 research outputs found

    Accurate prediction of major histocompatibility complex class II epitopes by sparse representation via ℓ1-minimization

    Background: The major histocompatibility complex (MHC) is responsible for presenting antigens (epitopes) on the surface of antigen-presenting cells (APCs). When pathogen-derived epitopes are presented by MHC class II on an APC surface, T cells may be able to trigger a specific immune response. Prediction of MHC-II epitopes is particularly challenging because the open binding cleft of the MHC-II molecule allows epitopes to bind beyond the peptide binding groove; the molecule can therefore accommodate peptides of variable length. Among the methods proposed to predict MHC-II epitopes, artificial neural networks (ANNs) and support vector machines (SVMs) are the most effective. We propose a novel classification algorithm for predicting MHC-II epitopes, called sparse representation via ℓ1-minimization. Results: We obtained a collection of experimentally confirmed MHC-II epitopes from the Immune Epitope Database and Analysis Resource (IEDB) and applied our ℓ1-minimization algorithm. To benchmark its performance, we compared our predictions against an SVM classifier. We measured sensitivity, specificity and accuracy, and then used Receiver Operating Characteristic (ROC) analysis to evaluate the performance of our method. On MHC-II epitopes, the prediction performance of the ℓ1-minimization algorithm was generally comparable and, in some cases, superior to that of the standard SVM classification method. It also overcame the lack of robustness to outliers that affects other methods. While our method consistently favoured DPPS encoding with the alleles tested, SVM showed slightly better accuracy when "11-factor" encoding was used. Conclusions: ℓ1-minimization has accuracy similar to that of SVM and offers additional advantages, such as robustness to outliers. With ℓ1-minimization, no model selection dependency is involved.
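    The abstract names the classifier but not its solver. Below is a minimal sketch of sparse-representation classification. It assumes the common formulation: a query's encoded feature vector is coded as a sparse combination of all training vectors, then assigned to the class whose coefficients alone reconstruct it with the smallest residual. For brevity, the sketch swaps exact ℓ1-minimization for its lasso (penalized) relaxation, solved with iterative soft-thresholding (ISTA). The helper names `ista_lasso` and `src_classify`, and the values of `lam` and `n_iter`, are hypothetical. The encoding step (for example DPPS or "11-factor") is assumed to have already produced the columns of `A`.

```python
import numpy as np

def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.
    This lasso relaxation of l1-minimization is an assumption, not necessarily the paper's solver."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))   # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return x

def src_classify(A, labels, y, lam=0.01):
    """Sparse-representation classification (hypothetical helper).
    A: (d, n) training feature vectors as columns; labels: (n,) numpy array of classes; y: (d,) query."""
    A = A / np.linalg.norm(A, axis=0)        # unit-norm training columns
    y = y / np.linalg.norm(y)
    x = ista_lasso(A, y, lam)
    classes = np.unique(labels)
    # Residual when y is rebuilt only from the coefficients belonging to one class
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

    Usage: with binder and non-binder encodings stacked as columns of `A` and a matching `labels` array, `src_classify(A, labels, query)` returns the predicted class label. No per-class model is trained; the whole training set acts as the dictionary, which is consistent with the abstract's remark that no model selection step is involved.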

    Epitope and T-cell Reactivity Prediction Using Machine Learning Approaches

    13301 Kō No. 3953, Doctor of Engineering, Kanazawa University, doctoral dissertation (full text)

    Prediction of MHC-peptide binding: a systematic and comprehensive overview

    T cell immune responses are driven by the recognition of peptide antigens (T cell epitopes) that are bound to major histocompatibility complex (MHC) molecules. T cell epitope immunogenicity is thus contingent on several events, including appropriate and effective processing of the peptide from its protein source, stable peptide binding to the MHC molecule, and recognition of the MHC-bound peptide by the T cell receptor. Of these three hallmarks, MHC-peptide binding is the most selective event that determines T cell epitopes. Therefore, prediction of MHC-peptide binding constitutes the principal basis for anticipating potential T cell epitopes. The tremendous relevance of epitope identification in vaccine design and in the monitoring of T cell responses has spurred the development of many computational methods for predicting MHC-peptide binding, which improve the efficiency and economics of T cell epitope identification. In this report, we systematically examine the available methods for predicting MHC-peptide binding and discuss their most relevant advantages and drawbacks.
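    A common baseline among MHC-peptide binding predictors scores peptides with a position-specific matrix. Below is a minimal sketch, assuming an additive position-specific scoring matrix (PSSM) over 9-mers: each residue contributes an independent, position-dependent score, and every 9-residue window of a protein is ranked by the sum. The helper name `scan_protein` and the toy anchor weights are hypothetical. They are not taken from any published predictor or from this overview.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def scan_protein(protein, pssm, top=3):
    """Score every k-mer of `protein` with an additive PSSM of shape (k, 20)
    and return the `top` highest-scoring (start, peptide, score) tuples."""
    k = pssm.shape[0]
    hits = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        # Additive model: total score is the sum of independent per-position contributions
        score = sum(pssm[pos, AA_INDEX[aa]] for pos, aa in enumerate(peptide))
        hits.append((start, peptide, float(score)))
    hits.sort(key=lambda h: h[2], reverse=True)   # best-scoring windows first
    return hits[:top]

# Hypothetical toy matrix: rewards L at position 2 and V at position 9 (1-based)
pssm = np.zeros((9, 20))
pssm[1, AA_INDEX["L"]] = 2.0
pssm[8, AA_INDEX["V"]] = 2.0
print(scan_protein("GLAAAAAAVGG", pssm))
```

    Methods covered by such overviews differ mainly in how they go beyond this additive assumption, for example by modelling interactions between positions with neural networks or kernels.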

    Support Vector Machine-based Fuzzy Systems for Quantitative Prediction of Peptide Binding Affinity

    Reliable prediction of the binding affinity of peptides is one of the most challenging but important complex modelling problems in the post-genome era, owing to the diversity and functionality of the peptides discovered. Peptide binding prediction models are commonly used to determine whether a binding exists between a given peptide and a major histocompatibility complex (MHC) molecule. Recent research efforts have focused on quantifying these binding predictions. The objective of this thesis is to develop reliable real-valued predictive models using fuzzy systems. A non-linear system is proposed in which support vector-based regression improves the fuzzy system, and the system is applied to real-valued prediction of the degree of peptide binding. This study introduces two novel methods to improve structure and parameter identification of fuzzy systems. First, support vector-based regression is used to identify initial parameter values for the consequent part of type-1 and interval type-2 fuzzy systems. Second, an overlapping clustering concept is used to derive interval-valued parameters for the premise part of the type-2 fuzzy system. Publicly available peptide binding affinity data sets obtained from the literature are used in the experimental studies of this thesis. First, the proposed models are blind validated using peptide binding affinity data sets obtained from a modelling competition. In that competition, participants were given an almost equal number of peptide sequences in the training and testing data sets (89, 76, 133 and 133 peptides for training and 88, 76, 133 and 47 peptides for testing). Each peptide in the data sets was represented using 643 biochemical descriptors assigned to each amino acid. Second, the proposed models are cross-validated using mouse class I MHC alleles (H2-Db, H2-Kb and H2-Kk). The H2-Db, H2-Kb and H2-Kk data sets consist of 65 nonapeptides, 62 octapeptides and 154 octapeptides, respectively. Compared with previously published results, the support vector-based type-1 and support vector-based interval type-2 fuzzy models improve prediction accuracy. Quantitative predictive performance improved by as much as 33.6% for the first group of data sets and 1.32% for the second group. Not only did support vector-based regression improve the performance of the fuzzy system, but the regression also benefited from the fuzzy concept. These results lay the groundwork for applying the presented models to other domains in computational and/or systems biology. Apart from improving prediction accuracy, this study also identified specific features that play key roles in reliable peptide binding affinity prediction. The amino acid features "Polarity", "Positive charge", "Hydrophobicity coefficient" and "Zimm-Bragg parameter" are highly discriminating features in the peptide binding affinity data sets. This information can be valuable in designing peptides with strong binding affinity to MHC class I molecules, and may also be useful when designing drugs and vaccines.
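    The thesis abstract does not spell out the fuzzy model's form. Below is a minimal sketch, assuming a first-order Takagi-Sugeno-Kang (TSK) type-1 structure with Gaussian premise membership functions. It shows how such a rule base turns an input descriptor vector into a real-valued binding prediction. The function name `tsk_predict` and all parameter values are hypothetical. The consequent coefficients stand in for those the thesis initializes with support vector regression, and interval type-2 premises are not modelled.

```python
import numpy as np

def tsk_predict(X, centers, widths, coefs, intercepts):
    """First-order TSK (type-1) fuzzy inference (hypothetical helper).
    X: (n, d) inputs; centers, widths: (r, d) Gaussian premise parameters;
    coefs: (r, d) and intercepts: (r,) linear consequent parameters."""
    X = np.atleast_2d(X)
    # Firing strength of each rule: product of per-feature Gaussian memberships
    diff = (X[:, None, :] - centers[None, :, :]) / widths[None, :, :]
    firing = np.exp(-0.5 * np.sum(diff ** 2, axis=2))      # shape (n, r)
    # Normalize firing strengths so they sum to 1 for each input
    weights = firing / firing.sum(axis=1, keepdims=True)
    # Each rule's consequent is a linear function of the input
    rule_out = X @ coefs.T + intercepts[None, :]            # shape (n, r)
    # Output is the firing-strength-weighted average of the rule outputs
    return np.sum(weights * rule_out, axis=1)

# Two hypothetical rules over a single descriptor
centers = np.array([[0.0], [2.0]])
widths = np.array([[1.0], [1.0]])
coefs = np.array([[1.0], [0.0]])
intercepts = np.array([0.0, 5.0])
print(tsk_predict(np.array([[1.0]]), centers, widths, coefs, intercepts))
```

    Midway between the two rule centers, both rules fire equally, so the prediction is the plain average of their consequents. Far from a center, the nearer rule dominates. The premise parameters shape these weights, while the consequent parameters (the part the thesis seeds with support vector regression) fix what each rule predicts.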

    Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores

    BACKGROUND: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends. This causes ambiguity in the positional alignment between the groove and the peptide, and creates uncertainty as to which parts of the peptide interact with the MHC. Moreover, antigenic peptides have variable lengths, which makes naive modelling methods difficult to apply. This paper introduces a kernel method that handles variable-length peptides effectively by quantifying similarities between peptide sequences and integrating them into the kernel. RESULTS: The kernel approach presented here shows increased prediction accuracy, with a significantly higher number of true positives and true negatives on multiple MHC class II alleles, when tested on data sets from MHCPEP [1], MHCBN [2] and MHCBench [3]. Cross-validation on separating binders from non-binders produced an average area under the ROC curve (A_ROC) of 0.824 for the MHCBench data sets (up from 0.756) and an average A_ROC of 0.96 for multiple alleles of the MHCPEP database. CONCLUSION: By using a custom, knowledge-based representation of peptides, the method improves on existing state-of-the-art methods for MHC class II peptide binding prediction. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding and can easily be applied to other dynamic sequence problems.
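    The abstract does not give the paper's exact kernel. Below is an illustrative sketch of one way to turn sequence similarity into a valid kernel for peptides of different lengths. It assumes a residue-level similarity built from a hypothetical physicochemical grouping (`GROUPS`). Every pair of length-k windows contributes the product of its position-wise residue similarities, and the total is cosine-normalized. The residue similarity equals half the identity matrix plus half a group-indicator matrix, so it is positive semidefinite, and therefore so is the resulting Gram matrix. Helper names and the window length `k` are hypothetical.

```python
import numpy as np

# Hypothetical physicochemical grouping used only for this illustration
GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "GP"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}

def residue_sim(a, b):
    """1.0 for identical residues, 0.5 within a group, 0 otherwise (a PSD similarity)."""
    if a == b:
        return 1.0
    return 0.5 if GROUP_OF[a] == GROUP_OF[b] else 0.0

def raw_kernel(p, q, k=3):
    """Sum over all pairs of length-k windows of p and q of the product of
    position-wise residue similarities; p and q may differ in length (both >= k)."""
    total = 0.0
    for i in range(len(p) - k + 1):
        for j in range(len(q) - k + 1):
            prod = 1.0
            for a, b in zip(p[i:i + k], q[j:j + k]):
                prod *= residue_sim(a, b)
            total += prod
    return total

def kernel_matrix(peptides, k=3):
    """Cosine-normalized Gram matrix: K[i, j] = k(p_i, p_j) / sqrt(k(p_i, p_i) k(p_j, p_j))."""
    K = np.array([[raw_kernel(p, q, k) for q in peptides] for p in peptides])
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

print(kernel_matrix(["PKYVKQNTLKLAT", "PKYVKQNTLKL", "GGGDDDEEESSS"]).round(3))
```

    Because the windows slide over both sequences, no fixed alignment to binding pockets is required. That is the property the abstract credits for handling the open class II groove. A precomputed Gram matrix like this can be passed to an SVM implementation that accepts precomputed kernels.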

    Databases and Algorithms in Allergen Informatics

    Allergic diseases are considered one of the major health problems worldwide because of their increasing prevalence. Advances in genomic, proteomic and analytical techniques have brought considerable progress in the field of allergology and led to the accumulation of huge amounts of data. Allergen bioinformatics comprises allergen-related data resources and computational methods/tools for the efficient archiving, management and analysis of allergological data. Significant work in allergen bioinformatics has proven pivotal to the development and progress of this field. In this chapter, we describe the current status of databases and algorithms in allergen bioinformatics. We examine the work carried out thus far on allergens and allergenicity, allergen databases, algorithms/tools for allergen/allergenicity prediction, allergen epitope prediction and allergenic cross-reactivity assessment. The chapter illustrates concepts and algorithms in allergen bioinformatics and outlines key areas for potential development in the field of allergology.
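    One widely cited allergenicity screening rule comes from the 2001 FAO/WHO expert consultation. It flags a protein that shares six identical contiguous amino acids with a known allergen. A companion criterion, more than 35% identity over an 80-residue window, requires an alignment and is not shown. Below is a minimal sketch of the exact-match rule only. The function names and the two-entry allergen dictionary are hypothetical stand-ins for a real allergen database and are not taken from this chapter.

```python
def shared_kmers(query, allergen, k=6):
    """Return the set of length-k peptides present in both sequences."""
    def kmers(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return kmers(query) & kmers(allergen)

def flag_by_exact_match(query, allergens, k=6):
    """Return the names of database allergens sharing at least one identical k-mer with query."""
    return [name for name, seq in allergens.items() if shared_kmers(query, seq, k)]

# Hypothetical toy "database" of allergen sequences
allergens = {"toyA": "MKTAYIAKQRQISFVKSHFSRQ", "toyB": "GGGGGGGGGG"}
print(flag_by_exact_match("PPPAYIAKQRPPP", allergens))
```

    Short exact-match windows like this are simple and sensitive, but they can also match many non-allergens by chance. This is one reason allergen informatics tools combine such rules with alignment-based and machine-learning predictors.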