Motivation: In silico methods for predicting antigenic peptides that bind to
MHC class I molecules play an increasingly important role in the
identification of T-cell epitopes. Statistical and machine learning methods, in
particular, are widely used to score candidate epitopes based on their
similarity to known epitopes and non-epitopes. The genes coding for the MHC
molecules, however, are highly polymorphic, and statistical methods have
difficulty building models for alleles with few known epitopes. In this case,
recent work has demonstrated the utility of leveraging information across
alleles to improve prediction performance.

Results: We design a support vector machine algorithm that learns epitope
models for all alleles simultaneously by sharing information across similar
alleles. The sharing of information across alleles is controlled by a
user-defined measure of similarity between alleles. We show that this
similarity can be defined in terms of supertypes, or more directly by comparing
key residues known to play a role in peptide-MHC binding. We illustrate the
potential of this approach on various benchmark experiments, where it
outperforms other state-of-the-art methods.

Jacob, Laurent

Vert, Jean-Philippe

English

arXiv

We use various multitask kernels to improve MHC-I-peptide binding prediction, in particular for MHC alleles for which little training data is available.
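As a rough illustration of how a multitask kernel can share information across alleles, the sketch below multiplies a peptide kernel by an allele-similarity kernel based on supertypes. This is only one common construction and is not taken from the paper: the product form, the position-match peptide kernel, the `shared` weight, and all allele and supertype names are assumptions made for the example.

```python
def peptide_kernel(p, q):
    """Fraction of positions with identical residues (a simple stand-in kernel)."""
    if len(p) != len(q):
        raise ValueError("peptides must have equal length")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def allele_kernel(a, b, supertype, shared=0.5):
    """1 for the same allele, `shared` for alleles in the same supertype, else 0.

    `shared` (assumed to lie in [0, 1]) controls how much information flows
    between distinct alleles of one supertype.
    """
    if a == b:
        return 1.0
    return shared if supertype[a] == supertype[b] else 0.0


def multitask_kernel(x, y, supertype, shared=0.5):
    """Product kernel on (peptide, allele) pairs."""
    (pep_x, allele_x), (pep_y, allele_y) = x, y
    return peptide_kernel(pep_x, pep_y) * allele_kernel(
        allele_x, allele_y, supertype, shared
    )


def gram_matrix(samples, supertype, shared=0.5):
    """Kernel matrix over a list of (peptide, allele) pairs."""
    return [
        [multitask_kernel(x, y, supertype, shared) for y in samples]
        for x in samples
    ]


# Hypothetical alleles and supertypes, and toy 9-mer peptides (not real data).
supertype = {"allele_A": "S1", "allele_B": "S1", "allele_C": "S2"}
samples = [
    ("ACDEFGHIK", "allele_A"),
    ("ACDEFGHIL", "allele_B"),  # same supertype as allele_A: partially shared
    ("ACDEFGHIK", "allele_C"),  # different supertype: no sharing
]
K = gram_matrix(samples, supertype)
```

A matrix such as `K` could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Setting `shared=0` would reduce this sketch to independent per-allele models, while `shared=1` would pool all alleles within a supertype.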

Epitope prediction improved by multitask support vector machines

Available Versions

HAL-MINES ParisTech