44 research outputs found

FRED—a framework for T-cell epitope detection

Author: Buus
Deluca
Donnes
Doytchinova
M. Feldhahn
Nielsen
Nielsen
O. Kohlbacher
P. Donnes
P. Thiel
Parker
Sturniolo
Toussaint
Publication venue: Oxford University Press

Summary: Over the last decade, immunoinformatics has made significant progress. Computational approaches, in particular the prediction of T-cell epitopes using machine learning methods, are at the core of modern vaccine design. Large-scale analyses and the integration or comparison of different methods become increasingly important. We have developed FRED, an extendable, open source software framework for key tasks in immunoinformatics. In this, its first version, FRED offers easily accessible prediction methods for MHC binding and antigen processing as well as general infrastructure for the handling of antigen sequence data and epitopes. FRED is implemented in Python in a modular way and allows the integration of external methods

Crossref

PubMed Central

Serum Metabolomic Signatures Can Predict Subclinical Atherosclerosis in Patients With Systemic Lupus Erythematosus

Author: Bakshi J
Chocano E
Coelewij L
Croca S
Donnes P
Farinha F
Griffin M
Jury EC
McDonnell T
Nicolaides A
Peng J
Pineda-Torra I
Rahman A
Robinson GA
Smith E
Waddington KE
Publication date: 04/02/2021

OBJECTIVE: Patients with systemic lupus erythematosus (SLE) have an increased risk of developing cardiovascular disease. Standard serum lipid measurements in clinical practice do not predict cardiovascular disease risk in patients with SLE. More detailed analysis of lipoprotein taxonomy could identify better predictors of cardiovascular disease risk in SLE. Approach and Results: Eighty women with SLE and no history of cardiovascular disease underwent carotid and femoral ultrasound scans; 30 had atherosclerosis plaques (patients with SLE with subclinical plaque) and 50 had no plaques (patients with SLE with no subclinical plaque). Serum samples obtained at the time of the scan were analyzed using a lipoprotein-focused metabolomics platform assessing 228 metabolites by nuclear magnetic resonance spectroscopy. Data were analyzed using logistic regression and 5 binary classification models with 10-fold cross validation. Patients with SLE had global changes in complex lipoprotein profiles compared with healthy controls despite having clinical serum lipid levels within normal ranges. In the SLE cohort, univariate logistic regression identified 4 metabolites associated with subclinical plaque; 3 subclasses of VLDL (very low-density lipoprotein; free cholesterol in medium and large VLDL particles and phospholipids in chylomicrons and extremely large VLDL particles) and leucine. Together with age, these metabolites were also within the top features identified by the lasso logistic regression (with and without interactions) and random forest machine learning models. Logistic regression with interactions differentiated between patients with SLE with subclinical plaque and patients with SLE with no subclinical plaque groups with the greatest accuracy (0.800). Notably, free cholesterol in large VLDL particles and age differentiated between patients with SLE with subclinical plaque and patients with SLE with no subclinical plaque in all models. 
CONCLUSIONS: Serum metabolites are promising biomarkers to uncover and predict multimetabolic phenotypes of subclinical atherosclerosis in SLE

UCL Discovery

Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction

Author: A Craiu
AK Nussbaum
B Peters
B Peters
C Kesmir
C Kuttler
Claus Lundegaard
H Rammensee
HG Holzhutter
HG Holzhutter
IA Doytchinova
J Hakenberg
J Koch
JC Tong
JW Yewdell
Kasper Lamberth
KC Parker
KD Smith
L Stoltze
M Nielsen
M Nielsen
Mette V Larsen
ML Wei
Morten Nielsen
MV Larsen
MV Larsen
O Lund
Ole Lund
P Armitage
P Donnes
P Paz
PM van Endert
RA Henderson
S Tenzer
Soren Buus
U Ritz
XY Mo
Y Altuvia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

BACKGROUND: Reliable predictions of Cytotoxic T lymphocyte (CTL) epitopes are essential for rational vaccine design. Most importantly, they can minimize the experimental effort needed to identify epitopes. NetCTL is a web-based tool designed for predicting human CTL epitopes in any given protein. It does so by integrating predictions of proteasomal cleavage, TAP transport efficiency, and MHC class I affinity. At least four other methods have been developed recently that likewise attempt to predict CTL epitopes: EpiJen, MAPPP, MHC-pathway, and WAPP. In order to compare the performance of prediction methods, objective benchmarks and standardized performance measures are needed. Here, we develop such a large-scale benchmark and corresponding performance measures and report the performance of an updated version 1.2 of NetCTL in comparison with the four other methods. RESULTS: We define a number of performance measures that can handle the different types of output data from the five methods. We use two evaluation datasets consisting of known HIV CTL epitopes and their source proteins. The source proteins are split into all possible 9-mers; except for the annotated epitopes, all 9-mers are considered non-epitopes. In the RANK measure, we compare two methods at a time and count how often each of the methods ranks the epitope highest. In another measure, we find the specificity of the methods at three predefined sensitivity values. Lastly, for each method, we calculate the percentage of known epitopes that rank within the 5% of peptides with the highest predicted score. CONCLUSION: NetCTL-1.2 is demonstrated to have a higher predictive performance than EpiJen, MAPPP, MHC-pathway, and WAPP on all performance measures. The higher performance of NetCTL-1.2 as compared to EpiJen and MHC-pathway is, however, not statistically significant on all measures.
In the large-scale benchmark calculation consisting of 216 known HIV epitopes covering all 12 recognized HLA supertypes, the NetCTL-1.2 method was shown to have a sensitivity among the 5% top-scoring peptides above 0.72. On this dataset, the best of the other methods achieved a sensitivity of 0.64. The NetCTL-1.2 method is available at http://www.cbs.dtu.dk/services/NetCTL. All used datasets are available at http://www.cbs.dtu.dk/suppl/immunology/CTL-1.2.php.
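
The 9-mer split and the "top 5%" sensitivity measure described in this abstract can be illustrated with a minimal Python sketch. It is not the NetCTL benchmark code: the function names, the `scores` dictionary, and the rule of rounding the cutoff down (with a minimum of one peptide) are assumptions made for illustration only.

```python
def nine_mers(protein, k=9):
    """Return every overlapping k-mer of a protein sequence, in order."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def top_fraction_sensitivity(scores, epitopes, fraction=0.05):
    """Fraction of known epitopes ranked within the top-scoring `fraction` of peptides.

    scores:   dict mapping peptide -> predicted score (higher = more epitope-like).
    epitopes: set of annotated epitopes; every other peptide counts as a non-epitope.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumed cutoff rule: round down, but always keep at least one peptide.
    cutoff = max(1, int(len(ranked) * fraction))
    top = set(ranked[:cutoff])
    return sum(1 for e in epitopes if e in top) / len(epitopes)
```

In use, `scores` would hold one prediction per 9-mer returned by `nine_mers` for a source protein, and `epitopes` the annotated epitopes of that protein.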

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Copenhagen University Research Information System

PubMed Central

Online Research Database In Technology

Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method

Author: A Bairoch
A Sette
B Peters
B Peters
CA Nelson
Claus Lundegaard
CP Toseland
GL Zhang
H Noguchi
H Rammensee
H Singh
HH Bui
IA Doytchinova
J Wan
JA Swets
M Nielsen
M Nielsen
Morten Nielsen
N Metropolis
O Karpenko
Ole Lund
P Donnes
S Henikoff
S Kullback
ST Chang
T Sturniolo
TD Schneider
U Hobohm
V Brusic
WH Press
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Antigen presenting cells (APCs) sample the extracellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design, and developing methods for accurate prediction of the peptide:MHC interactions plays a central role in epitope discovery. The MHC class II binding groove is open at both ends, making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles. RESULTS: The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data sets obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele, for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors.
CONCLUSION: The SMM-align method was shown to outperform other state-of-the-art MHC class II prediction methods. The method predicts quantitative peptide:MHC binding affinity values, making it ideally suited for rational epitope discovery. The method has been trained and evaluated on the, to our knowledge, largest benchmark data set publicly available and covers the nine HLA-DR supertypes suggested as well as three mouse H2-IA alleles. Both the peptide benchmark data set and the SMM-align prediction method (NetMHCII) are made publicly available.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

On Evaluating MHC-II Binding Peptide Prediction Methods

Author: A Bairoch
A Chinnasamy
B Korber
B Peters
C Cai
C Leslie
D Madden
D O'Sullivan
Drena Dobbs
F Burden
G Raghava
G Tsoumakas
G Zhang
H Mamitsuka
H Noguchi
H Rammensee
H Saigo
H Singh
H Yu
I Witten
J Cui
J Demšar
J Garcia
J Platt
J Salomon
M Bhasin
M Bhasin
M Friedman
M Nielsen
M Nielsen
M Nielsen
M Rajapakse
N Murugan
P Baldi
P Donnes
P Reche
P Wang
R Fisher
R Mallios
S Buus
T Hertz
U Gowthaman
V Brusic
Vasant Honavar
Vladimir B. Bajic
Yasser EL-Manzalawy
Publication venue: Public Library of Science
Publication date: 01/01/2008

Choice of one method over another for MHC-II binding peptide prediction is typically based on published reports of their estimated performance on standard benchmark datasets. We show that several standard benchmark datasets of unique peptides used in such studies contain a substantial number of peptides that share a high degree of sequence identity with one or more other peptide sequences in the same dataset. Thus, in a standard cross-validation setup, the test set and the training set are likely to contain sequences that share a high degree of sequence identity with each other, leading to overly optimistic estimates of performance. Hence, to more rigorously assess the relative performance of different prediction methods, we explore the use of similarity-reduced datasets. We introduce three similarity-reduced MHC-II benchmark datasets derived from MHCPEP, MHCBN, and IEDB databases. The results of our comparison of the performance of three MHC-II binding peptide prediction methods estimated using datasets of unique peptides with that obtained using their similarity-reduced counterparts shows that the former can be rather optimistic relative to the performance of the same methods on similarity-reduced counterparts of the same datasets. Furthermore, our results demonstrate that conclusions regarding the superiority of one method over another drawn on the basis of performance estimates obtained using commonly used datasets of unique peptides are often contradicted by the observed performance of the methods on the similarity-reduced versions of the same datasets. These results underscore the importance of using similarity-reduced datasets in rigorously comparing the performance of alternative MHC-II peptide prediction methods
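
The idea of a similarity-reduced dataset can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by the authors: the identity measure (fraction of matching positions between equal-length peptides), the greedy filtering order, and the 0.8 threshold are all hypothetical choices.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length peptides.

    Assumption for this sketch: peptides of different length score 0.0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_reduce(peptides, threshold=0.8):
    """Greedily keep peptides whose identity to every kept peptide is below `threshold`."""
    kept = []
    for p in peptides:
        if all(identity(p, q) < threshold for q in kept):
            kept.append(p)
    return kept
```

Running cross-validation on the output of `similarity_reduce` rather than on the raw list of unique peptides keeps near-duplicate sequences from landing on both sides of a train/test split, which is the source of the optimistic estimates the abstract describes.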

Digital Repository @ Iowa State University (ISU)

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PepDist: A New Framework for Protein-Peptide Binding Prediction based on Learning Peptide Distance Functions

Author: A Bar-Hilel
A Sette
A Sette
AP Dempster
AY Hung
CA Janeway
Chen Yanover
D Klein
DR Flower
DR Madden
E Xing
H Mamitsuka
HG Rammensee
JW Yewdell
K Gulukota
K Wagstaff
K Yu
M Andersen
M Bhasin
M Bilenko
MS Venkatarajan
N Shental
O Schueler-Furman
P Donnes
PA Reche
RE Schapire
RE Schapire
S Buus
T Bailey
T Hertz
T Hertz
Tomer Hertz
U Wiedemann
V Brusic
V Brusic
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Many different aspects of cellular signalling, trafficking and targeting mechanisms are mediated by interactions between proteins and peptides. Representative examples are MHC-peptide complexes in the immune system. Developing computational methods for protein-peptide binding prediction is therefore an important task with applications to vaccine and drug design. METHODS: Previous learning approaches address the binding prediction problem using traditional margin based binary classifiers. In this paper we propose PepDist: a novel approach for predicting binding affinity. Our approach is based on learning peptide-peptide distance functions. Moreover, we suggest to learn a single peptide-peptide distance function over an entire family of proteins (e.g. MHC class I). This distance function can be used to compute the affinity of a novel peptide to any of the proteins in the given family. In order to learn these peptide-peptide distance functions, we formalize the problem as a semi-supervised learning problem with partial information in the form of equivalence constraints. Specifically, we propose to use DistBoost [1,2], which is a semi-supervised distance learning algorithm. RESULTS: We compare our method to various state-of-the-art binding prediction algorithms on MHC class I and MHC class II datasets. In almost all cases, our method outperforms all of its competitors. One of the major advantages of our novel approach is that it can also learn an affinity function over proteins for which only small amounts of labeled peptides exist. In these cases, our method's performance gain, when compared to other computational methods, is even more pronounced. We have recently uploaded the PepDist webserver which provides binding prediction of peptides to 35 different MHC class I alleles. The webserver which can be found at is powered by a prediction engine which was trained using the framework presented in this paper. 
CONCLUSION: The results obtained suggest that learning a single distance function over an entire family of proteins achieves higher prediction accuracy than learning a set of binary classifiers for each of the proteins separately. We also show the importance of obtaining information on experimentally determined non-binders. Learning with real non-binders generalizes better than learning with randomly generated peptides that are assumed to be non-binders. This suggests that information about non-binding peptides should also be published and made publicly available

Crossref

Springer - Publisher Connector

PubMed Central

PeptX: Using Genetic Algorithms to optimize peptides for MHC binding

Author: A Logean
AS Parker
AW Purcell
B Knapp
B Knapp
B Knapp
B Knapp
B Peters
Bernhard Knapp
C Lundegaard
D Rognan
DH Ackley
DR Flower
EM Lafuente
F Sieker
GA Lazar
GE Crooks
H Tsurui
HG Rammensee
HH Lin
HH Lin
I Dimitrov
IA Doytchinova
J Alexander
J Alexander
J Wan
J Xu
JC Tong
JC Tong
JE Baker
JH Holland
JM Wisniewska
K Falk
K Roomp
M Bhasin
M Larche
M Larche
M Larche
MG Rudolph
MN Davies
MN Davies
NC Toussaint
O Schueler-Furman
P Donnes
P Guan
P Saxova
P Wang
PA Reche
Q Zhang
R Vita
R Wang
Reiner Ribarics
S Mishra
T Baeck
U Gowthaman
VA Walshe
Verena Giczi
Wolfgang Schreiner
X Shang
Y El-Manzalawy
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: The binding between the major histocompatibility complex and the presented peptide is an indispensable prerequisite for the adaptive immune response. There is a plethora of different in silico techniques for the prediction of the peptide binding affinity to major histocompatibility complexes. Most studies screen a set of peptides for promising candidates to predict possible T cell epitopes. In this study we ask the reverse question: Which peptides have the highest binding affinities to a given major histocompatibility complex according to certain in silico scoring functions? RESULTS: Since a full screening of all possible peptides is not feasible in reasonable runtime, we introduce a heuristic approach. We developed a framework for Genetic Algorithms to optimize peptides for the binding to major histocompatibility complexes. In an extensive benchmark we tested various operator combinations. We found that (1) selection operators have a strong influence on the convergence of the population while recombination operators have minor influence and (2) five different binding prediction methods lead to five different sets of "optimal" peptides for the same major histocompatibility complex. The consensus peptides were experimentally verified as high affinity binders. CONCLUSION: We provide a generalized framework to calculate sets of high affinity binders based on different previously published scoring functions in reasonable runtime. Furthermore, we give insight into the different behaviours of operators and scoring functions of the Genetic Algorithm.
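
To make the roles of the selection, recombination, and mutation operators concrete, here is a minimal Python sketch of a genetic algorithm over 9-residue peptides. It is not the PeptX framework: `toy_score` is a hypothetical stand-in for a real binding predictor, and the operator choices (truncation selection, one-point recombination, per-residue mutation) and all parameter values are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(peptide):
    """Hypothetical scoring function: rewards L at position 2, V at the C-terminus, and alanines."""
    return (peptide[1] == "L") + (peptide[-1] == "V") + 0.1 * peptide.count("A")


def evolve(score, length=9, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Return the best-scoring peptide found after a fixed number of generations."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half unchanged (truncation selection).
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point recombination
            child = mother[:cut] + father[cut:]
            # Mutation: each residue is replaced with probability `mutation_rate`.
            child = "".join(rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else aa
                            for aa in child)
            children.append(child)
        population = parents + children
    return max(population, key=score)
```

Because the better half of each generation is carried over unchanged, the best score in the population cannot decrease between generations. Swapping `toy_score` for a different scoring function will generally lead to a different "optimal" peptide, which mirrors finding (2) of the abstract.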

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores

Author: AJ Godkin
B Efron
B Schölkopf
B Schölkopf
CK Hattotuwagama
D Haussler
DA Rhodes
Darren R Flower
FR Burden
G Bonomi
GP Raghava
H Kropshofer
H Noguchi
H Noguchi
H Rammensee
H Saigo
IA Doytchinova
J Hammer
J Hammer
J Xia
JC Tong
JD Blake
Jesper Salomon
JP Vert
JW Yewdell
M Bhasin
M Bhasin
M Nielsen
M Xiao YS
MH Wauben
N Murugan
O Karpenko
P Donnes
P Guan
PY Arnold
R Kuang
RR Mallios
RT Carson
S Henikoff
S Kawashima
SF Altschul
T Muller
T Muller
TF Smith
V Brusic
V Brusic
VN Vapnik
W Liu
Y Bengio
Z Dosztanyi
Z Zavala-Ruiz
Z Zavala-Ruiz
ZR Yang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. RESULTS: The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 A(ROC) for the MHCBench data sets (up from 0.756), and an average of 0.96 A(ROC) for multiple alleles of the MHCPEP database. CONCLUSION: The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.

Crossref

Directory of Open Access Journals

PubMed Central

Aston Publications Explorer

Oxford University Research Archive

'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

Author: A Bulashevska
A Krogh
C Andreoli
C Guda
C Guda
CS Yu
E Badidi
E Frank
GE Tusnady
Gertraud Burger
H Bannai
H Shatkay
HB Shen
HB Shen
I Small
JL Heazlewood
JR Quinlan
JY Shi
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KJ Park
L Kall
M Bhasin
M Boden
MG Claros
MS Scott
N Pfanner
N Wiedemann
O Emanuelsson
P Donnes
QB Gao
S Džeroski
S Hua
S Matsuda
SHB Chou KC
T Hirokawa
T Zhang
W Li
X Xiao
Y Huang
Yao Qing Shen
YD Cai
YD Cai
YL Chen
YX Pan
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes. RESULTS: In order to develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools by several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) surpasses by far that of the individual tools. The method of integration proved crucial to the performance. For the prediction of mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis of biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains. CONCLUSION: We developed an approach that enhances the prediction accuracy of mitochondrial proteins by uniting the strength of specialized tools. The combination of machine-learning based integration with biological expert knowledge leads to improved performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement, and applicable to predicting subcellular locations other than mitochondria, as well as other biological features.
For a trial of our approach, we provide a webservice for mitochondrial protein prediction (named YimLOC), which can be accessed through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in Additional File 2, which contains scripts for the online server YimLOC. Please note that these scripts implement only the ready-to-use STACK-mem-DT described in the main text; they do not provide the training process.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence

Binding of peptides to Major Histocompatibility Complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC class I system (HLA-I) is extremely polymorphic. The number of registered HLA-I molecules has now surpassed 1500. Characterizing the specificity of each separately would be a major undertaking. Here, we have drawn on a large database of known peptide-HLA-I interactions to develop a bioinformatics method, which takes both peptide and HLA sequence information into account, and generates quantitative predictions of the affinity of any peptide-HLA-I interaction. Prospective experimental validation of peptides predicted to bind to previously untested HLA-I molecules, cross-validation, and retrospective prediction of known HIV immune epitopes and endogenous presented peptides, all successfully validate this method. We further demonstrate that the method can be applied to perform a clustering analysis of MHC specificities and suggest using this clustering to select particularly informative novel MHC molecules for future biochemical and functional analysis. Encompassing all HLA molecules, this high-throughput computational method lends itself to epitope searches that are not only genome- and pathogen-wide, but also HLA-wide. Thus, it offers a truly global analysis of immune responses supporting rational development of vaccines and immunotherapy. It also promises to provide new basic insights into HLA structure-function relationships. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

Online Research Database In Technology