8 research outputs found

    On TCR binding predictors failing to generalize to unseen peptides

    Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets that include peptide sequences also present in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test splitting, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
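    The two ideas named in this abstract, a peptide-disjoint training/test split and randomly mismatched negative samples, can be illustrated with a minimal sketch. This is not the authors' exact hard-split heuristic or the TChard pipeline: the function names, the (TCR, peptide, label) tuple layout, and the choice to hold out a random fraction of peptides are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (tcr_sequence, peptide_sequence, label).
Sample = Tuple[str, str, int]


def peptide_disjoint_split(
    samples: List[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split so that no peptide in the test set also occurs in training.

    Assumption: peptides are held out uniformly at random; the paper's
    heuristic may choose held-out peptides differently.
    """
    peptides = sorted({pep for _, pep, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [s for s in samples if s[1] not in test_peptides]
    test = [s for s in samples if s[1] in test_peptides]
    return train, test


def random_mismatch_negatives(
    positives: List[Sample], n: int, seed: int = 0
) -> List[Sample]:
    """Create negatives by pairing TCRs with peptides they are not known to bind.

    Assumption: any (TCR, peptide) pair absent from the positives is
    treated as a non-binder, which is only an approximation.
    """
    rng = random.Random(seed)
    known = {(tcr, pep) for tcr, pep, _ in positives}
    tcrs = sorted({tcr for tcr, _, _ in positives})
    peps = sorted({pep for _, pep, _ in positives})
    # Cap n at the number of distinct unseen pairs so the loop terminates.
    n = min(n, len(tcrs) * len(peps) - len(known))
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(tcrs), rng.choice(peps))
        if pair not in known:
            negatives.add(pair)
    return [(tcr, pep, 0) for tcr, pep in sorted(negatives)]


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    toy = [
        ("CASSA", "PEPA", 1),
        ("CASSB", "PEPA", 1),
        ("CASSC", "PEPB", 1),
        ("CASSD", "PEPC", 1),
    ]
    train, test = peptide_disjoint_split(toy, test_fraction=0.34)
    print("train peptides:", sorted({p for _, p, _ in train}))
    print("test peptides:", sorted({p for _, p, _ in test}))
    print("negatives:", random_mismatch_negatives(toy, n=3))
```

    Grouping by peptide before splitting is what separates this from a plain random split, where the same peptide can appear on both sides and inflate test performance.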

    Expitope 2.0: a tool to assess immunotherapeutic antigens for their potential cross-reactivity against naturally expressed proteins in human tissues

    Background: Adoptive immunotherapy offers great potential for treating many types of cancer but its clinical application is hampered by cross-reactive T cell responses in healthy human tissues, representing serious safety risks for patients. We previously developed a computational tool called Expitope for assessing cross-reactivity (CR) of antigens based on tissue-specific gene expression. However, transcript abundance only indirectly indicates protein expression. The recent availability of proteome-wide human protein abundance information now facilitates a more direct approach for CR prediction. Here we present a new version 2.0 of Expitope, which computes all naturally possible epitopes of a peptide sequence and the corresponding CR indices using both protein and transcript abundance levels weighted by a proposed hierarchy of importance of various human tissues. Results: We tested the tool in two case studies. The first study quantitatively assessed the potential CR of the epitopes used for cancer immunotherapy. The second study evaluated HLA-A*02:01-restricted epitopes obtained from the Immune Epitope Database for different disease groups and demonstrated for the first time that there is a high variation in the background CR depending on the disease state of the host: compared to a healthy individual, the CR index is on average two-fold higher for the autoimmune state and five-fold higher for the cancer state. Conclusions: The ability to predict potential side effects in normal tissues helps in the development and selection of safer antigens, enabling more successful immunotherapy of cancer and other diseases.

    Validity of machine learning in biology and medicine increased through collaborations across fields of expertise

    Machine learning (ML) has become an essential asset for the life sciences and medicine. We selected 250 articles describing ML applications from 17 journals sampling 26 different fields between 2011 and 2016. Independent evaluation by two readers highlighted three results. First, only half of the articles shared software, 64% shared data and 81% applied any kind of evaluation. Although crucial for ensuring the validity of ML applications, these aspects were met more by publications in lower-ranked journals. Second, the authors’ scientific backgrounds highly influenced how technical aspects were addressed: reproducibility and computational evaluation methods were more prominent with computational co-authors; experimental proofs more with experimentalists. Third, 73% of the ML applications resulted from interdisciplinary collaborations comprising authors from at least two of the three disciplines: computational sciences, biology, and medicine. The results suggested that collaborations between computational and experimental scientists generate more scientifically sound and impactful work integrating knowledge from both domains. Although scientifically more valid solutions and collaborations involving diverse expertise did not correlate with impact factors, such collaborations provide opportunities to both sides: computational scientists are given access to novel and challenging real-world biological data, increasing the scientific impact of their research, and experimentalists benefit from more in-depth computational analyses improving the technical correctness of their work.