
    Determining epitope specificity of T-cell receptors with transformers

    SUMMARY: T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in the case of infection or cancer. However, the high diversity of TCRs, together with the unique and complex binding mechanisms underlying epitope recognition, makes it difficult to predict binding between TCRs and epitopes. Here, we demonstrate the utility of transformers, a deep learning strategy that incorporates an attention mechanism to learn informative features, and show that these models, pre-trained on a large set of protein sequences, outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) for predicting the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were made to incorporate the gene usage of the TCRs into the four transformer models. Of all 12 transformer implementations (four models, each in three variants), a modified version of the ProtXLNet model predicted TCR-epitope pairs with the highest accuracy (weighted F1 score of 0.55, considering all 25 epitopes simultaneously). This modification added features representing the gene names of the TCRs. We also showed that the basic implementation of transformers outperformed previously available methods developed for the same biological problem, namely TCRGP, TCRdist, and DeepTCR, especially for the hard-to-classify labels. These results show that the attention-learning capability of transformers can be put to practical use in a complex biological setting such as TCR binding prediction. Further work to exploit the full potential of transformers, either through attention-head visualization or by introducing additional features, could open new avenues in T-cell research.
AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/InduKhatri/tcrformer.
Pattern Recognition and Bioinformatics
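The headline result above is reported as a weighted F1 score over 25 epitope classes. As a minimal sketch, assuming the common definition (per-class F1, averaged with each class weighted by its number of true examples, as in scikit-learn's `average="weighted"`), the metric can be computed as follows. The function name `weighted_f1` and the example labels are hypothetical and are not taken from the tcrformer code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (here, one class per epitope).

    Assumes the usual 'weighted' averaging: each class's F1 is weighted by
    the fraction of true examples that belong to that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


# Hypothetical predictions for four TCRs and three epitope classes.
print(weighted_f1(["A", "A", "B", "C"], ["A", "B", "B", "C"]))  # 0.75
```

Weighting by support means that frequent epitopes dominate the score, which is why results on hard-to-classify, less frequent labels are worth reporting separately.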

    pmTR database: Population matched (pm) germline allelic variants of T-cell receptor (TR) loci

    The IMGT database profiles the TR germline alleles for all four TR loci (TRA, TRB, TRG and TRD); however, it does not include information on the population specificity and allelic frequencies of these germline alleles. The specificity of allelic variants to different human populations can nevertheless be a rich source of information when studying the genetic basis of population-specific immune responses in disease and vaccination. We therefore meticulously identified true germline alleles, together with their complete TR allele sequences and their frequencies, across the 26 human populations profiled in the "1000 Genomes data". We identified 205 TRAV, 249 TRBV, 16 TRGV and 5 TRDV germline alleles supported by at least four haplotypes. The diversity of germline allelic variants in the TR loci is highest in Africans, while the majority of non-African alleles are specific to Asian populations, suggesting that TR germline allele profiles differ across human populations. Interestingly, the alleles in the IMGT database are frequent and common across all five super-populations. We believe this new set of germline TR sequences represents a valuable resource, which we have made available through the new population-matched TR (pmTR) database, accessible via https://pmtrig.lumc.nl/.
    Pattern Recognition and Bioinformatics
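The abstract retains only alleles supported by at least four haplotypes and reports their frequencies per population. The sketch below illustrates that filter-then-count idea for a single V gene, assuming one allele call per haplotype given as `(population, allele)` pairs. The input format, the function name `population_allele_frequencies`, and the example labels are all hypothetical; this is not the pmTR pipeline, which the abstract does not describe in detail.

```python
from collections import defaultdict

MIN_HAPLOTYPES = 4  # support threshold stated in the abstract


def population_allele_frequencies(haplotype_calls, min_support=MIN_HAPLOTYPES):
    """Filter alleles by haplotype support and compute per-population frequencies.

    haplotype_calls: iterable of (population, allele) pairs, one per haplotype,
    all for the same gene (hypothetical input format).
    Returns {allele: {population: frequency}} for alleles observed on at least
    `min_support` haplotypes in total.
    """
    counts = defaultdict(lambda: defaultdict(int))  # allele -> population -> count
    per_population = defaultdict(int)  # population -> total haplotypes for this gene
    for population, allele in haplotype_calls:
        counts[allele][population] += 1
        per_population[population] += 1

    result = {}
    for allele, by_pop in counts.items():
        if sum(by_pop.values()) < min_support:
            continue  # too few supporting haplotypes: not retained as a germline allele
        result[allele] = {
            pop: by_pop.get(pop, 0) / n for pop, n in per_population.items()
        }
    return result


# Hypothetical calls: allele "a1" is seen on only 3 haplotypes and is dropped.
calls = [("pop_1", "a1")] * 3 + [("pop_1", "a2")] * 5 + [("pop_2", "a2")] * 2
print(population_allele_frequencies(calls))  # {'a2': {'pop_1': 0.625, 'pop_2': 1.0}}
```

Counting support across all populations before filtering lets an allele that is rare overall but concentrated in one population still pass the threshold. This matters when looking for population-specific alleles.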