2 research outputs found

    Determining epitope specificity of T-cell receptors with Transformers

    No full text
    Transformers have dominated the field of natural language processing due to their competency in learning complex relationships within a sequence. Reusing a pre-trained transformer for a downstream task is known as transfer learning. Transfer learning restricts the transformer to a fixed vocabulary; modifications to the transformer implementation can extend its utility. Implementing transformers for complex biological problems can help address the complexities of biological sequences. One such biological problem is to capture the specificity of a diverse T-cell repertoire to unique antigens (i.e., immunogenic pathogenic elements). Using transformers to assess the relationship between T-cell receptors (TCRs) and antigens at the sequence level can provide better insights into the processes involved in these precise and complex immune responses in humans and mice. In this work, we determined the specificity of multiple TCRs to unique antigens by classifying the CDR3 regions of TCR sequences to a particular antigen. For this problem, we used three pre-trained auto-encoder transformer models (ProtBERT, ProtALBERT, ProtELECTRA) and one pre-trained auto-regressive transformer model (ProtXLNet), and implemented modifications in these transformers to adapt them to the challenges of this complex biological problem. We used VDJdb to obtain the biological data for training and testing the selected transformers. After pre-processing the data, we predicted the TCR specificity for 25 antigens (classes) in a multi-class setting. The transformers could predict the specificity of TCRs to an antigen from just the CDR3 sequences of the TCRB chain (weighted F1 score 0.48), data that was unseen by the transformers. With additional features incorporated, i.e., the gene names of the TCRs, the weighted F1 score improved to 0.55 in the best-performing transformer. With these results, we demonstrated that the different modifications to the transformers allowed them to recognize out-of-vocabulary features. When comparing the AUC of the transformer models to previously developed methods for the same biological problem, such as TCRGP, TCRdist, and DeepTCR, we observed that the transformers outperformed the previously available methods. For example, the MCMV epitope family, which suffered from restricted performance in TCRGP due to fewer training samples (~100 samples), showed a 10% improvement in AUC with transformers under a similar number of training samples. The transformers' proficiency in learning from fewer data, combined with holistic modifications to their implementations, shows that we can extend their capabilities to explore other biological settings. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or by introducing additional features, can further extend T-cell research avenues.
    Computer Science | Data Science and Technology
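
    The classification step described in this abstract, assigning a CDR3 amino-acid sequence to one of 25 antigen classes with a pre-trained protein transformer, could be sketched as below. This is an illustrative sketch only, not the authors' pipeline (their code is linked in the second entry): the Hugging Face checkpoint name, the example CDR3 sequence, and the untrained classification head are assumptions for demonstration.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "Rostlab/prot_bert"   # assumed ProtBERT checkpoint on Hugging Face
    NUM_ANTIGENS = 25                # 25 antigen classes, as in the abstract

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, do_lower_case=False)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=NUM_ANTIGENS  # classification head would still need fine-tuning
    )

    # ProtBERT-style tokenizers expect amino acids separated by spaces.
    cdr3 = "CASSLAPGATNEKLFF"        # hypothetical CDR3 sequence from a TCRB chain
    inputs = tokenizer(" ".join(cdr3), return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, NUM_ANTIGENS)
    predicted_class = int(logits.argmax(dim=-1)) # index of the predicted antigen class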

    Determining epitope specificity of T-cell receptors with transformers

    No full text
    SUMMARY: T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in the case of an infection or cancer. However, the high diversity of TCRs, as well as their unique and complex binding mechanisms underlying epitope recognition, makes it difficult to predict the binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy that incorporates an attention mechanism to learn the informative features, and show that these models, pre-trained on a large set of protein sequences, outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) to predict the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were performed to incorporate gene usage of the TCRs in the four transformer models. Of all 12 transformer implementations (four models with three different modifications), a modified version of the ProtXLNet model could predict TCR-epitope pairs with the highest accuracy (weighted F1 score of 0.55, considering all 25 epitopes simultaneously). The modification included additional features representing the gene names of the TCRs. We also showed that the basic implementation of transformers outperformed the previously available methods developed for the same biological problem, i.e. TCRGP, TCRdist, and DeepTCR, especially for the hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting such as TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or by introducing additional features, can extend T-cell research avenues.
    AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/InduKhatri/tcrformer.
    Pattern Recognition and Bioinformatics
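
    The weighted F1 score reported in both abstracts averages the per-epitope F1 scores weighted by each epitope's support, so frequent and rare epitope classes both contribute to the single number computed over all 25 classes at once. A minimal sketch of that metric, using scikit-learn and made-up label indices (not data from the study), is:

    from sklearn.metrics import f1_score

    # Hypothetical true and predicted epitope indices for a handful of TCRs;
    # in the study each label is one of 25 epitope classes drawn from VDJdb.
    y_true = [0, 0, 3, 7, 7, 7, 12, 24]
    y_pred = [0, 3, 3, 7, 7, 12, 12, 24]

    # "weighted" averages per-class F1 scores, weighted by the support of each class.
    print(f1_score(y_true, y_pred, average="weighted"))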