
    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study, we demonstrate that LSTM networks predict the subcellular location of proteins from the protein sequence alone with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve performance by introducing convolutional filters, and we experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanism and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

    Footprints of antigen processing boost MHC class II natural ligand predictions

    Background: Major histocompatibility complex class II (MHC-II) molecules present peptide fragments to T cells for immune recognition. Current predictors for peptide to MHC-II binding are trained on binding affinity data, generated in vitro and therefore lacking information about antigen processing. Methods: We generate prediction models of peptide to MHC-II binding trained with naturally eluted ligands derived from mass spectrometry in addition to peptide binding affinity data sets. Results: We show that integrated prediction models incorporate identifiable rules of antigen processing. In fact, we observed detectable signals of protease cleavage at defined positions of the ligands. We also hypothesize a role of the length of the terminal ligand protrusions for trimming the peptide to the MHC presented ligand. Conclusions: The results of integrating binding affinity and eluted ligand data in a combined model demonstrate improved performance for the prediction of MHC-II ligands and T cell epitopes and foreshadow a new generation of improved peptide to MHC-II prediction tools accounting for the plurality of factors that determine natural presentation of antigens.