2 research outputs found

    Enhancing protein interaction prediction using deep learning and protein language models

    Proteins are large macromolecules that play critical roles in many cellular activities in living organisms. These include catalyzing metabolic reactions, mediating signal transduction, replicating DNA, responding to stimuli, and transporting molecules, to name a few. Proteins perform their functions by interacting with other proteins and molecules. As a result, determining the nature of such interactions is critically important in many areas of biology and medicine. The primary structure of a protein refers to its specific sequence of amino acids, the tertiary structure refers to its unique 3D shape, and the quaternary structure refers to the arrangement of multiple protein subunits into a larger, more complex structure. While the number of experimentally determined tertiary and quaternary structures is limited, databases of protein sequences continue to grow at an unprecedented rate, providing a wealth of information for training and improving sequence-based models. Recent developments in sequence-based modeling using machine learning and deep learning have shown significant progress toward solving protein-related problems. Specifically, attention-based transformer models, a recent breakthrough in Natural Language Processing (NLP), have shown that large models trained on unlabeled data can learn powerful representations of protein sequences and can lead to significant improvements in understanding protein folding, function, and interactions, as well as in drug discovery and protein engineering. The research in this thesis has pursued two objectives using sequence-based modeling. The first is to use deep learning techniques based on NLP to address an important problem in cellular immune system studies, namely, predicting Major Histocompatibility Complex (MHC)-peptide binding. The second is to improve the performance of the Cluspro docking server, a well-known protein-protein docking tool, in three ways: (i) integrating Cluspro with AlphaFold2, a well-known and accurate protein structure predictor, to improve docking of predicted protein models, (ii) predicting distance maps to improve docking accuracy, and (iii) using regression techniques to rank protein clusters for better results.

    Towards Personalized Medicine: Computational Approaches to Support Drug Design and Clinical Decision Making

    The future looks bright for a clinical practice that tailors the therapy with the best efficacy and highest safety to each patient. Substantial amounts of funding have resulted in technological advances in patient-centered data acquisition, particularly of genetic data. Yet, the challenge of translating this data into clinical practice remains open. To support drug target characterization, we developed a global maximum entropy-based method that predicts protein-protein complexes, including the three-dimensional structure of their interface, from sequence data. To further speed up the drug development process, we present methods that reposition drugs with established safety profiles to new indications by leveraging paths in cellular interaction networks. We validated both methods on known data, demonstrating their ability to recapitulate known protein complexes and drug-indication pairs, respectively. After studying the extent and characteristics of genetic variation with a predicted impact on protein function across 60,607 individuals, we showed that most patients carry variants in drug-related genes. However, for the majority of variants, the impact on drug efficacy remains unknown. To inform personalized treatment decisions, it is thus crucial to first collate knowledge from open data sources about known variant effects and then to close the knowledge gaps for variants whose effect on drug binding is still not characterized. Here, we built an automated annotation pipeline for patient-specific variants, whose value we illustrate for a set of patients with hepatocellular carcinoma. We further developed a molecular modeling protocol to predict changes in binding affinity in proteins with genetic variants, which we evaluated for several clinically relevant protein kinases. Overall, we expect that each presented method has the potential to advance personalized medicine by closing knowledge gaps about protein interactions and genetic variation in drug-related genes. 
To reach clinical applicability, challenges with data availability need to be overcome and prediction performance should be validated experimentally.