2 research outputs found
Enhancing protein interaction prediction using deep learning and protein language models
Proteins are large macromolecules that play critical roles in many cellular activities in living organisms. These include catalyzing metabolic reactions, mediating signal transduction, replicating DNA, responding to stimuli, and transporting molecules, to name a few. Proteins perform their functions by interacting with other proteins and molecules. As a result, determining the nature of such interactions is critically important in many areas of biology and medicine. The primary structure of a protein refers to its specific sequence of amino acids, the tertiary structure refers to its unique 3D shape, and the quaternary structure refers to the arrangement of multiple protein subunits into a larger, more complex structure. While the number of experimentally determined tertiary and quaternary structures is limited, databases of protein sequences continue to grow at an unprecedented rate, providing a wealth of information for training and improving sequence-based models.
Recent developments in sequence-based modeling using machine learning and deep learning have shown significant progress toward solving protein-related problems. Specifically, attention-based transformer models, a recent breakthrough in Natural Language Processing (NLP), have shown that large models trained on unlabeled data are able to learn powerful representations of protein sequences and can lead to significant improvements in understanding protein folding, function, and interactions, as well as in drug discovery and protein engineering.
The research in this thesis has pursued two objectives using sequence-based modeling. The first is to use deep learning techniques based on NLP to address an important problem in cellular immune system studies, namely, predicting Major Histocompatibility Complex (MHC)-peptide binding. The second is to improve the performance of the ClusPro docking server, a well-known protein-protein docking tool, in three ways: (i) integrating ClusPro with AlphaFold2, a well-known and accurate protein structure predictor, for enhanced docking of protein models, (ii) predicting distance maps to improve docking accuracy, and (iii) using regression techniques to rank protein clusters for better results.
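As a point of reference for item (ii), a distance map is the matrix of pairwise distances between residues. The following is a minimal sketch of that quantity computed from known coordinates, not the thesis's predictor. The function names, the toy coordinates, and the 8 Å contact cutoff are illustrative assumptions and do not come from the abstract.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `coords` is a list of (x, y, z) tuples, e.g. one C-alpha atom per residue.
    """
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = dmap[j][i] = d
    return dmap


def contact_pairs(dmap, cutoff=8.0):
    """List residue index pairs (i < j) whose distance is within `cutoff`.

    The 8.0 default is an assumed, commonly used threshold in angstroms.
    """
    n = len(dmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dmap[i][j] <= cutoff]


# Hypothetical toy coordinates (angstroms), not taken from any real structure.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 12.0, 0.0)]
toy_map = distance_map(toy_ca)
toy_contacts = contact_pairs(toy_map)
```

A sequence-based model would be trained to predict such a matrix (or the thresholded contacts) without access to the coordinates. The predicted distances could then serve as restraints when scoring docked poses.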
Towards Personalized Medicine: Computational Approaches to Support Drug Design and Clinical Decision Making
The future looks bright for a clinical practice that tailors the
therapy with the best efficacy and highest safety to a patient. Substantial
amounts of funding have resulted in technological advances regarding
patient-centered data acquisition, particularly for genetic data. Yet, the
challenge of translating this data into clinical practice remains open.
To support drug target characterization, we developed a global maximum
entropy-based method that predicts protein-protein complexes including the
three-dimensional structure of their interface from sequence data. To further
speed up the drug development process, we present methods to reposition drugs
with established safety profiles to new indications leveraging paths in
cellular interaction networks. We validated both methods on known data,
demonstrating their ability to recapitulate known protein complexes and
drug-indication pairs, respectively.
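The idea of leveraging paths in an interaction network for drug repositioning can be illustrated with a small sketch. It assumes, purely for illustration, that a drug is scored by the shortest path length between its protein targets and the genes associated with an indication. The abstract does not specify the actual scoring, and the network, drug names, and protein names below are hypothetical.

```python
from collections import deque


def shortest_path_length(graph, sources, targets):
    """Breadth-first search from any source node to the nearest target node.

    Returns the number of edges on the shortest path, or None if unreachable.
    """
    targets = set(targets)
    seen = set(sources)
    queue = deque((s, 0) for s in sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical undirected interaction network as an adjacency dict.
network = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3"],
    "P5": ["P6"],
    "P6": ["P5"],
}
drug_targets = {"drugA": ["P1"], "drugB": ["P5"]}  # hypothetical drugs
disease_genes = ["P3"]                             # hypothetical indication

# Smaller distance = drug targets lie closer to the disease genes.
scores = {drug: shortest_path_length(network, targets, disease_genes)
          for drug, targets in drug_targets.items()}
```

Under this assumed scoring, `drugA` reaches the disease gene in two steps while `drugB` lies in a disconnected component, so only `drugA` would be proposed as a candidate.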
After studying the extent and characteristics of genetic variation with a
predicted impact on protein function across 60,607 individuals, we showed that
most patients carry variants in drug-related genes. However, for the majority
of variants, their impact on drug efficacy remains unknown. To inform
personalized treatment decisions, it is thus crucial to first collate knowledge
from open data sources about known variant effects and to then close the
knowledge gaps for variants whose effect on drug binding is still not
characterized. Here, we built an automated annotation pipeline for
patient-specific variants whose value we illustrate for a set of patients with
hepatocellular carcinoma. We further developed a molecular modeling protocol to
predict changes in binding affinity in proteins with genetic variants, which we
evaluated for several clinically relevant protein kinases.
Overall, we expect that each presented method has the potential to advance
personalized medicine by closing knowledge gaps about protein interactions and
genetic variation in drug-related genes. To reach clinical applicability,
challenges with data availability need to be overcome and prediction
performance should be validated experimentally.