103 research outputs found

    Study of the Quantum Advantage in Quantum Machine Learning Applications for drug discovery

    National Technical University of Athens, Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Programme (D.P.M.S.) "Mathematical Modelling in Modern Technologies and Finance"

    Application of Generative Models on Modeling Biological Molecules

    The last decade has been the stage for many groundbreaking Artificial Intelligence technologies, such as revolutionary language models: generative models capable of synthesizing surprisingly unique data. Such novelty also raises public concerns, primarily due to the "black box" nature of state-of-the-art models. One of the domains that has quickly adopted the generative deep learning paradigm is drug discovery, which, from the pharmaceutical industry's point of view, is an extremely expensive and time-consuming process. However, the inner workings of such models are not inherently understandable by humans, causing hesitation to fully trust their results. Disentanglement is one of the fundamental requirements for explaining generative models, since it determines the extent to which steerability and navigation can be achieved in the latent space. Unfortunately, the application potential of interpretability approaches is limited by the availability of generative latent factors. This work aims to shed some light on the synthesized latent spaces of state-of-the-art molecular generative models: a couple of basic assumptions made about the latent space characteristics are analyzed, and potential pitfalls related to domain, architecture, and molecule representation preferences are addressed. The degree to which steerability in the latent space is achieved is quantified by implementing a novel interpretability approach, providing the basis for comparing alternative model configurations. The experiments further revealed that modeling decisions have a direct impact on achievable interpretability, albeit limited by the intricacies of the medicinal chemistry domain.

    Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds

    Computational methods to predict the macromolecular targets of small organic drugs and drug-like compounds play a key role in early drug discovery and drug repurposing efforts. These methods are developed by building predictive models that aim to learn the relationships between compounds and their targets in order to predict the bioactivity of the compounds. In this thesis, we analyzed the strategies used to validate target prediction approaches and how current strategies leave crucial questions about performance unanswered. Namely, how does an approach perform on a compound of interest, with its structural specificities, as opposed to the average query compound in the test data? We constructed and present new guidelines on validation strategies to address these shortcomings. We then present the development and validation of two ligand-based target prediction approaches: a similarity-based approach and a binary relevance random forest (machine learning) based approach, both of which have a wide coverage of the target space. Importantly, we applied a new validation protocol to benchmark the performance of these approaches. The approaches were tested under three scenarios: a standard testing scenario with external data, a standard time-split scenario, and a close-to-real-world test scenario. We disaggregated the performance based on the distance of the testing data to the reference knowledge base, giving a more nuanced view of the performance of the approaches. We showed that, surprisingly, the similarity-based approach generally performed better than the machine-learning-based approach under all testing scenarios, while also having a target coverage twice as large. After validating the two target prediction approaches, we present our work on a large-scale application of computational target prediction to curate optimized compound libraries.
While screening large collections of compounds against biological targets is key to identifying new bioactivities, it is resource intensive and challenging. Small to medium-sized libraries that have been optimized to have a higher chance of producing a true hit on an arbitrary target of interest are therefore valuable. We curated libraries of readily purchasable compounds by (i) utilizing property filters to ensure that the compounds have key physicochemical properties and are not overly reactive, (ii) applying a similarity-based target prediction method, with a wide target scope, to predict the bioactivities of compounds, and (iii) employing a genetic algorithm to select compounds for the library so as to maximize the biological diversity in the predicted bioactivities. These enriched small to medium-sized compound libraries provide valuable tool compounds to support early drug development and target identification efforts, and have been made available to the community. The distinctive contributions of this thesis include the development and benchmarking of two ligand-based target prediction approaches under novel validation scenarios, and the application of target prediction to enrich screening libraries with biologically diverse bioactive compounds. We hope that the insights presented in this thesis will help push data-driven drug discovery forward. Doctoral thesis