
20 research outputs found

Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches

Authors: Gibert Enric; Herrero Enric; Luque Garriga F. Xavier; López Manel; Vázquez Javier
Publication venue: 'MDPI AG'
Publication date: 22/10/2020

Virtual screening (VS) is a cornerstone of the drug discovery pipeline. A variety of computational approaches, generally classified as ligand-based (LB) and structure-based (SB) techniques, exploit key structural and physicochemical properties of ligands and targets to enable the screening of virtual libraries in search of active compounds. Though LB and SB methods have found widespread application in the discovery of novel drug-like candidates, their complementary natures have stimulated continued efforts toward the development of hybrid strategies that combine LB and SB techniques, integrating them in a holistic computational framework that exploits the available information on both ligand and target to enhance the success of drug discovery projects. In this review, we analyze the main strategies and concepts that have emerged in recent years for defining hybrid LB + SB computational schemes in VS studies. Particular attention is paid to the combination of molecular similarity and docking, illustrated with selected applications taken from the literature.

Diposit Digital de la Universitat de Barcelona

3D Convolutional Neural Networks for Computational Drug Discovery

Author: Sunseri Jocelyn
Publication date: 06/01/2021

This thesis describes aspects of the implementation and application of voxel-based convolutional neural networks (CNNs) to problems in computational drug discovery. It opens by justifying the novelty of this approach by presenting a more mainstream approach to the common tasks of virtual screening and binding pose prediction, augmented with more simplistic machine learning methods, and demonstrating their suboptimal performance when applied prospectively. It then describes my contributions to our group’s development of voxel-based CNNs as we honed their implementation and training strategy, and reports our library that facilitates featurization and training using this approach. It continues with a prospective assessment of their performance, analogous to the first prospective evaluation, with the addition of a novel CNN-based pose sampling strategy. Next it makes a foray into model explanation, first in an oblique fashion, by examining the transferability of models to tasks that are distinct from but related to the tasks for which they were trained, and by a comparison with an approach based on exploiting dataset bias using other machine learning methods. Finally it describes the implementation of a more direct approach to model explanation, by using a trained network to perform optimization of inputs with respect to the network as a whole or individual nodes and analyzing the content of the result as well as its utility as a pseudo-pharmacophore.

D-Scholarship@Pitt

Technological developments in Virtual Screening for the discovery of small molecules with novel mechanisms of action

Author: Miñarro Lleonar Marina
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 24/02/2023

Doctoral programme: Programa de Doctorat en Recerca, Desenvolupament i Control de Medicaments

Advances in structural and molecular biology have favoured the rational development of novel drugs through structure-based drug design (SBDD). In particular, computational tools have proven to be rapid and efficient for hit discovery and optimization. The main motivation of this thesis is to improve and develop new methods in the area of computer-based drug discovery in order to study challenging targets. Specifically, this thesis is focused on docking and Virtual Screening (VS) methodologies to be able to exploit non-standard sites, like protein-protein interfaces or allosteric sites, and discover bioactive molecules with novel mechanisms of action. First, I developed an automatic pipeline for binding mode prediction that applies knowledge-based restraints and validated the approach by participating in the CELPP Challenge, a blind pose prediction challenge. The aim of the first VS in this thesis is to find small molecules able to not only disrupt the RANK-RANKL interaction but also inhibit the constitutive activation of the receptor. With a combination of computational, biophysical, and cell-based assays we were able to identify the first small molecule binders for RANK that could be used as a treatment for Triple Negative Breast Cancer. When working with challenging targets, or with non-standard mechanisms of action, the relationship between binding and the biological response is unpredictable, because the biological response (if any) will depend on the biological function of the particular allosteric site, which is generally unknown. For this reason, we then tested the applicability of combining ultrahigh-throughput VS with a low-throughput, high-content assay. This allowed us to characterize a novel allosteric pocket in PTEN and also describe the first allosteric modulators for this protein. Finally, as the accessible Chemical Space grows at a rapid pace, we developed an algorithm to efficiently explore ultra-large Chemical Collections using a Bottom-up approach. We prospectively validated the approach in BRD4 and identified novel BRD4 inhibitors with an affinity comparable to advanced drug candidates for this target.

Tesis Doctorals en Xarxa

Solvation Thermodynamic Mapping in Computer Aided Drug Design

Author: Ramsey Steven
Publication venue: CUNY Academic Works
Publication date: 01/02/2019

The displacement of water from surfaces upon biomolecular recognition and association makes a significant contribution to the free energy changes of these processes. We therefore posit that accurate characterization of local structural and thermodynamic molecular water properties can improve computational model accuracy and predictivity of recognition and association processes. In this thesis, we discuss Solvation Thermodynamic Mapping (STM) methods that we have developed using inhomogeneous fluid solvation theory (IST) to better characterize active site water structural and thermodynamic properties on protein surfaces. We also discuss the open source tools that we have developed to implement these methods, GIST-CPPTRAJ and SSTMap, which we have distributed to both the academic and industrial scientific communities. These methods include a nearest neighbor approximation for water entropies, a significant improvement over previous entropy formulations. We then discuss our application of these tools to the rational modification of (-)-stepholidine, a lead compound for human dopamine receptor 3 (D3R), which led to a handful of promising analogues. Finally, we describe a new method of creating pharmacophores from solvation thermodynamic maps, applied retrospectively to 7 protein targets. The results documented here demonstrate promising applications of STM methods for prospective drug design. In our conclusions, we discuss potential improvements to the molecular modeling work with the goal of improving the accuracy of predictions in prospective drug design projects.

City University of New York

Technological developments in Virtual Screening for the discovery of small molecules with novel mechanisms of action

Author: Miñarro Lleonar Marina
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 24/02/2023

Advances in structural and molecular biology have favoured the rational development of novel drugs through structure-based drug design (SBDD). In particular, computational tools have proven to be rapid and efficient for hit discovery and optimization. The main motivation of this thesis is to improve and develop new methods in the area of computer-based drug discovery in order to study challenging targets. Specifically, this thesis is focused on docking and Virtual Screening (VS) methodologies to be able to exploit non-standard sites, like protein-protein interfaces or allosteric sites, and discover bioactive molecules with novel mechanisms of action. First, I developed an automatic pipeline for binding mode prediction that applies knowledge-based restraints and validated the approach by participating in the CELPP Challenge, a blind pose prediction challenge. The aim of the first VS in this thesis is to find small molecules able to not only disrupt the RANK-RANKL interaction but also inhibit the constitutive activation of the receptor. With a combination of computational, biophysical, and cell-based assays we were able to identify the first small molecule binders for RANK that could be used as a treatment for Triple Negative Breast Cancer. When working with challenging targets, or with non-standard mechanisms of action, the relationship between binding and the biological response is unpredictable, because the biological response (if any) will depend on the biological function of the particular allosteric site, which is generally unknown. For this reason, we then tested the applicability of combining ultrahigh-throughput VS with a low-throughput, high-content assay. This allowed us to characterize a novel allosteric pocket in PTEN and also describe the first allosteric modulators for this protein. Finally, as the accessible Chemical Space grows at a rapid pace, we developed an algorithm to efficiently explore ultra-large Chemical Collections using a Bottom-up approach. We prospectively validated the approach in BRD4 and identified novel BRD4 inhibitors with an affinity comparable to advanced drug candidates for this target.

Diposit Digital de la Universitat de Barcelona

Molecular docking: Shifting paradigms in drug discovery

Authors: Pinzi L.; Rastelli G.
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Molecular docking is an established in silico structure-based method widely used in drug discovery. Docking enables the identification of novel compounds of therapeutic interest, predicting ligand-target interactions at a molecular level, or delineating structure-activity relationships (SAR), without knowing a priori the chemical structure of other target modulators. Although it was originally developed to help understand the mechanisms of molecular recognition between small and large molecules, the uses and applications of docking in drug discovery have changed considerably in recent years. In this review, we describe how molecular docking was first applied to assist in drug discovery tasks. Then, we illustrate newer and emerging uses and applications of docking, including prediction of adverse effects, polypharmacology, drug repurposing, and target fishing and profiling. We also discuss future applications and the further potential of this technique when combined with emerging techniques, such as artificial intelligence.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Molecular dynamics and virtual screening approaches in drug discovery

Author: Jokinen Elmeri
Publication venue: University of Turku (Turun yliopisto)
Publication date: 28/10/2022

Computer-aided drug discovery (CADD) methods are now routinely used in the preclinical phase of drug development. Powerful high-performance computing facilities and extremely fast CADD methods constantly scale up the coverage of drug-like chemical space achievable in rational drug development. In this thesis, CADD approaches were applied to address several early-phase drug discovery problems. Namely, small molecule binding site detection on a novel target protein, virtual screening (VS) of molecular databases, and characterization of small molecule interactions with metabolic enzymes were studied. Various CADD methods, including molecular dynamics (MD) simulations in mixed solvents, molecular docking, and binding free energy calculations, were employed. Co-solvent MD simulations detected biologically relevant binding sites and provided guidance for screening potential protein-protein interaction inhibitors for a crucial protein of SARS-CoV-2. VS with fragment- and negative image-based (F-NIB) models identified three active and structurally novel inhibitors of the putative drug target phosphodiesterase 10A. MD simulations and docking provided detailed insights into the effects of active site structural flexibility and variation on the binding and resultant metabolism of small molecules with the cytochrome P450 enzymes. The results presented in this thesis contribute to the increasing evidence that supports the employment and further development of CADD approaches in drug discovery. Ultimately, rational drug development coupled with CADD may enable higher-quality drug candidates to reach human studies in the future, reducing the risk of financially and temporally costly clinical failure.

KEYWORDS: Structure-based drug development, Computer-aided drug discovery (CADD), Molecular dynamics (MD) simulation, Virtual screening (VS), Fragment- and negative image-based (F-NIB) model, Structure-activity relationship (QSAR), Cytochrome P450 ligand binding prediction

UTUPub

Modeling and disrupting protein interactions

Author: Pabon Nicolas
Publication date: 26/07/2018

Rational drug design requires a deep understanding of protein interactions, both in terms of the structural mechanisms that regulate binding of individual molecules and the broader, pathway- and cell-level effects of disrupting protein interaction networks. This thesis presents work that stems from these ideas, discussing contributions to a number of current challenges in the field of drug discovery. First, we describe how structural flexibility is leveraged by ‘selectively promiscuous’ protein interfaces – enabling them to bind specifically to several distinctly shaped ligands. Taking PD-1 as a case study, we demonstrate using molecular dynamics simulations how the flexible receptor interface recognizes conserved ‘trigger’ motifs on its cognate ligands’ interfaces. Trigger interactions, which we show are also exploited by a recent blockbuster PD-1 inhibitor, drive the initial steps of an induced-fit binding pathway that then ‘splits’ into distinct, ligand-specific bound states. Second, we present a hybrid genomic and structural pipeline for genome-scale identification of protein targets for bioactive compounds. We train a random forest classifier to predict compound-target interactions from compound treatment and gene knockdown gene expression signatures in multiple cell types. Refining genomic predictions with a structure-based screen, we achieve top-10/top-100 target prediction accuracies of 26%/41%, respectively, on a validation set of 152 FDA-approved drugs, and validate previously unknown small molecule modulators of HRAS, KRAS, CHIP, and PDK1. Third, we present a strategy that combines transcriptomic and structural analyses with single-cell microscopy to predict small molecule inhibitors of TNF-induced NF-kB signaling and elucidate the network response. 
Validating two novel pathway inhibitors that disrupt the protein network upstream of IKK and NF-kB, our findings suggest that a network-centric drug discovery approach is a promising strategy to evaluate the impact of pharmacologic intervention in signaling. Last, we introduce DrugQuery (DQ), a structure-based public web server for small molecule target prediction. DQ docks user-submitted small molecules against a target library of 7957 predicted binding sites on 1245 human proteins. The server achieved a top-decile target prediction accuracy of 68% on a validation set of 95 FDA-approved drugs and 86% on a validation set of 102 FXR-binding compounds from the 2017 D3R Grand Challenge 2

D-Scholarship@Pitt

Application of computer-aided drug design for identification of P. falciparum inhibitors

Author: Diallo Bakary N’tji
Publication venue: 'Rhodes University'
Publication date: 29/10/2021

Malaria is a millennia-old disease, with the first recorded cases dating back to 2700 BC found in Chinese medical records, and later in other civilizations. It has claimed human lives to such an extent that there are notable associated socio-economic consequences. Currently, according to the World Health Organization (WHO), Africa holds the highest disease burden with 94% of deaths and 82% of cases, with P. falciparum having ~100% prevalence. Chemotherapy, such as artemisinin combination therapy, has been and continues to be the workhorse in the fight against the disease, together with seasonal malaria chemoprevention and the use of insecticides. Natural products such as quinine and artemisinin are particularly important in terms of their antimalarial activity. The emphasis in current chemotherapy research is the need for time- and cost-effective workflows focussed on new mechanisms of action (MoAs) covering the target candidate profiles (TCPs). Despite a decline in cases over the past decades, with countries increasingly becoming certified malaria free, a stalling trend has been observed in the past five years, resulting in missing the 2020 Global Technical Strategy (GTS) milestones. With no effective vaccine, a reduction in funding, slower drug approval than resistance emergence from resistant and invasive vectors, and threats in diagnosis with the pfhrp2/3 gene deletion, malaria remains a major health concern. Motivated by these reasons, the primary aim of this work was a contribution to the antimalarial pipeline through in silico approaches focusing on P. falciparum. We first intended an exploration of malarial targets through a proteome-scale screening on 36 targets using multiple metrics to account for the multi-objective nature of drug discovery. The continuous growth of structural data offers the ideal scenario for mining new MoAs covering antimalarial TCPs. This was combined with a repurposing strategy using a set of orally available FDA-approved drugs.
Further, use was made of time- and cost-effective strategies combining QVina-W efficiency metrics that integrate molecular properties, GRIM rescoring for molecular interactions and a hydrogen mass repartitioning (HMR) molecular dynamics (MD) scheme for accelerated development of antimalarials in the context of resistance. This pipeline further integrates a complex ranking for better drug-target selectivity, and normalization strategies to overcome docking scoring function bias. The different metrics, ranking, normalization strategies and their combinations were first assessed using their mean ranking error (MRE). A version combining all metrics was used to select 36 unique protein-ligand complexes, assessed in MD, with the final retention of 25. Of the 16 hits from these 25 that were tested in vitro, fingolimod, abiraterone, prazosin, and terazosin showed antiplasmodial activity with IC50 values of 2.21, 3.37, 16.67 and 34.72 μM respectively, and of these, only fingolimod was found to be not safe with respect to human cell viability. These compounds were predicted to be active on different molecular targets; abiraterone was predicted to interact with a putative liver-stage essential target, and is hence promising as a transmission-blocking agent. The pipeline had a promising 25% hit rate considering the proteome scale and the use of cost-effective approaches. Secondly, we focused on Plasmodium falciparum 1-deoxy-D-xylulose-5-phosphate reductoisomerase (PfDXR) using a more extensive screening pipeline to overcome some of the current in silico screening limitations. Starting from the ZINC lead-like library of ~3M compounds, hierarchical ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS) approaches with molecular docking and re-scoring using eleven scoring functions (SFs) were used. Later, ranking with an exponential consensus strategy was included.
Selected hits were further assessed through Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA), advanced MD sampling in ligand pulling simulations and Weighted Histogram Analysis Method (WHAM) analysis for umbrella sampling (US) to derive binding free energies. Four leads had better predicted affinities in US than LC5, a 280 nM potent PfDXR inhibitor, with ZINC000050633276 showing a promising binding free energy of -20.43 kcal/mol. As shown with fosmidomycin, DXR inhibition offers fast-acting compounds fulfilling antimalarial TCP1. Yet, fosmidomycin has a high polarity, causing its short half-life and hampering its clinical use. The scaffolds of these leads are different from fosmidomycin and hence may offer better pharmacokinetic and pharmacodynamic properties and may also be promising for lead optimization. A combined analysis of residues’ contributions to the free energy of binding in MM-PBSA and to steered molecular dynamics (SMD) Fmax indicated GLU233, CYS268, SER270, TRP296, and HIS341 as exploitable for compound optimization. Finally, we updated the SANCDB library with new natural products (NPs) and their commercially available analogs as a solution to NP availability. The library is extended to 1005 compounds from its initial 600 compounds, and the database is integrated with the Mcule and Molport APIs for automatic updating of analogs. The new set may contribute to virtual screening and to antimalarials, as the most effective ones have NP origin.

Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 202

South East Academic Libraries System (SEALS)

Rhodes Repository (SEALS)

Machine Learning Small Molecule Properties in Drug Discovery

Authors: Arroniz Carlos; De Fabritiis Gianni; Majewski Maciej; Schapin Nikolai; Varela Alejandro
Publication date: 02/08/2023

Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We also highlight the challenges of predicting and optimizing multiple properties during the hit-to-lead and lead optimization stages of drug discovery, and briefly explore possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models, and that there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models.

Comment: 46 pages, 1 figure

arXiv.org e-Print Archive