7,631 research outputs found

    Fr\'echet ChemNet Distance: A metric for generative models for molecules in drug discovery

    Full text link
    The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, assessing the performance of such generative models is notoriously difficult. Metrics that are typically used to assess the performance of such generative models are the percentage of chemically valid molecules or the similarity to real molecules in terms of particular descriptors, such as the partition coefficient (logP) or druglikeness. However, method comparison is difficult because of the inconsistent use of evaluation metrics, the necessity for multiple metrics, and the fact that some of these measures can easily be tricked by simple rule-based systems. We propose a novel distance measure between two sets of molecules, called Fr\'echet ChemNet distance (FCD), that can be used as an evaluation metric for generative models. The FCD is similar to a recently established performance metric for comparing image generation methods, the Fr\'echet Inception Distance (FID). Whereas the FID uses one of the hidden layers of InceptionNet, the FCD utilizes the penultimate layer of a deep neural network called ChemNet, which was trained to predict drug activities. Thus, the FCD metric takes into account chemically and biologically relevant information about molecules, and also measures the diversity of the set via the distribution of generated molecules. The FCD's advantage over previous metrics is that it can detect if generated molecules are a) diverse and have similar b) chemical and c) biological properties as real molecules. We further provide an easy-to-use implementation that only requires the SMILES representation of the generated molecules as input to calculate the FCD. Implementations are available at: https://www.github.com/bioinf-jku/FCDComment: Implementations are available at: https://www.github.com/bioinf-jku/FC

    The Synthesizability of Molecules Proposed by Generative Models

    Full text link
    The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery. One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel molecular structures intended to maximize a multi-objective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chemical space. However, the utility of these approaches is stymied by ignorance of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often molecules proposed by state-of-the-art generative models cannot be readily synthesized. Our analysis demonstrates that there are several tasks for which these models generate unrealistic molecular structures despite performing well on popular quantitative benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically-tractable chemical space, although doing so necessarily detracts from the primary objective. This analysis suggests that to improve the utility of these models in real discovery workflows, new algorithm development is warranted

    Computer Aided Synthesis Prediction to Enable Augmented Chemical Discovery and Chemical Space Exploration

    Get PDF
    The drug-like chemical space is estimated to be 10 to the power of 60 molecules, and the largest generated database (GDB) obtained by the Reymond group is 165 billion molecules with up to 17 heavy atoms. Furthermore, deep learning techniques to explore regions of chemical space are becoming more popular. However, the key to realizing the generated structures experimentally lies in chemical synthesis. The application of which was previously limited to manual planning or slow computer assisted synthesis planning (CASP) models. Despite the 60-year history of CASP few synthesis planning tools have been open-sourced to the community. In this thesis I co-led the development of and investigated one of the only fully open-source synthesis planning tools called AiZynthFinder, trained on both public and proprietary datasets consisting of up to 17.5 million reactions. This enables synthesis guided exploration of the chemical space in a high throughput manner, to bridge the gap between compound generation and experimental realisation. I firstly investigate both public and proprietary reaction data, and their influence on route finding capability. Furthermore, I develop metrics for assessment of retrosynthetic prediction, single-step retrosynthesis models, and automated template extraction workflows. This is supplemented by a comparison of the underlying datasets and their corresponding models. Given the prevalence of ring systems in the GDB and wider medicinal chemistry domain, I developed ‘Ring Breaker’ - a data-driven approach to enable the prediction of ring-forming reactions. I demonstrate its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. Additionally, I highlight its potential for incorporation into CASP tools, and outline methodological improvements that result in the improvement of route-finding capability. To tackle the challenge of model throughput, I report a machine learning (ML) based classifier called the retrosynthetic accessibility score (RAscore), to assess the likelihood of finding a synthetic route using AiZynthFinder. The RAscore computes at least 4,500 times faster than AiZynthFinder. Thus, opens the possibility of pre-screening millions of virtual molecules from enumerated databases or generative models for synthesis informed compound prioritization. Finally, I combine chemical library visualization with synthetic route prediction to facilitate experimental engagement with synthetic chemists. I enable the navigation of chemical property space by using interactive visualization to deliver associated synthetic data as endpoints. This aids in the prioritization of compounds. The ability to view synthetic route information alongside structural descriptors facilitates a feedback mechanism for the improvement of CASP tools and enables rapid hypothesis testing. I demonstrate the workflow as applied to the GDB databases to augment compound prioritization and synthetic route design

    Automated in Silico Design of Homogeneous Catalysts

    Get PDF
    Catalyst discovery is increasingly relying on computational chemistry, and many of the computational tools are currently being automated. The state of this automation and the degree to which it may contribute to speeding up development of catalysts are the subject of this Perspective. We also consider the main challenges associated with automated catalyst design, in particular the generation of promising and chemically realistic candidates, the tradeoff between accuracy and cost in estimating the catalytic performance, the opportunities associated with automated generation and use of large amounts of data, and even how to define the objectives of catalyst design. Throughout the Perspective, we take a cross-disciplinary approach and evaluate the potential of methods and experiences from fields other than homogeneous catalysis. Finally, we provide an overview of software packages available for automated in silico design of homogeneous catalysts.publishedVersio

    Advances in De Novo Drug Design : From Conventional to Machine Learning Methods

    Get PDF
    De novo drug design is a computational approach that generates novel molecular structures from atomic building blocks with no a priori relationships. Conventional methods include structure-based and ligand-based design, which depend on the properties of the active site of a biological target or its known active binders, respectively. Artificial intelligence, including ma-chine learning, is an emerging field that has positively impacted the drug discovery process. Deep reinforcement learning is a subdivision of machine learning that combines artificial neural networks with reinforcement-learning architectures. This method has successfully been em-ployed to develop novel de novo drug design approaches using a variety of artificial networks including recurrent neural networks, convolutional neural networks, generative adversarial networks, and autoencoders. This review article summarizes advances in de novo drug design, from conventional growth algorithms to advanced machine-learning methodologies and high-lights hot topics for further development.Peer reviewe

    In silico Strategies to Support Fragment-to-Lead Optimization in Drug Discovery.

    Get PDF
    Fragment-based drug (or lead) discovery (FBDD or FBLD) has developed in the last two decades to become a successful key technology in the pharmaceutical industry for early stage drug discovery and development. The FBDD strategy consists of screening low molecular weight compounds against macromolecular targets (usually proteins) of clinical relevance. These small molecular fragments can bind at one or more sites on the target and act as starting points for the development of lead compounds. In developing the fragments attractive features that can translate into compounds with favorable physical, pharmacokinetics and toxicity (ADMET-absorption, distribution, metabolism, excretion, and toxicity) properties can be integrated. Structure-enabled fragment screening campaigns use a combination of screening by a range of biophysical techniques, such as differential scanning fluorimetry, surface plasmon resonance, and thermophoresis, followed by structural characterization of fragment binding using NMR or X-ray crystallography. Structural characterization is also used in subsequent analysis for growing fragments of selected screening hits. The latest iteration of the FBDD workflow employs a high-throughput methodology of massively parallel screening by X-ray crystallography of individually soaked fragments. In this review we will outline the FBDD strategies and explore a variety of in silico approaches to support the follow-up fragment-to-lead optimization of either: growing, linking, and merging. These fragment expansion strategies include hot spot analysis, druggability prediction, SAR (structure-activity relationships) by catalog methods, application of machine learning/deep learning models for virtual screening and several de novo design methods for proposing synthesizable new compounds. Finally, we will highlight recent case studies in fragment-based drug discovery where in silico methods have successfully contributed to the development of lead compounds

    Lingo3DMol: Generation of a Pocket-based 3D Molecule using a Language Model

    Full text link
    Structure-based drug design powered by deep generative models have attracted increasing research interest in recent years. Language models have demonstrated a robust capacity for generating valid molecules in 2D structures, while methods based on geometric deep learning can directly produce molecules with accurate 3D coordinates. Inspired by both methods, this article proposes a pocket-based 3D molecule generation method that leverages the language model with the ability to generate 3D coordinates. High quality protein-ligand complex data are insufficient; hence, a perturbation and restoration pre-training task is designed that can utilize vast amounts of small-molecule data. A new molecular representation, a fragment-based SMILES with local and global coordinates, is also presented, enabling the language model to learn molecular topological structures and spatial position information effectively. Ultimately, CrossDocked and DUD-E dataset is employed for evaluation and additional metrics are introduced. This method achieves state-of-the-art performance in nearly all metrics, notably in terms of binding patterns, drug-like properties, rational conformations, and inference speed. Our model is available as an online service to academic users via sw3dmg.stonewise.c
    corecore