441 research outputs found

    Application of generative autoencoder in de novo molecular design

    A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.

    Constrained Bayesian Optimization for Automatic Chemical Design

    Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for solving this class of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder. Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning for Molecules and Material

    Deep learning for molecular design - a review of the state of the art

    In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules - in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high-level connections and comparisons with other techniques and expose the pros and cons of each. Several important high-level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training. Comment: 24 pages, new title, published in RSC MSD

    Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

    Generative models are becoming the tools of choice for the discovery of new molecules and materials. Such models are able to learn from a large collection of molecular structures and produce novel compounds. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to support research on generative models for drug discovery. MOSES provides training and testing datasets and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses

    Optimization of Molecules via Deep Reinforcement Learning

    We present a framework, which we call Molecule Deep Q-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double Q-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. We further show the path through chemical space to achieve optimization for a molecule to understand how the model works.
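
    The multi-objective idea in this abstract (maximize drug-likeness while staying similar to the starting molecule) can be illustrated with a scalarized reward. This is a minimal sketch, not the MolDQN implementation: the abstract does not say how the two objectives are combined, so the weighted sum, the `weight` parameter, and the use of plain Python sets as stand-in fingerprints are all assumptions made for illustration. The drug-likeness score is taken as a precomputed number in [0, 1].

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits.

    The sets are hypothetical stand-ins for molecular fingerprints.
    """
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def multi_objective_reward(similarity: float, drug_likeness: float,
                           weight: float) -> float:
    """Weighted sum of two objectives (an assumed scalarization).

    weight = 1.0 rewards only similarity to the original molecule;
    weight = 0.0 rewards only drug-likeness.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * similarity + (1.0 - weight) * drug_likeness


# Example: a candidate sharing 2 of 4 distinct bits with the original.
original_bits = {1, 2, 3}
candidate_bits = {2, 3, 4}
sim = tanimoto(original_bits, candidate_bits)
reward = multi_objective_reward(sim, drug_likeness=0.8, weight=0.25)
```

    Sweeping `weight` from 0 to 1 trades drug-likeness against similarity, which is one simple way to explore the compromise between the two objectives.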

    Adversarial Learned Molecular Graph Inference and Generation

    Recent methods for generating novel molecules use graph representations of molecules and employ various forms of graph convolutional neural networks for inference. However, training requires solving an expensive graph isomorphism problem, which previous approaches do not address or solve only approximately. In this work, we propose ALMGIG, a likelihood-free adversarial learning framework for inference and de novo molecule generation that avoids explicitly computing a reconstruction loss. Our approach extends generative adversarial networks by including an adversarial cycle-consistency loss to implicitly enforce the reconstruction property. To capture properties unique to molecules, such as valence, we extend the Graph Isomorphism Network to multi-graphs. To quantify the performance of models, we propose to compute the distance between distributions of physicochemical properties with the 1-Wasserstein distance. We demonstrate that ALMGIG more accurately learns the distribution over the space of molecules than all baselines. Moreover, it can be utilized for drug discovery by efficiently searching the space of molecules using molecules' continuous latent representation. Our code is available at https://github.com/ai-med/almgig. Comment: Accepted at The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD); Code at https://github.com/ai-med/almgi
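
    The evaluation idea mentioned in this abstract, comparing distributions of a physicochemical property with the 1-Wasserstein distance, can be sketched for a single scalar property. This is an illustrative sketch rather than the ALMGIG evaluation code: it assumes two samples of equal size, in which case the 1-Wasserstein distance between the empirical distributions reduces to the mean absolute difference of the sorted values. The function name and the example numbers are hypothetical.

```python
def wasserstein_1d(xs: list, ys: list) -> float:
    """1-Wasserstein distance between two equal-size 1-D samples.

    For equal sample sizes, the optimal transport plan pairs the i-th
    smallest value of one sample with the i-th smallest of the other.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Hypothetical property values (e.g. molecular weights) for a reference
# set and a generated set; the numbers are made up for illustration.
reference = [180.2, 250.3, 310.4]
generated = [185.2, 255.3, 315.4]
distance = wasserstein_1d(reference, generated)
```

    A smaller distance means the generated molecules follow the reference property distribution more closely. For samples of unequal size or weighted samples, a library routine such as `scipy.stats.wasserstein_distance` is the more general choice.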

    A Classification Scheme for Inverse Design of Molecules: from Targeted Electronic Properties to Atomicity

    In machine learning and molecular design, there exist two approaches: discriminative and generative. In the discriminative approach, dubbed forward design, the goal is to map a set of features/molecules to their respective electronic properties. In the generative approach, dubbed inverse design, a set of electronic properties is given and the goal is to find the features/molecules that have these properties. These tasks are very challenging because the chemical compound space is very large. In this study, we explore a new scheme for the inverse design of molecules based on a classification paradigm that takes as input the targeted electronic properties and outputs the atomic composition of the molecules (i.e. atomicity or atom counts of each type in a molecule). To test this new hypothesis, we analyzed the quantum mechanics QM7b dataset consisting of 7211 small organic molecules and 14 electronic properties. Results obtained using twenty-three different classification approaches, including a regularized Bayesian neural network, show that it is possible to achieve detection/prediction accuracy > 90%. Comment: Under review ICANN 2019. arXiv admin note: text overlap with arXiv:1904.1032

    PepCVAE: Semi-Supervised Targeted Design of Antimicrobial Peptide Sequences

    Given the emerging global threat of antimicrobial resistance, new methods for next-generation antimicrobial design are urgently needed. We report a peptide generation framework, PepCVAE, based on a semi-supervised variational autoencoder (VAE) model, for designing novel antimicrobial peptide (AMP) sequences. Our model learns a rich latent space of the biological peptide context by taking advantage of abundant, unlabeled peptide sequences. The model further learns a disentangled antimicrobial attribute space by using the feedback from a jointly trained AMP classifier that uses limited labeled instances. The disentangled representation allows for controllable generation of AMPs. Extensive analysis of the PepCVAE-generated sequences reveals superior performance of our model in comparison to a plain VAE, as PepCVAE generates novel AMP sequences with higher long-range diversity, while being closer to the training distribution of biological peptides. These features are highly desired in next-generation antimicrobial design.

    Multi-Objective De Novo Drug Design with Conditional Graph Generative Model

    Recently, deep generative models have revealed themselves as a promising way of performing de novo molecule design. However, previous research has focused mainly on generating SMILES strings instead of molecular graphs. Although graph generative models are currently available, they are often too general and computationally expensive, which restricts their application to molecules with small sizes. In this work, a new de novo molecular design framework is proposed based on a type of sequential graph generator that does not use atom-level recurrent units. Compared with previous graph generative models, the proposed method is much more tuned for molecule generation and has been scaled up to cover significantly larger molecules in the ChEMBL database. It is shown that the graph-based model outperforms SMILES-based models in a variety of metrics, especially in the rate of valid outputs. For drug design tasks, a conditional graph generative model is employed. This method offers higher flexibility than the previous fine-tuning-based approach and is suitable for generation based on multiple objectives. This approach is applied to solve several drug design problems, including the generation of compounds containing a given scaffold, the generation of compounds with specific drug-likeness and synthetic accessibility requirements, and the generation of dual inhibitors against JNK3 and GSK3β. Results show high enrichment rates for outputs satisfying the given requirements.

    Latent Molecular Optimization for Targeted Therapeutic Design

    We devise an approach for targeted molecular design, a problem of interest in computational drug discovery: given a target protein site, we wish to generate a chemical with both high binding affinity to the target and satisfactory pharmacological properties. This problem is made difficult by the enormity and discreteness of the space of potential therapeutics, as well as the graph-structured nature of biomolecular surface sites. Using a dataset of protein-ligand complexes, we surmount these issues by extracting a signature of the target site with a graph convolutional network and by encoding the discrete chemical into a continuous latent vector space. The latter embedding permits gradient-based optimization in molecular space, which we perform using learned differentiable models of binding affinity and other pharmacological properties. We show that our approach is able to efficiently optimize these multiple objectives and discover new molecules with potentially useful binding properties, validated via docking methods.