Application of generative autoencoder in de novo molecular design
A major challenge in computational chemistry is the generation of novel
molecular structures with desirable pharmacological and physicochemical
properties. In this work, we investigate the potential use of autoencoders, a
deep learning methodology, for de novo molecular design. Various generative
autoencoders were used to map molecular structures into a continuous latent
space and vice versa, and their performance as structure generators was assessed.
Our results show that the latent space preserves the chemical similarity principle
and thus can be used for the generation of analogue structures. Furthermore,
the latent space created by autoencoders was searched systematically to
generate novel compounds with predicted activity against dopamine receptor type
2, and compounds similar to known active compounds not included in the training
set were identified.
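A minimal sketch of the two ingredients behind analogue generation, under stated assumptions: fingerprints are represented as Python sets of on-bit indices (a hypothetical stand-in for a real structural fingerprint), and analogues are proposed by perturbing a seed molecule's latent vector with isotropic Gaussian noise before decoding. The decoder itself is not shown, and the noise scale `sigma` and the helper names are illustrative, not taken from the work above.

```python
import random
from typing import List, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def sample_latent_neighbours(z: List[float], n: int, sigma: float, seed: int = 0) -> List[List[float]]:
    """Draw n latent points from an isotropic Gaussian centred on a seed molecule's latent code z.

    Decoding these points with a trained decoder (not shown) would propose candidate analogues.
    """
    rng = random.Random(seed)
    return [[zi + rng.gauss(0.0, sigma) for zi in z] for _ in range(n)]


seed_fp = {1, 4, 7, 9}
analogue_fp = {1, 4, 7, 12}
print(tanimoto(seed_fp, analogue_fp))  # 3 shared bits / (4 + 4 - 3) = 0.6
neighbours = sample_latent_neighbours([0.1, -0.3, 0.8], n=5, sigma=0.05)
print(len(neighbours), len(neighbours[0]))  # 5 3
```

If the latent space preserves the similarity principle, decoded neighbours drawn with a small `sigma` should tend to score a high Tanimoto similarity to the seed.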
Constrained Bayesian Optimization for Automatic Chemical Design
Automatic Chemical Design is a framework for generating novel molecules with
optimized properties. The original scheme, featuring Bayesian optimization over
the latent space of a variational autoencoder, suffers from the pathology that
it tends to produce invalid molecular structures. First, we demonstrate
empirically that this pathology arises when the Bayesian optimization scheme
queries latent points far away from the data on which the variational
autoencoder has been trained. Second, by reformulating the search procedure
as a constrained Bayesian optimization problem, we show that the effects of
this pathology can be mitigated, yielding marked improvements in the validity
of the generated molecules. We posit that constrained Bayesian optimization is
a good approach for solving this class of training set mismatch in many
generative tasks involving Bayesian optimization over the latent space of a
variational autoencoder.
Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian
Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning
for Molecules and Materials
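One common way to write a constrained acquisition function, sketched here as an assumption rather than as the exact formulation used above, is to multiply the closed-form expected improvement (EI) by the modelled probability that a latent point decodes to a valid molecule. In this sketch `p_valid` is simply passed in; in practice it would come from a separately trained constraint model, and all function names are hypothetical.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximisation under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    improvement = mu - best - xi
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def constrained_ei(mu: float, sigma: float, best: float, p_valid: float) -> float:
    """EI weighted by the modelled probability that the latent point decodes to a valid molecule."""
    return expected_improvement(mu, sigma, best) * p_valid


# Two candidate latent points with identical objective posteriors:
near_data = constrained_ei(mu=1.2, sigma=0.3, best=1.0, p_valid=0.9)
far_from_data = constrained_ei(mu=1.2, sigma=0.3, best=1.0, p_valid=0.1)
print(near_data > far_from_data)  # True
```

Because both candidates share the same objective posterior, the ranking is decided entirely by `p_valid`, which is how the constraint steers the search back towards regions close to the training data.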
Deep learning for molecular design - a review of the state of the art
In the space of only a few years, deep generative modeling has revolutionized
how we think of artificial creativity, yielding autonomous systems which
produce original images, music, and text. Inspired by these successes,
researchers are now applying deep generative modeling techniques to the
generation and optimization of molecules; in our review we found 45 papers on
the subject published in the past two years. These works point to a future
where such systems will be used to generate lead molecules, greatly reducing
resources spent downstream synthesizing and characterizing bad leads in the
lab. In this review we survey the increasingly complex landscape of models and
representation schemes that have been proposed. The four classes of techniques
we describe are recursive neural networks, autoencoders, generative adversarial
networks, and reinforcement learning. After first discussing some of the
mathematical fundamentals of each technique, we draw high-level connections and
comparisons with other techniques and expose the pros and cons of each. Several
important high-level themes emerge as a result of this work, including the
shift away from the SMILES string representation of molecules towards more
sophisticated representations such as graph grammars and 3D representations,
the importance of reward function design, the need for better standards for
benchmarking and testing, and the benefits of adversarial training and
reinforcement learning over maximum-likelihood-based training.
Comment: 24 pages, new title, published in RSC MSD
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Generative models are becoming the tools of choice for the discovery of new
molecules and materials. Such models are able to learn on a large collection of
molecular structures and produce novel compounds. In this work, we introduce a
benchmarking platform called Molecular Sets (MOSES) to support research on
generative models for drug discovery. MOSES provides training and testing
datasets and a set of metrics to evaluate the quality and diversity of
generated structures. We have implemented and compared several molecular
generation models and suggest using our results as reference points for
further advancements in generative chemistry research. The platform and source
code are available at https://github.com/molecularsets/moses
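The following is a simplified sketch of three basic quantities a benchmark of this kind reports (validity, uniqueness and novelty). The exact MOSES definitions (for example, uniqueness computed over a fixed number of valid samples) may differ, so the platform's own implementation should be preferred for reported numbers. `toy_canonicalize` is a hypothetical placeholder; a real pipeline would canonicalize SMILES with a cheminformatics toolkit such as RDKit.

```python
from typing import Callable, Iterable, List, Optional


def basic_generation_metrics(
    generated: List[str],
    training: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> dict:
    """Simplified validity / uniqueness / novelty in the spirit of MOSES.

    `canonicalize` must return a canonical string for a valid molecule and None otherwise.
    """
    canon = [canonicalize(s) for s in generated]
    valid = [c for c in canon if c is not None]
    unique = set(valid)
    train_set = {c for c in (canonicalize(s) for s in training) if c is not None}
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(unique - train_set) / len(unique) if unique else 0.0,
    }


def toy_canonicalize(s: str) -> Optional[str]:
    """Hypothetical stand-in: treats strings containing '?' as invalid and ignores case."""
    return None if "?" in s else s.upper()


metrics = basic_generation_metrics(
    generated=["CCO", "cco", "C?C", "CCN"],
    training=["CCO"],
    canonicalize=toy_canonicalize,
)
print(metrics)
```

With these toy inputs, three of the four strings are valid, two of those are distinct, and one of the two distinct molecules is absent from the training set.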
Optimization of Molecules via Deep Reinforcement Learning
We present a framework, which we call Molecule Deep Q-Networks (MolDQN),
for molecule optimization by combining domain knowledge of chemistry and
state-of-the-art reinforcement learning techniques (double Q-learning and
randomized value functions). We directly define modifications on molecules,
thereby ensuring 100% chemical validity. Further, we operate without
pre-training on any dataset to avoid possible bias from the choice of that set.
Inspired by problems faced during medicinal chemistry lead optimization, we
extend our model with multi-objective reinforcement learning, which maximizes
drug-likeness while maintaining similarity to the original molecule. We further
show the path through chemical space that the model takes when optimizing a
molecule, to help understand how the model works.
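A minimal sketch of two pieces the abstract names: the double Q-learning target, in which the online network selects the next action and the target network evaluates it, and a multi-objective reward. The weighted-sum form of the reward, the weight `w`, and the dictionary representation of Q-values over candidate next molecules are assumptions made for illustration; the randomized value functions are not shown.

```python
from typing import Dict


def scalarized_reward(qed: float, similarity: float, w: float) -> float:
    """Weighted-sum reward trading drug-likeness (QED) against similarity to the start molecule."""
    return w * similarity + (1.0 - w) * qed


def double_q_target(
    reward: float,
    next_q_online: Dict[str, float],
    next_q_target: Dict[str, float],
    gamma: float,
    terminal: bool,
) -> float:
    """Double Q-learning target: the online network picks the action, the target network scores it."""
    if terminal or not next_q_online:
        return reward
    best_action = max(next_q_online, key=next_q_online.get)
    return reward + gamma * next_q_target[best_action]


r = scalarized_reward(qed=0.8, similarity=0.5, w=0.5)
# Candidate next states are keyed by (hypothetical) SMILES of the modified molecule.
online = {"CCO": 1.0, "CCN": 2.0}
target = {"CCO": 1.5, "CCN": 0.5}
y = double_q_target(r, online, target, gamma=0.9, terminal=False)
print(round(y, 4))  # 0.65 + 0.9 * 0.5 = 1.1
```

Decoupling selection from evaluation is what distinguishes double Q-learning from the plain max-based target: here the target network would have preferred "CCO", but the action is chosen by the online network.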
Adversarial Learned Molecular Graph Inference and Generation
Recent methods for generating novel molecules use graph representations of
molecules and employ various forms of graph convolutional neural networks for
inference. However, training requires solving an expensive graph isomorphism
problem, which previous approaches do not address or solve only approximately.
In this work, we propose ALMGIG, a likelihood-free adversarial learning
framework for inference and de novo molecule generation that avoids explicitly
computing a reconstruction loss. Our approach extends generative adversarial
networks by including an adversarial cycle-consistency loss to implicitly
enforce the reconstruction property. To capture properties unique to molecules,
such as valence, we extend the Graph Isomorphism Network to multi-graphs. To
quantify the performance of models, we propose to compute the distance between
distributions of physicochemical properties with the 1-Wasserstein distance. We
demonstrate that ALMGIG more accurately learns the distribution over the space
of molecules than all baselines. Moreover, it can be utilized for drug
discovery by efficiently searching the space of molecules using molecules'
continuous latent representation. Our code is available at
https://github.com/ai-med/almgig.
Comment: Accepted at The European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)
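The 1-Wasserstein distance between two one-dimensional empirical distributions equals the area between their cumulative distribution functions, W1 = integral of |F_u(t) - F_v(t)| dt. The sketch below computes it for a single property; which properties are compared and how per-property distances are aggregated are left open here. SciPy also provides `scipy.stats.wasserstein_distance` for the same quantity.

```python
from bisect import bisect_right
from typing import Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """1-Wasserstein distance between two empirical 1-D distributions.

    Uses W1 = integral of |F_u(t) - F_v(t)| dt, where F_u and F_v are the empirical CDFs.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_u = bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


reference = [1.0, 2.0, 3.0]  # toy values of one property for reference molecules
generated = [1.5, 2.5, 3.5]  # toy values of the same property for generated molecules
print(round(wasserstein_1d(reference, generated), 6))  # 0.5
```

Shifting every sample by 0.5 yields a distance of 0.5, as expected for a pure translation; unlike a simple comparison of means, the measure also penalizes differences in spread and shape.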
A Classification Scheme for Inverse Design of Molecules: from Targeted Electronic Properties to Atomicity
In machine learning and molecular design, there exist two approaches:
discriminative and generative. In the discriminative approach, dubbed forward
design, the goal is to map a set of features/molecules to their respective
electronic properties. In the generative approach, dubbed inverse design, a set
of electronic properties is given and the goal is to find the
features/molecules that have these properties. These tasks are very challenging
because the chemical compound space is very large. In this study, we explore a
new scheme for the inverse design of molecules based on a classification
paradigm that takes as input the targeted electronic properties and outputs the
atomic composition of the molecules (i.e. atomicity or atom counts of each type
in a molecule). To test this new hypothesis, we analyzed the quantum mechanics
QM7b dataset consisting of 7211 small organic molecules and 14 electronic
properties. Results obtained using twenty-three different classification
approaches, including a regularized Bayesian neural network, show that it is
possible to achieve detection/prediction accuracy > 90%.
Comment: Under review at ICANN 2019. arXiv admin note: text overlap with
arXiv:1904.1032
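To make the classification targets concrete, the sketch below turns a flat molecular formula into the atom-count vector ("atomicity") described above, which a classifier would then predict from the electronic properties. The element list is an assumption chosen to cover QM7b, the regular-expression parser handles only formulas without parentheses or charges, and treating each element count as a separate class label is one possible framing, not necessarily the one used in the study.

```python
import re
from typing import Dict, Tuple

# Element order for the target vector; assumed here to cover QM7b (C, H, N, O, S, Cl).
ELEMENTS: Tuple[str, ...] = ("C", "H", "N", "O", "S", "Cl")


def atomicity(formula: str) -> Tuple[int, ...]:
    """Convert a flat molecular formula such as 'C2H6O' into per-element atom counts."""
    counts: Dict[str, int] = {e: 0 for e in ELEMENTS}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in counts:
            raise ValueError(f"unsupported element: {symbol}")
        counts[symbol] += int(number) if number else 1
    return tuple(counts[e] for e in ELEMENTS)


print(atomicity("C2H6O"))  # (2, 6, 0, 1, 0, 0)
print(atomicity("CH3Cl"))  # (1, 3, 0, 0, 0, 1)
```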
PepCVAE: Semi-Supervised Targeted Design of Antimicrobial Peptide Sequences
Given the emerging global threat of antimicrobial resistance, new methods for
next-generation antimicrobial design are urgently needed. We report a peptide
generation framework PepCVAE, based on a semi-supervised variational
autoencoder (VAE) model, for designing novel antimicrobial peptide (AMP)
sequences. Our model learns a rich latent space of the biological peptide
context by taking advantage of abundant, unlabeled peptide sequences. The model
further learns a disentangled antimicrobial attribute space by using the
feedback from a jointly trained AMP classifier that uses limited labeled
instances. The disentangled representation allows for controllable generation
of AMPs. Extensive analysis of the PepCVAE-generated sequences reveals superior
performance of our model in comparison to a plain VAE, as PepCVAE generates
novel AMP sequences with higher long-range diversity, while being closer to the
training distribution of biological peptides. These features are highly desired
in next-generation antimicrobial design.
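The sketch below shows two generic VAE building blocks that a model of this kind rests on: the reparameterization trick for sampling a latent code and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. It is plain Python for readability rather than an autodiff framework, and it does not model PepCVAE's semi-supervised attribute code or the jointly trained AMP classifier.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Isolating the randomness in eps is what lets an autodiff framework backpropagate into mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, 0.0], rng)
print(len(z))  # 2
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0 (posterior equals the prior)
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```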
Multi-Objective De Novo Drug Design with Conditional Graph Generative Model
Recently, deep generative models have emerged as a promising way of
performing de novo molecule design. However, previous research has focused
mainly on generating SMILES strings instead of molecular graphs. Although
graph generative models are currently available, they are often too general and
computationally expensive, which restricts their application to molecules with
small sizes. In this work, a new de novo molecular design framework is proposed
based on a type of sequential graph generator that does not use atom-level
recurrent units. Compared with previous graph generative models, the proposed
method is much better tuned for molecule generation and has been scaled up to
cover significantly larger molecules in the ChEMBL database. It is shown that
the graph-based model outperforms SMILES-based models in a variety of metrics,
especially in the rate of valid outputs. For drug design tasks, a
conditional graph generative model is employed. This method offers
higher flexibility than the previous fine-tuning-based approach and is
suitable for generation based on multiple objectives. This approach is applied
to solve several drug design problems, including the generation of compounds
containing a given scaffold, the generation of compounds with specific
drug-likeness and synthetic accessibility requirements, and the generation of
dual inhibitors against JNK3 and GSK3. Results show high enrichment
rates for outputs satisfying the given requirements.
Latent Molecular Optimization for Targeted Therapeutic Design
We devise an approach for targeted molecular design, a problem of interest in
computational drug discovery: given a target protein site, we wish to generate
a chemical with both high binding affinity to the target and satisfactory
pharmacological properties. This problem is made difficult by the enormity and
discreteness of the space of potential therapeutics, as well as the
graph-structured nature of biomolecular surface sites. Using a dataset of
protein-ligand complexes, we surmount these issues by extracting a signature of
the target site with a graph convolutional network and by encoding the discrete
chemical into a continuous latent vector space. The latter embedding permits
gradient-based optimization in molecular space, which we perform using learned
differentiable models of binding affinity and other pharmacological properties.
We show that our approach is able to efficiently optimize these multiple
objectives and discover new molecules with potentially useful binding
properties, validated via docking methods.
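Gradient-based optimization in a continuous latent space can be sketched as gradient ascent on a weighted sum of differentiable property predictors. Here the predictors are hypothetical analytic surrogates, each a negative squared distance to a preferred point, standing in for the learned binding-affinity and pharmacological models described above; encoding the starting molecule, choosing the weights, and decoding the optimized vector back into a chemical are all outside this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def gradient_ascent(
    z0: Sequence[float],
    objectives: Sequence[Tuple[float, Callable[[Vector], Vector]]],
    lr: float = 0.1,
    steps: int = 100,
) -> Vector:
    """Maximise a weighted sum of differentiable property predictors over a latent vector.

    Each objective is (weight, grad_fn), where grad_fn returns that predictor's gradient at z.
    """
    z = list(z0)
    for _ in range(steps):
        grad = [0.0] * len(z)
        for weight, grad_fn in objectives:
            for i, g in enumerate(grad_fn(z)):
                grad[i] += weight * g
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical surrogates: "affinity" peaks at (1, 0), "drug-likeness" peaks at (0, 1).
# Each is f(z) = -||z - c||^2, whose gradient is -2 (z - c).
def peak_grad(center: Vector) -> Callable[[Vector], Vector]:
    return lambda z: [-2.0 * (zi - ci) for zi, ci in zip(z, center)]


z_opt = gradient_ascent([0.0, 0.0], [(1.0, peak_grad([1.0, 0.0])), (1.0, peak_grad([0.0, 1.0]))])
print([round(v, 3) for v in z_opt])  # [0.5, 0.5]
```

With equal weights, the optimum of the combined objective lies midway between the two surrogate peaks, illustrating how multiple objectives are traded off; changing the weights moves the optimum towards the more heavily weighted property.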
- …