A Systematic Survey on Deep Generative Models for Graph Generation
Graphs are important data representations for describing objects and their
relationships, which appear in a wide variety of real-world scenarios. As a
critical problem in this area, graph generation considers learning the
distributions of given graphs and generating novel graphs. Owing to its wide
range of applications, graph generation has a rich history of generative
models, which, however, were traditionally hand-crafted and capable of
modeling only a few statistical properties of graphs. Recent advances in deep
generative models for graph generation are an important step towards improving
the fidelity of generated graphs and pave the way for new kinds of
applications. This article provides an extensive overview of the literature in
the field of deep generative models for graph generation. First, the formal
definition of deep generative models for graph generation is provided, along
with the necessary preliminary knowledge. Second, two taxonomies of deep
generative models, for unconditional and conditional graph generation
respectively, are proposed, and the existing works in each are compared and
analyzed. After that, an overview of the evaluation metrics in this specific
domain is provided. Finally, the applications that deep graph generation
enables are summarized and five promising future research directions are
highlighted.
MolGAN: An implicit generative model for small molecular graphs
Deep generative models for graph-structured data offer a new angle on the
problem of chemical synthesis: by optimizing differentiable models that
directly generate molecular graphs, it is possible to side-step expensive
search procedures in the discrete and vast space of chemical structures. We
introduce MolGAN, an implicit, likelihood-free generative model for small
molecular graphs that circumvents the need for expensive graph matching
procedures or node ordering heuristics of previous likelihood-based methods.
Our method adapts generative adversarial networks (GANs) to operate directly on
graph-structured data. We combine our approach with a reinforcement learning
objective to encourage the generation of molecules with specific desired
chemical properties. In experiments on the QM9 chemical database, we
demonstrate that our model is capable of generating close to 100% valid
compounds. MolGAN compares favorably both to recent proposals that use
string-based (SMILES) representations of molecules and to a likelihood-based
method that directly generates graphs, albeit being susceptible to mode
collapse.
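
The core mechanism, a generator that emits dense node-type and edge-type
probability tensors which a discriminator scores directly, can be sketched in
a few lines. Below is a minimal PyTorch sketch; the class names, the QM9-like
sizes (9 atoms, 5 atom types, 4 bond types), and the single message-passing
step are illustrative assumptions rather than the authors' implementation,
and the reinforcement-learning reward head is omitted.

import torch
import torch.nn as nn

N, ATOM_TYPES, BOND_TYPES, Z_DIM = 9, 5, 4, 32  # assumed QM9-like sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(Z_DIM, 128), nn.Tanh(),
            nn.Linear(128, N * ATOM_TYPES + N * N * BOND_TYPES))

    def forward(self, z):
        out = self.mlp(z)
        nodes = out[:, :N * ATOM_TYPES].view(-1, N, ATOM_TYPES)
        edges = out[:, N * ATOM_TYPES:].view(-1, N, N, BOND_TYPES)
        edges = (edges + edges.transpose(1, 2)) / 2   # symmetrize: undirected graph
        return nodes.softmax(-1), edges.softmax(-1)   # relaxed, differentiable outputs

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.node_mlp = nn.Linear(ATOM_TYPES, 64)
        self.out = nn.Linear(64, 1)

    def forward(self, nodes, edges):
        h = torch.tanh(self.node_mlp(nodes))          # per-node features
        h = edges.sum(-1) @ h                         # one crude message-passing step
        return self.out(h.sum(1))                     # graph-level realness score

z = torch.randn(8, Z_DIM)
nodes, edges = Generator()(z)
score = Discriminator()(nodes, edges)                 # no likelihood anywhere: GAN critic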
MoFlow: An Invertible Flow Model for Generating Molecular Graphs
Generating molecular graphs with desired chemical properties driven by deep
graph generative models provides a very promising way to accelerate the drug
discovery process. Such graph generative models usually consist of two steps:
learning latent representations and generating molecular graphs. However,
generating novel and chemically valid molecular graphs from latent
representations is very challenging because of the chemical constraints and
combinatorial complexity of molecular graphs. In this paper, we propose MoFlow,
a flow-based graph generative model to learn invertible mappings between
molecular graphs and their latent representations. To generate molecular
graphs, our MoFlow first generates bonds (edges) through a Glow-based model,
then generates atoms (nodes) given bonds via a novel graph conditional flow, and
finally assembles them into a chemically valid molecular graph with a post-hoc
validity correction. Our MoFlow has merits including exact and tractable
likelihood training, efficient one-pass embedding and generation, chemical
validity guarantees, 100% reconstruction of training data, and good
generalization ability. We validate our model on four tasks: molecular graph
generation and reconstruction, visualization of the continuous latent space,
property optimization, and constrained property optimization. Our MoFlow
achieves state-of-the-art performance, which implies its potential efficiency
and effectiveness in exploring the large chemical space for drug discovery.
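
The invertible mappings the abstract refers to rest on coupling layers with
exact, cheap log-determinants, the same building block as in Glow. Below is a
minimal PyTorch sketch of such an affine coupling layer; the class name, layer
sizes, and tanh scale bounding are illustrative assumptions, and the graph
conditional flow itself is not reproduced.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))   # predicts scale and shift

    def forward(self, x):
        xa, xb = x.chunk(2, dim=-1)
        s, t = self.net(xa).chunk(2, dim=-1)
        s = torch.tanh(s)                      # keep scales bounded for stability
        yb = xb * torch.exp(s) + t             # transform one half given the other
        log_det = s.sum(-1)                    # exact, tractable log-determinant
        return torch.cat([xa, yb], -1), log_det

    def inverse(self, y):
        ya, yb = y.chunk(2, dim=-1)
        s, t = self.net(ya).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([ya, (yb - t) * torch.exp(-s)], -1)  # exact inversion

layer = AffineCoupling(8)
x = torch.randn(4, 8)
y, log_det = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-5)  # one-pass, invertible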
Keeping it Simple: Language Models can learn Complex Molecular Distributions
Deep generative models of molecules have grown immensely in popularity;
trained on relevant datasets, these models are used to search through chemical
space. The downstream utility of generative models for the inverse design of
novel functional compounds depends on their ability to learn a training
distribution of molecules. The simplest example is a language model that
takes the form of a recurrent neural network and generates molecules using a
string representation. More sophisticated are graph generative models, which
sequentially construct molecular graphs and typically achieve state-of-the-art
results. However, recent work has shown that language models are more capable
than once thought, particularly in the low data regime. In this work, we
investigate the capacity of simple language models to learn distributions of
molecules. For this purpose, we introduce several challenging generative
modeling tasks by compiling especially complex distributions of molecules. On
each task, we evaluate the ability of language models as compared with two
widely used graph generative models. The results demonstrate that language
models are powerful generative models, capable of adeptly learning complex
molecular distributions -- and yield better performance than the graph models.
Language models can accurately generate distributions of the highest-scoring
penalized LogP molecules in ZINC15, multi-modal molecular distributions, and
the largest molecules in PubChem.
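
A minimal sketch of such a simple language model, a character-level GRU that
emits SMILES strings one token at a time, is given below. The toy vocabulary,
class name, and sampling loop are illustrative assumptions, not the paper's
setup; training would minimize cross-entropy against next-character targets on
a real SMILES corpus.

import torch
import torch.nn as nn

vocab = ['<pad>', '<bos>', '<eos>', 'C', 'O', 'N', '(', ')', '=', '1']
stoi = {c: i for i, c in enumerate(vocab)}

class SmilesLM(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, h=None):
        out, h = self.gru(self.emb(tokens), h)
        return self.head(out), h               # next-character logits

@torch.no_grad()
def sample(model, max_len=40):
    tok = torch.tensor([[stoi['<bos>']]])
    h, out = None, []
    for _ in range(max_len):
        logits, h = model(tok, h)
        tok = torch.multinomial(logits[0, -1].softmax(-1), 1).view(1, 1)
        if tok.item() == stoi['<eos>']:
            break
        out.append(vocab[tok.item()])
    return ''.join(out)

model = SmilesLM(len(vocab))
print(sample(model))   # random until the model is actually trained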
A hierarchy of recurrent networks for speech recognition
Generative models for sequential data based on directed graphs of Restricted
Boltzmann Machines (RBMs) are able to accurately model high-dimensional
sequences, as recently shown. In these models, temporal dependencies in the
input are discovered either by buffering previous visible variables or by
recurrent connections of the hidden variables. Here we propose a modification
of these models, the Temporal Reservoir Machine (TRM). It utilizes a recurrent
artificial neural network (ANN) to integrate information from the input over
time. This information is then fed into an RBM at each time step. To avoid the
difficulties of recurrent network learning, the ANN remains untrained and can
hence be thought of as a random feature extractor. Using the architecture of
multi-layer RBMs (Deep Belief Networks), TRMs can be used as building blocks
for complex hierarchical models. This approach unifies RBM-based approaches to
sequential data modeling and the Echo State Network, a powerful approach to
black-box system identification. The TRM is tested on a spoken-digits task
under noisy conditions, and competitive performance compared to previous
models is observed.
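
A minimal NumPy sketch of the untrained random reservoir at the heart of the
TRM, essentially the Echo State Network recurrence, is given below. The sizes,
weight scales, and function names are illustrative assumptions; the RBM that
would consume the per-step features is omitted.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 13, 200                          # e.g. 13 spectral features per frame

W_in = rng.normal(0.0, 0.5, (n_res, n_in))     # fixed random input weights
W = rng.normal(0.0, 1.0, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius < 1: echo state property

def reservoir_features(inputs):
    """inputs: (T, n_in) frames -> (T, n_res) features that integrate history."""
    h = np.zeros(n_res)
    feats = []
    for u in inputs:
        h = np.tanh(W_in @ u + W @ h)          # the recurrence is never trained
        feats.append(h.copy())
    return np.stack(feats)                     # each row would be the RBM's visible input

x = rng.normal(size=(50, n_in))                # 50 frames of toy input
print(reservoir_features(x).shape)             # (50, 200)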