Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations
Computer-based de-novo design of functional molecules is one of the most
prominent challenges in cheminformatics today. As a result, generative and
evolutionary inverse designs from the field of artificial intelligence have
emerged at a rapid pace, with aims to optimize molecules for a particular
chemical property. These models explore the chemical space 'indirectly', by
learning latent spaces, policies, or distributions, or by applying mutations to
populations of molecules. However, the recent development of the SELFIES string
representation of molecules, a surjective alternative to SMILES, has made
other techniques possible. Based on SELFIES, we therefore propose
PASITHEA, a direct gradient-based molecule optimization that applies
inceptionism techniques from computer vision. PASITHEA exploits the use of
gradients by directly reversing the learning process of a neural network, which
is trained to predict real-valued chemical properties. Effectively, this forms
an inverse regression model, which is capable of generating molecular variants
optimized for a certain property. Although our results are preliminary, we
observe a shift in distribution of a chosen property during inverse-training, a
clear indication of PASITHEA's viability. A striking property of inceptionism
is that we can directly probe the model's understanding of the chemical space
it was trained on. We expect that extending PASITHEA to larger datasets,
molecules and more complex properties will lead to advances in the design of
new functional molecules as well as the interpretation and explanation of
machine learning models.
Comment: 9 pages, 6 figures; comments welcome
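The inverse-training idea in the abstract above (train a property predictor, then freeze its weights and follow gradients with respect to the input) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the PASITHEA implementation: the linear model, the alphabet size, the string length, and the `toy_property` function are all hypothetical stand-ins chosen so the example runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
L, V = 8, 4  # toy string length and alphabet size (assumed values)

def one_hot(tokens):
    """Encode a length-L token sequence as a flattened L*V one-hot vector."""
    x = np.zeros((L, V))
    x[np.arange(L), tokens] = 1.0
    return x.ravel()

def toy_property(tokens):
    """Hypothetical stand-in property: how many positions hold symbol 0."""
    return float(np.sum(np.asarray(tokens) == 0))

# Forward training: fit weights that map one-hot strings to the property.
# A ridge-regularized linear model stands in for the neural network.
train = rng.integers(0, V, size=(500, L))
A = np.array([np.append(one_hot(t), 1.0) for t in train])  # last column = bias
y = np.array([toy_property(t) for t in train])
coef = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ y)
w, b = coef[:-1], coef[-1]

def predict(x):
    return float(x @ w + b)

def dream(start_tokens, target, steps=200, lr=0.05):
    """Inverse training: the weights stay frozen and the input is updated."""
    x = one_hot(start_tokens)
    for _ in range(steps):
        grad = 2.0 * (predict(x) - target) * w  # d/dx of (f(x) - target)^2
        x -= lr * grad
    return x.reshape(L, V).argmax(axis=1)  # decode back to discrete tokens

start = np.ones(L, dtype=int)  # a string that contains no symbol 0
dreamed = dream(start, target=float(L))
print(toy_property(start), "->", toy_property(dreamed))
```

The only trainable quantity in `dream` is the input vector, which is what distinguishes this from ordinary training; a surjective representation matters because the decoded result of such an update must still be a valid molecule.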
Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation
The discovery of novel materials and functional molecules can help to solve
some of society's most urgent challenges, ranging from efficient energy
harvesting and storage to uncovering novel pharmaceutical drug candidates.
Traditionally, matter engineering (generally denoted as inverse design) has
relied heavily on human intuition and high-throughput virtual screening. The
last few years have seen the emergence of significant interest in
computer-inspired designs based on evolutionary or deep learning methods. The
major challenge here is that the standard string-based molecular representation,
SMILES, shows substantial weaknesses in this task because large fractions of
strings do not correspond to valid molecules. Here, we solve this problem at a
fundamental level and introduce SELFIES (SELF-referencIng Embedded Strings), a
string-based representation of molecules that is 100% robust. Every SELFIES
string corresponds to a valid molecule, and SELFIES can represent every
molecule. SELFIES can be directly applied in arbitrary machine learning models
without the adaptation of the models; each of the generated molecule candidates
is valid. In our experiments, the model's internal memory stores two orders of
magnitude more diverse molecules than a similar test with SMILES. Furthermore,
as all molecules are valid, it allows for explanation and interpretation of the
internal workings of the generative models.
Comment: 6+3 pages, 6+1 figures
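The robustness claim above (every string decodes to a valid molecule) can be illustrated with a toy decoder that tracks the free valence of the last atom and silently reduces or drops any bond that would exceed it. This is a simplified sketch of the general idea only, not the SELFIES grammar: it handles linear chains without branches or rings, and the token format, `VALENCE` table, and function names are hypothetical.

```python
import random

VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}  # assumed maximum valences
BOND_SYMBOL = {0: "", 1: "", 2: "=", 3: "#"}
TOKENS = [(bond, atom) for atom in VALENCE for bond in (1, 2, 3)]

def decode(tokens):
    """Turn (requested_bond, atom) tokens into a chain of (actual_bond, atom)."""
    chain = []
    free = None  # free valence left on the last atom placed
    for bond, atom in tokens:
        if free is None:            # first atom: no bond to a predecessor
            chain.append((0, atom))
            free = VALENCE[atom]
            continue
        if free == 0:               # nothing can attach any more: stop decoding
            break
        order = min(bond, free, VALENCE[atom])  # clamp to what both atoms allow
        chain.append((order, atom))
        free = VALENCE[atom] - order
    return chain

def to_smiles(chain):
    """Write the chain as a SMILES-like string."""
    return "".join(BOND_SYMBOL[order] + atom for order, atom in chain)

def is_valid(chain):
    """Check that no atom uses more bonds than its valence allows."""
    used = [0] * len(chain)
    for i, (order, _) in enumerate(chain):
        if i > 0:
            used[i] += order
            used[i - 1] += order
    return all(u <= VALENCE[atom] for u, (_, atom) in zip(used, chain))

# A triple bond to fluorine is requested but clamped to a single bond,
# and the trailing oxygen is dropped because fluorine has no valence left.
print(to_smiles(decode([(1, "C"), (3, "F"), (2, "O")])))  # prints "CF"

random.seed(0)
samples = [decode(random.choices(TOKENS, k=10)) for _ in range(1000)]
print(all(is_valid(chain) for chain in samples))
```

Because invalid requests are repaired during decoding rather than rejected afterwards, a generative model can emit arbitrary token sequences without a separate validity filter.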
CORE: Automatic Molecule Optimization Using Copy & Refine Strategy
Molecule optimization is about generating molecules with more desirable
properties based on an input molecule. The state-of-the-art approaches
partition the molecules into a large set of substructures and grow the new
molecule structure by iteratively predicting which substructure from that set to
add. However, since the set of available substructures is large, such an
iterative prediction task is often inaccurate especially for substructures that
are infrequent in the training data. To address this challenge, we propose a
new generating strategy called "Copy & Refine" (CORE), where at each step the
generator first decides whether to copy an existing substructure from the input
molecule or to generate a new substructure; the most promising substructure is
then added to the new molecule. Combined with scaffolding tree generation
and adversarial training, CORE significantly improves several recent
molecule optimization methods on various measures, including drug likeness
(QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and
baselines using the ZINC database, and CORE obtained up to 11% and 21%
relative improvement over the baselines in success rate on the complete test
set and the subset with infrequent substructures, respectively.
Comment: Accepted by AAAI 202
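The copy-or-generate decision described above can be sketched as a gated mixture of two distributions: one over the substructures present in the input molecule and one over the full substructure vocabulary. The abstract does not specify how CORE combines them, so the pointer-generator-style mixture below is an assumption, and every name and shape in it is hypothetical.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def copy_or_generate(vocab_logits, input_ids, copy_scores, gate_logit):
    """Pick the next substructure id from a copy/generate mixture.

    vocab_logits: score for every substructure in the vocabulary
    input_ids:    vocabulary ids of the substructures in the input molecule
    copy_scores:  one attention score per input substructure
    gate_logit:   scalar; large values favor copying over generating
    """
    p_generate = softmax(vocab_logits)
    attention = softmax(copy_scores)
    p_copy = np.zeros_like(p_generate)
    # Scatter attention mass onto vocabulary ids; repeated ids accumulate.
    np.add.at(p_copy, np.asarray(input_ids), attention)
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    mixed = gate * p_copy + (1.0 - gate) * p_generate
    return int(np.argmax(mixed)), mixed

vocab_logits = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # generator mildly prefers id 0
input_ids = [4, 5, 4]                          # id 4 appears twice in the input
copy_scores = [2.0, 0.0, 2.0]

copied, _ = copy_or_generate(vocab_logits, input_ids, copy_scores, gate_logit=2.0)
generated, _ = copy_or_generate(vocab_logits, input_ids, copy_scores, gate_logit=-4.0)
print(copied, generated)
```

A mixture of this kind lets rare substructures be selected through the copy path even when the generator assigns them little probability, which matches the motivation given in the abstract for infrequent substructures.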
Gravity-Inspired Graph Autoencoders for Directed Link Prediction
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged
as powerful node embedding methods. In particular, graph AE and VAE were
successfully leveraged to tackle the challenging link prediction problem,
aiming at figuring out whether some pairs of nodes from a graph are connected
by unobserved edges. However, these models focus on undirected graphs and
therefore ignore the potential direction of the link, which is limiting for
numerous real-life applications. In this paper, we extend the graph AE and VAE
frameworks to address link prediction in directed graphs. We present a new
gravity-inspired decoder scheme that can effectively reconstruct directed
graphs from a node embedding. We empirically evaluate our method on three
different directed link prediction tasks, for which standard graph AE and VAE
perform poorly. We achieve competitive results on three real-world graphs,
outperforming several popular baselines.
Comment: ACM International Conference on Information and Knowledge Management
(CIKM 2019)
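A decoder can only reconstruct directed graphs if its score for the edge i -> j differs from its score for j -> i. The sketch below assumes a gravity-style form in which each node carries a learned "mass" and the score is sigmoid(m_j - lam * log ||z_i - z_j||^2); the abstract does not state the formula, so this form, the parameter `lam`, and all names are assumptions for illustration.

```python
import numpy as np

def gravity_decoder(Z, mass, lam=1.0, eps=1e-8):
    """Return a matrix P where P[i, j] scores the directed edge i -> j.

    Z:    (n, d) node embeddings
    mass: (n,) per-node mass term; only the target node's mass enters P[i, j]
    lam:  weight of the distance term (assumed hyperparameter)
    """
    Z = np.asarray(Z, dtype=float)
    mass = np.asarray(mass, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)              # symmetric in i and j
    logits = mass[None, :] - lam * np.log(sq_dist + eps)
    probs = 1.0 / (1.0 + np.exp(-logits))
    np.fill_diagonal(probs, 0.0)                      # ignore self-loops
    return probs

Z = [[0.0, 0.0], [1.0, 0.0]]
mass = [0.0, 3.0]            # node 1 is "heavier" and attracts incoming edges
P = gravity_decoder(Z, mass)
print(P[0, 1], P[1, 0])
```

The distance term is symmetric, so all of the asymmetry comes from the mass of the target node; an inner-product decoder of the kind used for undirected graphs has no comparable term and would assign both directions the same score.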