Over the last decade, there has been significant progress in the field of
machine learning for de novo drug design, particularly in deep generative
models. However, current generative approaches exhibit a significant challenge
as they do not ensure that the proposed molecular structures can be feasibly
synthesized nor do they provide the synthesis routes of the proposed small
molecules, thereby seriously limiting their practical applicability. In this
work, we propose a novel forward synthesis framework powered by reinforcement
learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis
(PGFS), that addresses this challenge by embedding the concept of synthetic
accessibility directly into the de novo drug design system. In this setup, the
agent learns to navigate through the immense synthetically accessible chemical
space by subjecting commercially available small molecule building blocks to
valid chemical reactions at every time step of the iterative virtual multi-step
synthesis process. The proposed environment for drug discovery provides a
highly challenging test-bed for RL algorithms owing to the large state space
and high-dimensional continuous action space with hierarchical actions. PGFS
achieves state-of-the-art performance in generating structures with high QED
and penalized clogP. Moreover, we validate PGFS in an in-silico
proof-of-concept associated with three HIV targets. Finally, we describe how
the end-to-end training conceptualized in this study represents an important
paradigm in radically expanding the synthesizable chemical space and automating
the drug discovery process.Comment: added the statistics of top-100 compounds used logP metric with
scaled components added values of the initial reactants to the box plots some
values in tables are recalculated due to the inconsistent environments on
different machines. corresponding benchmarks were rerun with the requirements
on github. no significant changes in the results. corrected figures in the
Appendi