Controlled exploration of chemical space by machine learning of coarse-grained representations
The size of chemical compound space is too large to be probed exhaustively.
This leads high-throughput protocols to drastically subsample and results in
sparse and non-uniform datasets. Rather than arbitrarily selecting compounds,
we systematically explore chemical space according to the target property of
interest. We first perform importance sampling by introducing a Markov chain
Monte Carlo scheme across compounds. We then train an ML model on the sampled
data to expand the region of chemical space probed. Our boosting procedure
enhances the number of compounds by a factor of 2 to 10, enabled by the ML model's
coarse-grained representation, which both simplifies the structure-property
relationship and reduces the size of chemical space. The ML model correctly
recovers linear relationships between transfer free energies. These linear
relationships correspond to features that are global to the dataset, marking
the region of chemical space up to which predictions are reliable---a more
robust alternative to the predictive variance. Bridging coarse-grained
simulations with ML gives rise to an unprecedented database of drug-membrane
insertion free energies for 1.3 million compounds.
Comment: 9 pages, 5 figures
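The importance-sampling step described in the abstract above can be illustrated with a toy Metropolis walk over a discrete compound space. This is only a minimal sketch under stated assumptions, not the authors' implementation: the bead types, the `toy_property` scoring function, the single-bead-swap proposal, and the inverse temperature `beta` are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

# Hypothetical coarse-grained bead types (illustrative only).
BEAD_TYPES = ["A", "B", "C", "D"]

def toy_property(compound):
    """Hypothetical stand-in for a simulated target property (lower is better)."""
    weights = {"A": -1.0, "B": -0.3, "C": 0.4, "D": 1.2}
    return sum(weights[bead] for bead in compound)

def propose(compound, rng):
    """Symmetric move: swap one randomly chosen bead for a different type."""
    i = rng.randrange(len(compound))
    new_bead = rng.choice([b for b in BEAD_TYPES if b != compound[i]])
    return compound[:i] + (new_bead,) + compound[i + 1:]

def metropolis_accept(delta, beta, rng):
    """Standard Metropolis criterion for a symmetric proposal."""
    return delta <= 0 or rng.random() < math.exp(-beta * delta)

def sample_compounds(start, n_steps, beta=1.0, seed=0):
    """Walk across compounds, recording every compound the chain occupies."""
    rng = random.Random(seed)
    current = start
    visited = {current: toy_property(current)}
    for _ in range(n_steps):
        candidate = propose(current, rng)
        delta = toy_property(candidate) - toy_property(current)
        if metropolis_accept(delta, beta, rng):
            current = candidate
            visited[current] = toy_property(current)
    return visited

if __name__ == "__main__":
    sampled = sample_compounds(("C", "C", "C"), n_steps=500)
    print(len(sampled), "distinct compounds sampled")
```

The `visited` dictionary plays the role of the sampled dataset on which a model could later be trained. A larger `beta` biases the walk more strongly toward low values of the toy property.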
In silico screening of drug-membrane thermodynamics reveals linear relations between bulk partitioning and the potential of mean force
The partitioning of small molecules in cell membranes---a key parameter for
pharmaceutical applications---typically relies on experimentally-available bulk
partitioning coefficients. Computer simulations provide a structural resolution
of the insertion thermodynamics via the potential of mean force, but require
significant sampling at the atomistic level. Here, we introduce high-throughput
coarse-grained molecular dynamics simulations to screen thermodynamic
properties. This application of physics-based models in a large-scale study of
small molecules establishes linear relationships between partitioning
coefficients and key features of the potential of mean force. This allows us to
predict the structure of the insertion from bulk experimental measurements for
more than 400,000 compounds. The potential of mean force hereby becomes an
easily accessible quantity---already recognized for its high predictability of
certain properties, e.g., passive permeation. Further, we demonstrate how
coarse graining helps reduce the size of chemical space, enabling a
hierarchical approach to screening small molecules.
Comment: 8 pages, 6 figures. Typos fixed, minor correction
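The idea of predicting a feature of the potential of mean force from a bulk partitioning measurement through a linear relation can be sketched as an ordinary least-squares fit. The sketch below is illustrative only: the function names and the numbers in the usage example are hypothetical, and nothing here reproduces the fitted coefficients of the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, slope, intercept):
    """Apply the fitted linear relation to a new bulk measurement."""
    return slope * x_new + intercept

if __name__ == "__main__":
    # Hypothetical training pairs: bulk partitioning free energy (x)
    # versus one feature of the potential of mean force (y).
    bulk = [0.0, 1.0, 2.0, 3.0]
    pmf_feature = [1.0, 3.0, 5.0, 7.0]
    slope, intercept = fit_line(bulk, pmf_feature)
    print(predict(1.5, slope, intercept))
```

Once such a relation is calibrated on simulated compounds, a new compound needs only its bulk value to obtain an estimate of the corresponding feature.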
Broad chemical transferability in structure-based coarse-graining
Compared to top-down coarse-grained (CG) models, bottom-up approaches are
capable of offering higher structural fidelity. This fidelity results from the
tight link to a higher-resolution reference, making the CG model chemically
specific. Unfortunately, chemical specificity can be at odds with
compound-screening strategies, which call for transferable parametrizations.
Here we present an approach to reconcile bottom-up, structure-preserving CG
models with chemical transferability. We consider the bottom-up CG
parametrization of 3,441 C7O2 small-molecule isomers. Our approach
combines atomic representations, unsupervised learning, and a large-scale
extended-ensemble force-matching parametrization. We first identify a subset of
19 representative molecules, which maximally encode the local environment of
all gas-phase conformers. Reference interactions between the 19 representative
molecules were obtained from both homogeneous bulk liquids and various binary
mixtures. An extended-ensemble parametrization over all 703 state points leads
to a CG model that is both structure-based and chemically transferable.
Remarkably, the resulting force field is on average more structurally accurate
than single-state-point equivalents. Averaging over the extended ensemble acts
as a mean-force regularizer, smoothing out both force and structural
correlations that are overly specific to a single state point. Our approach
aims at transferability through a set of CG bead types that can be used to
easily construct new molecules, while retaining the benefits of a
structure-based parametrization.
Comment: 15 pages, 7 figures
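Pooling reference data from several state points into a single force-matching fit can be illustrated with a deliberately small least-squares example. This is a toy sketch under strong assumptions and not the parametrization used in the paper: the force is taken to be a one-dimensional function of distance, the two basis functions in `basis` are hypothetical, and each state point is just a list of `(distance, reference_force)` pairs.

```python
def basis(r):
    """Hypothetical two-term basis for a pairwise force at distance r."""
    return (r ** -2, r ** -4)

def force_match(state_points):
    """Least-squares coefficients fitted jointly over all state points.

    Accumulates the 2x2 normal equations G c = b across every sample of
    every state point, then solves them with Cramer's rule.
    """
    g11 = g12 = g22 = b1 = b2 = 0.0
    for samples in state_points:
        for r, f_ref in samples:
            p1, p2 = basis(r)
            g11 += p1 * p1
            g12 += p1 * p2
            g22 += p2 * p2
            b1 += p1 * f_ref
            b2 += p2 * f_ref
    det = g11 * g22 - g12 * g12
    c1 = (b1 * g22 - g12 * b2) / det
    c2 = (g11 * b2 - g12 * b1) / det
    return c1, c2

def model_force(r, coeffs):
    """Evaluate the fitted coarse-grained force at distance r."""
    p1, p2 = basis(r)
    return coeffs[0] * p1 + coeffs[1] * p2

if __name__ == "__main__":
    true_coeffs = (1.5, -0.5)  # hypothetical ground truth for the demo
    state_a = [(r, model_force(r, true_coeffs)) for r in (1.0, 1.2, 1.5)]
    state_b = [(r, model_force(r, true_coeffs)) for r in (0.9, 1.1, 1.4)]
    print(force_match([state_a, state_b]))
```

Because all state points contribute to the same normal equations, features specific to one state point are averaged against the others, which is the intuition behind the regularizing effect mentioned in the abstract.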