34 research outputs found
Bayesian optimization with known experimental and design constraints for chemistry applications
Optimization strategies driven by machine learning, such as Bayesian
optimization, are being explored across experimental sciences as an efficient
alternative to traditional design of experiment. When combined with automated
laboratory hardware and high-performance computing, these strategies enable
next-generation platforms for autonomous experimentation. However, the
practical application of these approaches is hampered by a lack of flexible
software and algorithms tailored to the unique requirements of chemical
research. One such aspect is the pervasive presence of constraints in the
experimental conditions when optimizing chemical processes or protocols, and in
the chemical space that is accessible when designing functional molecules or
materials. Although many of these constraints are known a priori, they can be
interdependent, non-linear, and result in non-compact optimization domains. In
this work, we extend our experiment planning algorithms Phoenics and Gryffin
such that they can handle arbitrary known constraints via an intuitive and
flexible interface. We benchmark these extended algorithms on continuous and
discrete test functions with a diverse set of constraints, demonstrating their
flexibility and robustness. In addition, we illustrate their practical utility
in two simulated chemical research scenarios: the optimization of the synthesis
of o-xylenyl Buckminsterfullerene adducts under constrained flow conditions,
and the design of redox active molecules for flow batteries under synthetic
accessibility constraints. The tools developed constitute a simple, yet
versatile strategy to enable model-based optimization with known experimental
constraints, contributing to its applicability as a core component of
autonomous platforms for scientific discovery.
Comment: 15 pages, 5 figures (SI with 13 pages, 8 figures)
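As a rough illustration of what a known constraint over interdependent parameters can look like, the sketch below expresses one as a plain Python predicate and uses rejection sampling to draw only feasible proposals. It is a minimal sketch under stated assumptions, not the Phoenics or Gryffin interface: the function names `is_feasible` and `sample_feasible`, the parameters `flow_a` and `flow_b`, and the limit of 1.0 are all hypothetical.

```python
import random


def is_feasible(params):
    """Hypothetical known constraint: the combined flow rate must not exceed 1.0.

    The two parameters are interdependent, so the feasible region is a
    triangle rather than the full rectangular domain.
    """
    return params["flow_a"] + params["flow_b"] <= 1.0


def sample_feasible(bounds, constraint, n, rng, max_tries=10_000):
    """Draw up to n uniform samples inside bounds that satisfy the constraint.

    Rejection sampling: infeasible candidates are discarded, never evaluated.
    """
    samples = []
    tries = 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        candidate = {
            name: rng.uniform(low, high) for name, (low, high) in bounds.items()
        }
        if constraint(candidate):
            samples.append(candidate)
    return samples


# Illustrative (assumed) parameter ranges for two flow rates.
bounds = {"flow_a": (0.0, 1.0), "flow_b": (0.0, 1.0)}
proposals = sample_feasible(bounds, is_feasible, n=5, rng=random.Random(0))
for proposal in proposals:
    print(proposal)
```

Rejection sampling is only the simplest way to respect such a predicate; it becomes inefficient when the feasible region is a small fraction of the domain.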
Roughness of molecular property landscapes and its impact on modellability
In molecular discovery and drug design, structure-property relationships and
activity landscapes are often qualitatively or quantitatively analyzed to guide
the navigation of chemical space. The roughness (or smoothness) of these
molecular property landscapes is one of their most studied geometric
attributes, as it can characterize the presence of activity cliffs, with
rougher landscapes generally expected to pose tougher optimization challenges.
Here, we introduce a general, quantitative measure for describing the roughness
of molecular property landscapes. The proposed roughness index (ROGI) is
loosely inspired by the concept of fractal dimension and strongly correlates
with the out-of-sample error achieved by machine learning models on numerous
regression tasks.
Comment: 17 pages, 6 figures, 2 tables (SI with 17 pages, 16 figures)
On scientific understanding with artificial intelligence
Imagine an oracle that correctly predicts the outcome of every particle
physics experiment, the products of every chemical reaction, or the function of
every protein. Such an oracle would revolutionize science and technology as we
know them. However, as scientists, we would not be satisfied with the oracle
itself. We want more. We want to comprehend how the oracle conceived these
predictions. This feat, denoted as scientific understanding, has frequently
been recognized as the essential aim of science. Now, the ever-growing power of
computers and artificial intelligence poses one ultimate question: How can
advanced artificial systems contribute to scientific understanding or achieve
it autonomously?
We are convinced that this is not a mere technical question but lies at the
core of science. Therefore, here we set out to answer where we are and where we
can go from here. We first seek advice from the philosophy of science to
understand scientific understanding. Then we review the current state of the
art, both from literature and by collecting dozens of anecdotes from scientists
about how they acquired new conceptual understanding with the help of
computers. Those combined insights help us to define three dimensions of
android-assisted scientific understanding: The android as a I) computational
microscope, II) resource of inspiration and the ultimate, not yet existent III)
agent of understanding. For each dimension, we explain new avenues to push
beyond the status quo and unleash the full power of artificial intelligence's
contribution to the central aim of science. We hope our perspective inspires
and focuses research towards androids that get new scientific understanding and
ultimately bring us closer to true artificial scientists.
Comment: 13 pages, 3 figures, comments welcome
Using the fragment molecular orbital method to investigate agonist–orexin-2 receptor interactions
The understanding of binding interactions between any protein and a small molecule plays a key role in the rationalization of affinity and selectivity and is essential for an efficient structure-based drug discovery (SBDD) process. Clearly, to begin SBDD, a structure is needed, and although there has been fantastic progress in solving G-protein-coupled receptor (GPCR) crystal structures, the process remains quite slow and is not currently feasible for every GPCR or GPCR-ligand complex. This situation significantly limits the ability of X-ray crystallography to impact the drug discovery process for GPCR targets in 'real-time' and hence there is still a need for other practical and cost-efficient alternatives. We present here an approach that integrates our previously described hierarchical GPCR modelling protocol (HGMP) and the fragment molecular orbital (FMO) quantum mechanics (QM) method to explore the interactions and selectivity of the human orexin-2 receptor (OX2R) and its recently discovered nonpeptidic agonists. HGMP generates a 3D model of GPCR structures and its complexes with small molecules by applying a set of computational methods. FMO allows ab initio approaches to be applied to systems that conventional QM methods would find challenging. The key advantage of FMO is that it can reveal information on the individual contribution and chemical nature of each residue and water molecule to the ligand binding that normally would be difficult to detect without QM. We illustrate how the combination of both techniques provides a practical and efficient approach that can be used to analyse the existing structure-function relationships (SAR) and to drive forward SBDD in a real-world example for which there is no crystal structure of the complex available.
Free energy calculations in drug design: application to bromodomains
Computer simulations of biomolecules have been improving at a pace that is faster than Moore’s law for microprocessors in the last few decades. Thanks to advances in theory, hardware, and algorithms it is increasingly possible to study biological processes at relevant spatial and temporal resolutions, and to exploit simulation for quantitative predictions. One area that can potentially benefit greatly from such computational predictions is that of drug discovery. Since the inception of the concept of rational drug design, the prediction of how tightly an organic molecule binds to a macromolecular partner has been one of the chief objectives of computational chemistry. Computers already play a fundamental support role during the drug discovery process, and today many novel approaches that aim at studying the details of drug binding and predicting binding affinity are being actively investigated. In this thesis, I report a series of studies that aim to evaluate the potential utility of free energy calculations based on molecular simulations for drug design. In particular, I focus on the prediction of small-molecule binding affinities to the epigenetic target of bromodomains. Bromodomains are small protein modules that have been found in 46 human proteins involved in gene regulation. Given their role in various diseases, in particular cancer and inflammation, a number of bromodomain inhibitors are currently being investigated both in the laboratory and the clinic. Here, it is shown how thorough calculations based on explicit-solvent simulations and all-atom force fields can accurately reproduce binding free energies for this protein family. Rigorous free energy calculations are also compared to more approximate methods based on the post-processing of the simulation trajectories in implicit solvent. Finally, a recently proposed method for the estimation of water binding free energy is employed to study water displaceability from bromodomain binding pockets.
A graph representation of molecular ensembles for polymer property prediction
A graph representation that captures critical features of polymeric materials and an associated graph neural network achieve superior accuracy to off-the-shelf cheminformatics methodologies.
Anubis: Bayesian optimization with unknown feasibility constraints for scientific experimentation
Model-based optimization strategies, such as Bayesian optimization (BO), have been deployed across the natural sciences in design and discovery campaigns due to their sample efficiency and flexibility. The combination of such strategies with automated laboratory equipment and/or high-performance computing in a suggest-make-measure closed-loop constitutes a self-driving laboratory (SDL), which has been endorsed as a next-generation technology for autonomous scientific experimentation. Despite the promise of early SDL prototypes, a lack of flexible experiment planning algorithms prevents certain prevalent optimization problem types from being addressed. For instance, many experiment planning algorithms are unable to intelligently deal with failed measurements resulting from a priori unknown constraints on the parameter space. Such constraint functions are pervasive in chemistry and materials science research, stemming from unexpected equipment failures, failed/abandoned syntheses, or unstable molecules or materials. In this work, we provide a comprehensive discussion and benchmark of BO strategies to deal with a priori unknown constraints, characterized by learning the constraint function on-the-fly using a variational Gaussian process classifier and combining its predictions with the typical BO regression surrogate to parameterize feasibility-aware acquisition functions. These acquisition functions balance sampling parameter space regions deemed to be promising in terms of optimization objectives with avoidance of regions predicted to be infeasible. In addition to benchmarking feasibility-aware acquisition functions on analytic optimization benchmark surfaces, we conduct two realistic optimization benchmarks derived from previously reported studies: inverse design of hybrid organic-inorganic halide perovskite materials with unknown stability constraints, and the design of BCR-Abl kinase inhibitors with unknown synthetic accessibility constraints.
We deliver intuitive recommendations to readers on which strategies work best for various scenarios. Overall, this work contributes to advancing the practicality and efficiency of autonomous experimentation in SDLs. All strategies introduced in this work are implemented as part of the open-source Atlas Python library.
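One common way to combine a classifier's feasibility prediction with a regression surrogate is to multiply a standard acquisition value by the predicted probability of feasibility. The sketch below shows that idea with expected improvement for a minimization problem. It is a minimal sketch under stated assumptions, not the Atlas library's API and not necessarily one of the strategies benchmarked in the paper: the function names, the candidate labels, and all numeric surrogate and classifier outputs are hypothetical placeholders.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent `best`, for minimization."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


def feasibility_weighted_ei(mean, std, best, p_feasible):
    """Scale expected improvement by the predicted probability of feasibility."""
    return p_feasible * expected_improvement(mean, std, best)


# Hypothetical model outputs: (surrogate mean, surrogate std, P(feasible)).
candidates = {
    "A": (0.2, 0.1, 0.05),  # very promising objective, but likely to fail
    "B": (0.6, 0.1, 0.95),  # less promising objective, very likely feasible
}
best_observed = 1.0

scores = {
    name: feasibility_weighted_ei(mean, std, best_observed, p)
    for name, (mean, std, p) in candidates.items()
}
selected = max(scores, key=scores.get)
print(selected, scores)
```

With these placeholder numbers the weighted score prefers candidate B, because the large expected improvement of candidate A is discounted by its low predicted chance of yielding a measurement at all.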