Computational planning of the synthesis of complex natural products
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years (1-7). However, the field has progressed greatly since the development of early programs such as LHASA (1,7), for which reaction choices at each step were made by human operators. Multiple software platforms (6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary (15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships (17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts.
Adaptive importance sampling by minimizing estimators of the cross-entropy, the mean square, and the inefficiency constant
The inefficiency of using an unbiased estimator in a Monte Carlo procedure can be quantified by an inefficiency constant, equal to the product of the estimator's variance and its mean computational cost. We develop methods for obtaining the parameters of the importance sampling (IS) change of measure via single- and multi-stage minimization of well-known estimators of the cross-entropy and of the mean square of the IS estimator, as well as of new estimators of such a mean square and of the inefficiency constant. We prove the convergence and asymptotic properties of the minimization results in our methods. We show that if a zero-variance IS parameter exists, then, under appropriate assumptions, the minimization results for the new estimators converge to such a parameter at a faster rate than those for the well-known estimators, and that the positive definite asymptotic covariance matrix of the minimization results for the cross-entropy estimators is four times the corresponding matrix for the well-known mean-square estimators. We introduce criteria for comparing the asymptotic efficiency of stochastic optimization methods, applicable to the minimization methods of the estimators considered in this work. In our numerical experiments for computing expectations of functionals of an Euler scheme, minimization of the new estimators led to the lowest inefficiency constants and variances of the IS estimators, followed by minimization of the well-known mean-square estimators and of the cross-entropy ones.
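For intuition, the following is a minimal sketch of single-stage cross-entropy minimization for importance sampling on a textbook rare-event problem. The Gaussian mean-shift family, the threshold a, and the unit-cost-per-sample assumption are illustrative stand-ins, not the thesis' actual estimators or setting.

# Hedged sketch: single-stage cross-entropy (CE) minimization for importance
# sampling (IS), on the toy problem of estimating p = P(X > a) for
# X ~ N(0, 1) with a mean-shifted family N(theta, 1) as the IS change of
# measure. Illustrative only; not the thesis' estimators.
import numpy as np

rng = np.random.default_rng(0)
a = 3.0  # rare-event threshold (assumed)

# Stage 1: pilot run under a heuristic proposal N(a, 1) to fit theta.
# For a Gaussian mean-shift family, the CE-optimal mean is the
# likelihood-ratio-weighted average of the pilot draws that hit the event.
pilot = rng.normal(a, 1.0, size=10_000)
w = np.exp(-pilot * a + a**2 / 2)        # dN(0,1)/dN(a,1) likelihood ratio
hit = pilot > a
theta = np.sum(w[hit] * pilot[hit]) / np.sum(w[hit])

# Stage 2: final IS run under the fitted N(theta, 1).
x = rng.normal(theta, 1.0, size=100_000)
lr = np.exp(-x * theta + theta**2 / 2)   # dN(0,1)/dN(theta,1)
est = np.mean((x > a) * lr)
var = np.var((x > a) * lr) / x.size

# The inefficiency constant is variance times mean cost; with unit cost per
# sample it reduces to the per-sample variance computed above.
print(f"theta = {theta:.3f}, p_hat = {est:.3e}, variance = {var:.3e}")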
Is Organic Chemistry Really Growing Exponentially?
In terms of molecules and specific reaction examples, organic chemistry exhibits impressive, exponential growth. However, the new reaction classes/types that fuel this growth are being discovered at a much slower, only linear (or even sublinear) rate. The proportion of newly discovered reaction types among all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than those of decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity, based on an algorithm that extracts generalized reaction-class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models for which sufficient reaction statistics exist), and the identification of erroneous entries in reaction databases.
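As a concrete, hedged illustration of this kind of trend analysis, the sketch below fits an exponential model to per-year reaction counts and a linear model to per-year counts of newly discovered reaction classes, then compares the two. All numbers are synthetic placeholders, not the paper's data.

# Hedged sketch of the growth-rate comparison described above.
# The yearly counts are synthetic placeholders for illustration only.
import numpy as np

years = np.arange(1980, 2021)
reactions = 5_000 * np.exp(0.06 * (years - 1980))   # illustrative only
new_classes = 40 + 1.5 * (years - 1980)             # illustrative only

# Exponential growth appears as a straight line in log space.
growth_rate, _ = np.polyfit(years, np.log(reactions), 1)
class_slope, _ = np.polyfit(years, new_classes, 1)

print(f"reactions: ~{100 * growth_rate:.1f}% growth per year (exponential)")
print(f"new reaction classes: ~{class_slope:.1f} added per year (linear)")
print("share of new classes among all reactions in "
      f"{years[-1]}: {new_classes[-1] / reactions[-1]:.2%}")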
Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans
As programs for computer-aided retrosynthetic design come of age, they no longer identify just one or a few synthetic routes but a multitude of chemically plausible syntheses, together forming large, directed graphs of solutions. An important problem then emerges: how to select from these graphs, and present to the user, manageable numbers of top-scoring pathways that are cost-effective, that promote convergent over linear solutions, and that are chemically diverse, so that they do not merely repeat minor variations on the same chemical theme. This paper describes a family of reaction-network algorithms that address this problem by (i) using recursive formulae to assign realistic prices to individual pathways and (ii) applying penalties to chemically similar strategies so that they do not dominate the top-scoring routes. Synthetic examples are provided to illustrate how these algorithms can be implemented, on timescales of about 1 s even for large graphs, to rapidly query the space of synthetic solutions under scenarios of different reaction yields and/or costs associated with performing reaction operations on different scales.
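A minimal sketch of the recursive price assignment in point (i) follows, under an assumed acyclic route graph, fixed per-operation costs, and a uniform yield. The compound names, prices, and yield model are illustrative assumptions, not the paper's actual formulae.

# Hedged sketch: recursive pricing of synthetic routes. Each compound is
# either bought at catalog price or made by the cheapest available reaction;
# the recursion is memoized so shared intermediates are priced once.
from functools import lru_cache

# target -> list of (substrates, cost of performing the reaction operation)
ROUTES = {
    "target":         [(("intermediate_a", "intermediate_b"), 100.0),
                       (("intermediate_c",), 150.0)],
    "intermediate_a": [(("sm_1", "sm_2"), 80.0)],
    "intermediate_b": [(("sm_3",), 60.0)],
    "intermediate_c": [(("sm_1", "sm_3"), 90.0)],
}
STARTING_MATERIALS = {"sm_1": 5.0, "sm_2": 12.0, "sm_3": 8.0}  # catalog prices
YIELD = 0.8  # assumed uniform yield; scales the amount of substrate needed

@lru_cache(maxsize=None)
def price(compound: str) -> float:
    """Cheapest way to obtain `compound`: buy it, or make it recursively."""
    if compound in STARTING_MATERIALS:
        return STARTING_MATERIALS[compound]
    return min(
        op_cost + sum(price(s) for s in substrates) / YIELD
        for substrates, op_cost in ROUTES[compound]
    )

print(f"estimated price of target: {price('target'):.2f}")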
Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions using machine learning: The importance of physically meaningful descriptors
Machine learning can predict the major regio-, site-, and diastereoselective outcomes of Diels-Alder reactions better than standard quantum-mechanical methods, and with accuracies exceeding 90%, provided that (i) the diene/dienophile substrates are represented by "physical-organic" descriptors reflecting the electronic and steric characteristics of their substituents and (ii) the positions of such substituents relative to the reaction core are encoded ("vectorized") in an informative way.
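To illustrate point (ii), here is a hedged sketch of positional descriptor encoding: each substituent is mapped to electronic and steric values, and the values are concatenated in a fixed order of positions around the reaction core. The descriptor numbers are rough stand-ins, not the paper's actual descriptor set.

# Hedged sketch: "vectorizing" substituents by physical-organic descriptors
# at fixed positions relative to the Diels-Alder reaction core. The
# (electronic, steric) values below are rough illustrative constants.
import numpy as np

DESCRIPTORS = {
    "H":     (0.00, 0.0),
    "Me":    (-0.17, 1.7),
    "OMe":   (-0.27, 0.6),
    "CO2Me": (0.45, 1.3),
    "CN":    (0.66, 0.2),
}

def vectorize(substituents: list[str]) -> np.ndarray:
    """Concatenate descriptors in a fixed positional order (diene C1..C4,
    then the two dienophile carbons), so that position information relative
    to the reaction core is preserved in the feature layout."""
    return np.concatenate([DESCRIPTORS[s] for s in substituents])

# Example: substituted diene (C1..C4) plus acrylate-type dienophile (C1', C2').
features = vectorize(["OMe", "H", "H", "Me", "CO2Me", "H"])
print(features)  # one feature row for a downstream classifier of the major isomer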
Network search algorithms and scoring functions for advanced-level computerized synthesis planning
In 2020, a "hybrid" expert-AI computer program called Chematica (a.k.a. Synthia) was shown to autonomously plan multistep syntheses of complex natural products, which remain outside the reach of purely data-driven AI programs. The ability to plan at this level of chemical sophistication has been attributed mainly to the superior quality of Chematica's reaction rules. However, rules alone are not sufficient for advanced synthetic planning, which also requires appropriately crafted algorithms with which to intelligently navigate the enormous networks of synthetic possibilities, score the synthetic positions encountered, and rank the pathways identified. Chematica's algorithms are distinct from prĂȘt-Ă -porter algorithmic solutions and are the product of multiple rounds of improvements against target structures of increasing complexity. Since descriptions of these improvements have been scattered among several of our prior publications, the aim of the current Review is to narrate the development process in a more comprehensive manner.
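As a generic illustration of the navigate-score-rank pattern the Review discusses (not Chematica's actual algorithm), a best-first search over synthetic positions can be sketched as follows; expand, score, and is_solved are assumed callbacks supplied by the caller.

# Hedged sketch: best-first navigation of a retrosynthetic search tree under
# a position-scoring function. Minimal illustration only.
import heapq
import itertools

def best_first_search(target, expand, score, is_solved, max_expansions=1000):
    """Expand the most promising synthetic position first.

    expand(position)    -> successor positions (retrosynthetic moves)
    score(position)     -> lower is better (e.g. complexity left to solve)
    is_solved(position) -> True once every remaining compound is purchasable
    """
    counter = itertools.count()  # tie-breaker so equal scores never compare positions
    frontier = [(score(target), next(counter), target)]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, position = heapq.heappop(frontier)
        if is_solved(position):
            return position
        for successor in expand(position):
            heapq.heappush(frontier, (score(successor), next(counter), successor))
    return None

The counter tie-breaker is a small but important design choice: it keeps the heap from ever comparing position objects directly when two scores tie.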
Synergy Between Expert and Machine-Learning Approaches Allows for Improved Retrosynthetic Planning
When computers plan multistep syntheses, they can rely either on expert knowledge or on information machine-extracted from large reaction repositories. Both approaches suffer from imperfect functions for evaluating reaction choices: expert functions are heuristics based on chemical intuition, whereas machine learning (ML) relies on neural networks (NNs) that can make meaningful predictions only about popular reaction types. This paper shows that the expert and ML approaches can be synergistic: specifically, when NNs are trained on literature data matched onto high-quality, expert-coded reaction rules, they achieve higher synthetic accuracy than either method alone and, importantly, can also handle rare/specialized reaction types.
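A hedged sketch of the described synergy: literature reactions matched onto expert-coded rules provide the training labels, and a network then learns to rank which rules to apply to a given product. The fingerprints, dataset sizes, and model below are illustrative assumptions, not the paper's actual pipeline.

# Hedged sketch: train a classifier to rank expert-coded rules, using labels
# obtained by matching literature reactions onto those rules. Random
# placeholder data stand in for real fingerprints and rule matches, so the
# model here learns nothing chemically meaningful; the point is the shape of
# the pipeline.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_reactions, n_bits, n_rules = 2_000, 256, 100
X = (rng.random((n_reactions, n_bits)) < 0.05).astype(float)  # product fingerprints
y = rng.integers(0, n_rules, n_reactions)                     # matched rule ids

# One-hidden-layer softmax classifier: fingerprint -> expert rule.
model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=30, random_state=0)
model.fit(X, y)

# At plan time, only the top-ranked expert-coded rules are applied: the
# network prioritizes, while the rules guarantee chemical validity.
probs = model.predict_proba(X[:1])
top_rules = model.classes_[np.argsort(probs[0])[::-1][:5]]
print("top-ranked candidate rules:", top_rules)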