
    Computational planning of the synthesis of complex natural products

    Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years (1-7). However, the field has progressed greatly since the development of early programs such as LHASA (1,7), for which reaction choices at each step were made by human operators. Multiple software platforms (6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary (15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships (17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization. In short, a synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts.
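
    The distinction drawn above, between programs that 'think' one step at a time and one that strategizes over several, can be made concrete with a toy search. The Python sketch below contrasts a greedy choice based on immediate reaction scores with a depth-limited lookahead; the retro moves, scores, and molecule names are invented for illustration and are not Chematica's actual data structures or algorithms.

```python
# Toy contrast between one-step ("greedy") retrosynthetic choice and
# multistep lookahead. Everything here is a hypothetical illustration.
TOY_RETRO_MOVES = {
    # molecule: list of (immediate_score, precursors)
    "T": [(0.9, ["A"]), (0.4, ["B"])],
    "A": [(0.2, ["dead_end"])],
    "B": [(0.8, ["buyable"])],
}
BUYABLE = {"buyable"}

def greedy_choice(mol):
    """Pick the move with the best immediate score (one-step 'thinking')."""
    return max(TOY_RETRO_MOVES.get(mol, []), default=None)

def lookahead_value(mol, depth):
    """Value of a molecule when planning is allowed `depth` steps ahead."""
    if mol in BUYABLE:
        return 1.0
    if depth == 0 or mol not in TOY_RETRO_MOVES:
        return 0.0
    return max(score * min(lookahead_value(p, depth - 1) for p in precursors)
               for score, precursors in TOY_RETRO_MOVES[mol])

if __name__ == "__main__":
    print("greedy picks:", greedy_choice("T"))        # best first step: via A
    for score, precursors in TOY_RETRO_MOVES["T"]:    # two-step values
        value = score * min(lookahead_value(p, 1) for p in precursors)
        print("2-step value via", precursors, "=", value)
```

    Here the greedy choice takes the highest-scoring first step, which leads to a dead end, whereas even a two-step lookahead prefers the route that reaches a purchasable material.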

    Adaptive importance sampling by minimization of estimators of the cross-entropy, the mean square, and the inefficiency constant

    The inefficiency of using an unbiased estimator in a Monte Carlo procedure can be quantified using an inefficiency constant, equal to the product of the variance of the estimator and its mean computational cost. We develop methods for obtaining the parameters of the importance sampling (IS) change of measure via single- and multi-stage minimization of well-known estimators of the cross-entropy and of the mean square of the IS estimator, as well as of new estimators of such a mean square and of the inefficiency constant. We prove the convergence and asymptotic properties of the minimization results in our methods. We show that if a zero-variance IS parameter exists, then, under appropriate assumptions, the minimization results of the new estimators converge to such a parameter at a faster rate than those of the well-known estimators, and the positive definite asymptotic covariance matrix of the minimization results of the cross-entropy estimators is four times that of the well-known mean-square estimators. We introduce criteria for comparing the asymptotic efficiency of stochastic optimization methods, applicable to the minimization methods considered in this work. In our numerical experiments on computing expectations of functionals of an Euler-discretized diffusion process, the minimization of the new estimators led to the lowest inefficiency constants and variances of the IS estimators, followed by the minimization of the well-known mean-square estimators and then of the cross-entropy ones.
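
    As a concrete illustration of the quantities involved, the Python sketch below estimates a Gaussian tail probability by importance sampling, obtains the tilting parameter by single-stage minimization of the standard cross-entropy estimator (which has a closed form for this Gaussian location family), and reports an empirical inefficiency constant. It is a minimal toy model under the assumption of constant cost per sample, not the thesis's estimators or convergence proofs.

```python
# Minimal IS sketch: estimate p = P(X > a) for X ~ N(0, 1) with an
# exponentially tilted proposal N(theta, 1). All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
a = 3.0
f = lambda x: (x > a).astype(float)

def log_w(x, theta):
    """Log likelihood ratio of N(0,1) over N(theta,1)."""
    return -theta * x + 0.5 * theta**2

# Pilot stage: for this family, minimizing the cross-entropy estimator has a
# closed form: theta* = weighted mean of the pilot samples inside the event.
theta0 = 1.0
pilot = rng.normal(theta0, 1.0, 100_000)
w = f(pilot) * np.exp(log_w(pilot, theta0))
theta_ce = (w * pilot).sum() / w.sum()

# Main stage: IS estimate plus an empirical inefficiency constant, i.e.
# (variance of the estimator) x (mean cost per sample, constant here).
x = rng.normal(theta_ce, 1.0, 100_000)
z = f(x) * np.exp(log_w(x, theta_ce))
p_hat, var_hat = z.mean(), z.var(ddof=1)
cost_per_sample = 1.0
print(f"theta_CE ~ {theta_ce:.3f}, p_hat = {p_hat:.3e}, "
      f"inefficiency ~ {var_hat * cost_per_sample:.3e}")
```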

    Is Organic Chemistry Really Growing Exponentially?

    In terms of molecules and specific reaction examples, organic chemistry features impressive, exponential growth. However, the new reaction classes/types that fuel this growth are being discovered at a much slower, only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than those of decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity, using an algorithm that extracts generalized reaction-class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
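
    The contrast between record-level and type-level growth is straightforward to compute: count total reaction records per year, and count templates by the year of their first occurrence. The Python sketch below does this on fabricated (year, template) records; a real analysis would run over a reaction database with machine-extracted templates.

```python
# Toy trend analysis: total records per year vs. newly seen reaction types.
# The (year, template) records are fabricated for illustration.
from collections import Counter

records = [(1980, "t1"), (1980, "t2"), (1990, "t1"), (1990, "t1"),
           (1990, "t3"), (2000, "t1"), (2000, "t2"), (2000, "t2"),
           (2000, "t3"), (2000, "t4")]

per_year = Counter(year for year, _ in records)      # total records per year
first_seen = {}
for year, template in sorted(records):               # earliest use of a type
    first_seen.setdefault(template, year)
new_types = Counter(first_seen.values())             # new types per year

for year in sorted(per_year):
    print(year, "records:", per_year[year], "new types:", new_types.get(year, 0))
```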

    Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans

    As the programs for computer-aided retrosynthetic design come of age, they no longer identify just one or a few synthetic routes but a multitude of chemically plausible syntheses, together forming large, directed graphs of solutions. An important problem then emerges: how to select from these graphs, and present to the user, manageable numbers of top-scoring pathways that are cost-effective, promote convergent over linear solutions, and are chemically diverse, so that they do not merely repeat minor variations on the same chemical theme. This paper describes a family of reaction-network algorithms that address this problem by (i) using recursive formulae to assign realistic prices to individual pathways and (ii) applying penalties to chemically similar strategies so that they do not dominate the top-scoring routes. Synthetic examples are provided to illustrate how these algorithms can be implemented, on timescales of ~1 s even for large graphs, to rapidly query the space of synthetic solutions under scenarios of different reaction yields and/or costs associated with performing reaction operations on different scales.
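
    The recursive pricing of point (i) can be sketched as follows, assuming an acyclic retro graph: a purchasable molecule costs its list price, and any other molecule costs the cheapest option over its known retro reactions, each option being an operation fee plus the prices of its substrates. The graph, fees, and prices below are hypothetical stand-ins for the paper's actual formulae, which also account for yields and reaction scale.

```python
# Hypothetical recursive pathway pricing over an acyclic retro graph.
from functools import lru_cache

PURCHASE = {"s1": 5.0, "s2": 2.0, "s3": 8.0}          # starting-material prices
RETRO = {                                             # molecule -> (name, fee, substrates)
    "T":  [("rxnA", 10.0, ("I1", "s2")), ("rxnB", 3.0, ("I2",))],
    "I1": [("rxnC", 4.0, ("s1",))],
    "I2": [("rxnD", 6.0, ("s1", "s3"))],
}

@lru_cache(maxsize=None)
def price(mol):
    """Cheapest price of obtaining `mol`, memoized over the acyclic graph."""
    if mol in PURCHASE:
        return PURCHASE[mol]
    options = RETRO.get(mol, [])
    if not options:
        return float("inf")                           # no known route
    return min(fee + sum(price(s) for s in substrates)
               for _name, fee, substrates in options)

print("price of T:", price("T"))                      # rxnA route wins: 21.0
```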

    Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions using machine-learning: The importance of physically meaningful descriptors

    Machine learning can predict the major regio-, site-, and diastereoselective outcomes of Diels-Alder reactions better than standard quantum-mechanical methods, and with accuracies exceeding 90%, provided that (i) the diene/dienophile substrates are represented by "physical-organic" descriptors reflecting the electronic and steric characteristics of their substituents and (ii) the positions of such substituents relative to the reaction core are encoded ("vectorized") in an informative way.
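
    A hedged sketch of what such descriptor-based prediction might look like in code: each reaction is encoded by position-resolved electronic and steric descriptor values for the substituents, and a random forest predicts the major outcome. The descriptors, labels, and the toy selectivity rule below are fabricated; the paper's actual descriptors, positional encoding, and model are not reproduced here.

```python
# Illustrative descriptor-based selectivity classifier on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 200
# columns: hypothetical (electronic, steric) descriptors for three positions
X = rng.normal(size=(n, 6))
# toy stand-in for real selectivity labels: outcome driven by the electronic
# match between the diene terminus and the dienophile substituent
y = (X[:, 0] * X[:, 2] > 0).astype(int)   # 0/1 = two regiochemical outcomes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```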

    Network search algorithms and scoring functions for advanced-level computerized synthesis planning

    In 2020, a "hybrid" expert-AI computer program called Chematica (a.k.a. Synthia) was shown to autonomously plan multistep syntheses of complex natural products, which remain outside the reach of purely data-driven AI programs. The ability to plan at this level of chemical sophistication has been attributed mainly to the superior quality of Chematica's reaction rules. However, rules alone are not sufficient for advanced synthetic planning, which also requires appropriately crafted algorithms with which to intelligently navigate the enormous networks of synthetic possibilities, score the synthetic positions encountered, and rank the pathways identified. Chematica's algorithms are distinct from prêt-à-porter algorithmic solutions and are the product of multiple rounds of improvement against target structures of increasing complexity. Since descriptions of these improvements have been scattered among several of our prior publications, the aim of the current Review is to narrate the development process in a more comprehensive manner.
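
    One generic ingredient of such navigation, best-first expansion of the network under a scoring function, can be sketched as below: a priority queue always expands the most promising synthetic position, scored here (purely for illustration) by the summed complexity of the molecules still to be made. The tiny network and scores are hypothetical; Chematica's actual search algorithms and scoring functions are far richer.

```python
# Best-first search over a toy retrosynthesis network.
import heapq

COMPLEXITY = {"T": 9, "I1": 4, "I2": 6, "s1": 0, "s2": 0}   # 0 = purchasable
RETRO = {"T": [("I1", "I2")], "I1": [("s1",)], "I2": [("s2",)]}

def score(open_mols):
    """Lower is better: total complexity of molecules still to be made."""
    return sum(COMPLEXITY[m] for m in open_mols)

def best_first(target):
    frontier = [(score((target,)), (target,))]        # (score, open molecules)
    while frontier:
        s, open_mols = heapq.heappop(frontier)
        if s == 0:                                    # everything purchasable
            return "solved"
        mol = max(open_mols, key=COMPLEXITY.get)      # expand the hardest one
        rest = tuple(m for m in open_mols if m != mol)
        for precursors in RETRO.get(mol, []):
            nxt = tuple(sorted(set(rest) | set(precursors)))
            heapq.heappush(frontier, (score(nxt), nxt))
    return "no route found"

print(best_first("T"))
```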

    Synergy Between Expert and Machine-Learning Approaches Allows for Improved Retrosynthetic Planning

    When computers plan multistep syntheses, they can rely either on expert knowledge or on information machine-extracted from large reaction repositories. Both approaches suffer from imperfect functions for evaluating reaction choices: expert functions are heuristics based on chemical intuition, whereas machine learning (ML) relies on neural networks (NNs) that can make meaningful predictions only about popular reaction types. This paper shows that expert and ML approaches can be synergistic: specifically, when NNs are trained on literature data matched onto high-quality, expert-coded reaction rules, they achieve higher synthetic accuracy than either method alone and, importantly, can also handle rare/specialized reaction types.
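
    The training setup described, literature reactions matched onto expert-coded rules so that a network learns which rules apply, can be sketched as follows. The fingerprints, rule labels, and data below are synthetic placeholders; the point is only the shape of the pipeline: featurize the substrate, predict a distribution over expert rules, and use it to rank reaction choices.

```python
# Toy pipeline: fingerprint -> neural network -> ranking over expert rule IDs.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n_rules, n_bits, n = 5, 32, 500
X = rng.integers(0, 2, size=(n, n_bits)).astype(float)  # stand-in fingerprints
true_w = rng.normal(size=(n_bits, n_rules))
y = (X @ true_w).argmax(axis=1)                         # rule matched per reaction

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X[:400], y[:400])

probs = clf.predict_proba(X[400:401])[0]                # new substrate
print("expert-rule ranking:", np.argsort(probs)[::-1])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```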