Computational planning of the synthesis of complex natural products
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years (1-7). However, the field has progressed greatly since the development of early programs such as LHASA (1,7), for which reaction choices at each step were made by human operators. Multiple software platforms (6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary (15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships (17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts.
Adaptive importance sampling by minimizing estimators of the cross-entropy, the mean square, and the inefficiency constant
The inefficiency of using an unbiased estimator in a Monte Carlo procedure can be quantified by an inefficiency constant, equal to the product of the estimator's variance and its mean computational cost. We develop methods for obtaining the parameters of the importance sampling (IS) change of measure via single- and multi-stage minimization of well-known estimators of the cross-entropy and of the mean square of the IS estimator, as well as of new estimators of such a mean square and of the inefficiency constant. We prove the convergence and asymptotic properties of the minimization results in our methods. We show that if a zero-variance IS parameter exists, then, under appropriate assumptions, the minimization results for the new estimators converge to such a parameter at a faster rate than those for the well-known estimators, and that the positive definite asymptotic covariance matrix of the minimization results for the cross-entropy estimators is four times the corresponding matrix for the well-known mean-square estimators. We introduce criteria for comparing the asymptotic efficiency of stochastic optimization methods, applicable to the minimization methods of the estimators considered in this work. In our numerical experiments for computing expectations of functionals of an Euler scheme, minimization of the new estimators led to the lowest inefficiency constants and variances of the IS estimators, followed by minimization of the well-known mean-square estimators and of the cross-entropy ones.
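For intuition, the following is a minimal sketch of single-stage cross-entropy minimization for importance sampling on a textbook rare-event problem. The Gaussian mean-shift family, the threshold a, and the unit-cost-per-sample assumption are illustrative stand-ins, not the thesis' actual estimators or setting.

# Hedged sketch: single-stage cross-entropy (CE) minimization for importance
# sampling (IS), on the toy problem of estimating p = P(X > a) for
# X ~ N(0, 1) with a mean-shifted family N(theta, 1) as the IS change of
# measure. Illustrative only; not the thesis' estimators.
import numpy as np

rng = np.random.default_rng(0)
a = 3.0  # rare-event threshold (assumed)

# Stage 1: pilot run under a heuristic proposal N(a, 1) to fit theta.
# For a Gaussian mean-shift family, the CE-optimal mean is the
# likelihood-ratio-weighted average of the pilot draws that hit the event.
pilot = rng.normal(a, 1.0, size=10_000)
w = np.exp(-pilot * a + a**2 / 2)        # dN(0,1)/dN(a,1) likelihood ratio
hit = pilot > a
theta = np.sum(w[hit] * pilot[hit]) / np.sum(w[hit])

# Stage 2: final IS run under the fitted N(theta, 1).
x = rng.normal(theta, 1.0, size=100_000)
lr = np.exp(-x * theta + theta**2 / 2)   # dN(0,1)/dN(theta,1)
est = np.mean((x > a) * lr)
var = np.var((x > a) * lr) / x.size

# The inefficiency constant is variance times mean cost; with unit cost per
# sample it reduces to the per-sample variance computed above.
print(f"theta = {theta:.3f}, p_hat = {est:.3e}, variance = {var:.3e}")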
Is Organic Chemistry Really Growing Exponentially?
In terms of molecules and specific reaction examples, organic chemistry exhibits impressive, exponential growth. However, the new reaction classes/types that fuel this growth are being discovered at a much slower, only linear (or even sublinear) rate. The proportion of newly discovered reaction types among all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than those of decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity, based on an algorithm that extracts generalized reaction-class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models for which sufficient reaction statistics exist), and the identification of erroneous entries in reaction databases.
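As a concrete, hedged illustration of this kind of trend analysis, the sketch below fits an exponential model to per-year reaction counts and a linear model to per-year counts of newly discovered reaction classes, then compares the two. All numbers are synthetic placeholders, not the paper's data.

# Hedged sketch of the growth-rate comparison described above.
# The yearly counts are synthetic placeholders for illustration only.
import numpy as np

years = np.arange(1980, 2021)
reactions = 5_000 * np.exp(0.06 * (years - 1980))   # illustrative only
new_classes = 40 + 1.5 * (years - 1980)             # illustrative only

# Exponential growth appears as a straight line in log space.
growth_rate, _ = np.polyfit(years, np.log(reactions), 1)
class_slope, _ = np.polyfit(years, new_classes, 1)

print(f"reactions: ~{100 * growth_rate:.1f}% growth per year (exponential)")
print(f"new reaction classes: ~{class_slope:.1f} added per year (linear)")
print("share of new classes among all reactions in "
      f"{years[-1]}: {new_classes[-1] / reactions[-1]:.2%}")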
Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans
As programs for computer-aided retrosynthetic design come of age, they no longer identify just one or a few synthetic routes but a multitude of chemically plausible syntheses, together forming large, directed graphs of solutions. An important problem then emerges: how to select from these graphs, and present to the user, manageable numbers of top-scoring pathways that are cost-effective, that promote convergent over linear solutions, and that are chemically diverse, so that they do not merely repeat minor variations on the same chemical theme. This paper describes a family of reaction-network algorithms that address this problem by (i) using recursive formulae to assign realistic prices to individual pathways and (ii) applying penalties to chemically similar strategies so that they do not dominate the top-scoring routes. Synthetic examples are provided to illustrate how these algorithms can be implemented, on timescales of about 1 s even for large graphs, to rapidly query the space of synthetic solutions under scenarios of different reaction yields and/or costs associated with performing reaction operations on different scales.
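A minimal sketch of the recursive price assignment in point (i) follows, under an assumed acyclic route graph, fixed per-operation costs, and a uniform yield. The compound names, prices, and yield model are illustrative assumptions, not the paper's actual formulae.

# Hedged sketch: recursive pricing of synthetic routes. Each compound is
# either bought at catalog price or made by the cheapest available reaction;
# the recursion is memoized so shared intermediates are priced once.
from functools import lru_cache

# target -> list of (substrates, cost of performing the reaction operation)
ROUTES = {
    "target":         [(("intermediate_a", "intermediate_b"), 100.0),
                       (("intermediate_c",), 150.0)],
    "intermediate_a": [(("sm_1", "sm_2"), 80.0)],
    "intermediate_b": [(("sm_3",), 60.0)],
    "intermediate_c": [(("sm_1", "sm_3"), 90.0)],
}
STARTING_MATERIALS = {"sm_1": 5.0, "sm_2": 12.0, "sm_3": 8.0}  # catalog prices
YIELD = 0.8  # assumed uniform yield; scales the amount of substrate needed

@lru_cache(maxsize=None)
def price(compound: str) -> float:
    """Cheapest way to obtain `compound`: buy it, or make it recursively."""
    if compound in STARTING_MATERIALS:
        return STARTING_MATERIALS[compound]
    return min(
        op_cost + sum(price(s) for s in substrates) / YIELD
        for substrates, op_cost in ROUTES[compound]
    )

print(f"estimated price of target: {price('target'):.2f}")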
Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions using machine learning: The importance of physically meaningful descriptors
Machine learning can predict the major regio-, site-, and diastereoselective outcomes of Diels-Alder reactions better than standard quantum-mechanical methods, and with accuracies exceeding 90%, provided that (i) the diene/dienophile substrates are represented by "physical-organic" descriptors reflecting the electronic and steric characteristics of their substituents and (ii) the positions of such substituents relative to the reaction core are encoded ("vectorized") in an informative way.
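To illustrate point (ii), here is a hedged sketch of positional descriptor encoding: each substituent is mapped to electronic and steric values, and the values are concatenated in a fixed order of positions around the reaction core. The descriptor numbers are rough stand-ins, not the paper's actual descriptor set.

# Hedged sketch: "vectorizing" substituents by physical-organic descriptors
# at fixed positions relative to the Diels-Alder reaction core. The
# (electronic, steric) values below are rough illustrative constants.
import numpy as np

DESCRIPTORS = {
    "H":     (0.00, 0.0),
    "Me":    (-0.17, 1.7),
    "OMe":   (-0.27, 0.6),
    "CO2Me": (0.45, 1.3),
    "CN":    (0.66, 0.2),
}

def vectorize(substituents: list[str]) -> np.ndarray:
    """Concatenate descriptors in a fixed positional order (diene C1..C4,
    then the two dienophile carbons), so that position information relative
    to the reaction core is preserved in the feature layout."""
    return np.concatenate([DESCRIPTORS[s] for s in substituents])

# Example: substituted diene (C1..C4) plus acrylate-type dienophile (C1', C2').
features = vectorize(["OMe", "H", "H", "Me", "CO2Me", "H"])
print(features)  # one feature row for a downstream classifier of the major isomer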
Network search algorithms and scoring functions for advanced-level computerized synthesis planning
In 2020, a "hybrid" expert-AI computer program called Chematica (a.k.a. Synthia) was shown to autonomously plan multistep syntheses of complex natural products, which remain outside the reach of purely data-driven AI programs. The ability to plan at this level of chemical sophistication has been attributed mainly to the superior quality of Chematica's reaction rules. However, rules alone are not sufficient for advanced synthetic planning, which also requires appropriately crafted algorithms with which to intelligently navigate the enormous networks of synthetic possibilities, score the synthetic positions encountered, and rank the pathways identified. Chematica's algorithms are distinct from prĂȘt-Ă -porter algorithmic solutions and are the product of multiple rounds of improvements against target structures of increasing complexity. Since descriptions of these improvements have been scattered among several of our prior publications, the aim of the current Review is to narrate the development process in a more comprehensive manner.
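As a generic illustration of the navigate-score-rank pattern the Review discusses (not Chematica's actual algorithm), a best-first search over synthetic positions can be sketched as follows; expand, score, and is_solved are assumed callbacks supplied by the caller.

# Hedged sketch: best-first navigation of a retrosynthetic search tree under
# a position-scoring function. Minimal illustration only.
import heapq
import itertools

def best_first_search(target, expand, score, is_solved, max_expansions=1000):
    """Expand the most promising synthetic position first.

    expand(position)    -> successor positions (retrosynthetic moves)
    score(position)     -> lower is better (e.g. complexity left to solve)
    is_solved(position) -> True once every remaining compound is purchasable
    """
    counter = itertools.count()  # tie-breaker so equal scores never compare positions
    frontier = [(score(target), next(counter), target)]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, position = heapq.heappop(frontier)
        if is_solved(position):
            return position
        for successor in expand(position):
            heapq.heappush(frontier, (score(successor), next(counter), successor))
    return None

The counter tie-breaker is a small but important design choice: it keeps the heap from ever comparing position objects directly when two scores tie.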
Synergy Between Expert and Machine-Learning Approaches Allows for Improved Retrosynthetic Planning
When computers plan multistep syntheses, they can rely either on expert knowledge or on information machine-extracted from large reaction repositories. Both approaches suffer from imperfect functions for evaluating reaction choices: expert functions are heuristics based on chemical intuition, whereas machine learning (ML) relies on neural networks (NNs) that can make meaningful predictions only about popular reaction types. This paper shows that the expert and ML approaches can be synergistic: specifically, when NNs are trained on literature data matched onto high-quality, expert-coded reaction rules, they achieve higher synthetic accuracy than either method alone and, importantly, can also handle rare/specialized reaction types.
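A hedged sketch of the described synergy: literature reactions matched onto expert-coded rules provide the training labels, and a network then learns to rank which rules to apply to a given product. The fingerprints, dataset sizes, and model below are illustrative assumptions, not the paper's actual pipeline.

# Hedged sketch: train a classifier to rank expert-coded rules, using labels
# obtained by matching literature reactions onto those rules. Random
# placeholder data stand in for real fingerprints and rule matches, so the
# model here learns nothing chemically meaningful; the point is the shape of
# the pipeline.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_reactions, n_bits, n_rules = 2_000, 256, 100
X = (rng.random((n_reactions, n_bits)) < 0.05).astype(float)  # product fingerprints
y = rng.integers(0, n_rules, n_reactions)                     # matched rule ids

# One-hidden-layer softmax classifier: fingerprint -> expert rule.
model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=30, random_state=0)
model.fit(X, y)

# At plan time, only the top-ranked expert-coded rules are applied: the
# network prioritizes, while the rules guarantee chemical validity.
probs = model.predict_proba(X[:1])
top_rules = model.classes_[np.argsort(probs[0])[::-1][:5]]
print("top-ranked candidate rules:", top_rules)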