
    Computational planning of the synthesis of complex natural products

    Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years(1-7). However, the field has progressed greatly since the development of early programs such as LHASA(1,7), for which reaction choices at each step were made by human operators. Multiple software platforms(6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary(15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships(17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization. A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts.

    Is Organic Chemistry Really Growing Exponentially?

    In terms of molecules and specific reaction examples, organic chemistry features impressive, exponential growth. However, the new reaction classes/types that fuel this growth are being discovered at a much slower, only linear (or even sublinear), rate. The proportion of newly discovered reaction types among all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than those of decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity, and complexity, using an algorithm that extracts generalized reaction-class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the number of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
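
    The key ratio in this abstract, newly appearing reaction types relative to all reactions recorded in a given year, can be illustrated on toy data. This is a minimal Python sketch that assumes every reaction has already been assigned a generalized template ID (the placeholder strings "A", "B", "C" are hypothetical); it does not reproduce the template-extraction algorithm itself.

```python
from collections import Counter


def novelty_by_year(records):
    """Share of first-time reaction types among all reactions, per year.

    records: iterable of (year, template_id) pairs, one per reaction example.
    template_id stands in for a generalized reaction-class template
    (hypothetical here; template extraction itself is not shown).
    Returns {year: (new_types, total_reactions, fraction_new)}.
    """
    seen = set()
    new_per_year = Counter()
    total_per_year = Counter()
    for year, template in sorted(records):  # process in chronological order
        total_per_year[year] += 1
        if template not in seen:  # first appearance of this reaction type
            seen.add(template)
            new_per_year[year] += 1
    return {
        year: (new_per_year[year], total, new_per_year[year] / total)
        for year, total in sorted(total_per_year.items())
    }


# Toy data: reaction counts grow each year while new types dry up.
records = [
    (2000, "A"), (2000, "B"),
    (2001, "A"), (2001, "C"), (2001, "A"),
    (2002, "A"), (2002, "B"), (2002, "C"), (2002, "A"),
]
for year, stats in novelty_by_year(records).items():
    print(year, stats)
```

    In the toy data the yearly reaction count keeps rising while the fraction of first-time types falls to zero, which is the qualitative pattern the abstract describes.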

    Chemist Ex Machina: Advanced Synthesis Planning by Computers

    Teaching computers to plan multistep syntheses of arbitrary target molecules, including natural products, has been one of the oldest challenges in chemistry, dating back to the 1960s. This Account recapitulates two decades of our group's work on the software platform called Chematica, which very recently achieved this long-sought objective and has been shown capable of planning synthetic routes to complex natural products, several of which were validated in the laboratory. For the machine to plan syntheses at an expert level, it must know the rules describing chemical reactions and use these rules to expand and search the networks of synthetic options. The rules must be of high quality: they must delineate accurately the scope of admissible substituents, capture all relevant stereochemical information, and detect potential reactivity conflicts and protection requirements. They should yield only those synthons that are chemically stable and energetically allowed (e.g., not too strained) and should be able to extrapolate beyond examples already published in the literature. In parallel, the network-search algorithms must be able to assign meaningful scores to the sets of synthons they encounter, make judicious choices about which of the network's branches to expand, and decide when to withdraw from unpromising ones. They must be able to strategize over multiple steps to resolve intermittent reactivity conflicts, exchange functional groups, or overcome local maxima of molecular complexity. Meeting all these requirements makes computer-driven retrosynthesis a very multifaceted problem, combining expert and AI approaches further supplemented by quantum-mechanical and molecular-mechanics calculations. Development of Chematica has been a very long and gradual process because all these components are needed. Any shortcuts (for example, reliance on only expert or only data-based approaches) yield chemically naive and often erroneous syntheses, especially for complex targets. On the bright side, once all the requisite algorithms are implemented, as they now are, they not only streamline conventional synthetic planning but also enable completely new modalities that would challenge any human chemist: for example, synthesis with multiple constraints imposed simultaneously, or library-wide syntheses in which the machine constructs "global plans" leading to multiple targets and benefiting from the use of common intermediates. These types of analyses will have a profound impact on the practice of the chemical industry by enabling the design of more economical, greener, and less hazardous pathways.
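
    The search requirements listed in this Account (scoring sets of synthons, choosing which branch to expand, and withdrawing from unpromising ones) map naturally onto a best-first search. The Python sketch below is a generic illustration under stated assumptions, not Chematica's algorithm: the rule table `RULES`, the starting-material set `BUYABLE`, and the `score` function are all hypothetical placeholders.

```python
import heapq
from itertools import count

# Hypothetical one-step disconnections: target -> [(reaction_label, precursors)].
# Placeholder molecule names; this is not Chematica's rule set.
RULES = {
    "T": [("coupling", ("A", "B")), ("dead_end", ("X",))],
    "A": [("reduction", ("A0",))],
    "X": [("interconversion", ("X",))],  # loops without ever simplifying
}
BUYABLE = {"A0", "B"}  # hypothetical commercially available starting materials


def score(open_set, route):
    # Hypothetical score: unsolved synthons plus steps used so far (lower is better).
    return len(open_set) + len(route)


def plan(target, max_depth=5):
    """Best-first search over sets of unsolved synthons."""
    tie = count()  # tie-breaker so heapq never has to compare frozensets
    start = frozenset({target}) - BUYABLE
    frontier = [(score(start, []), next(tie), start, [])]
    while frontier:
        _, _, open_set, route = heapq.heappop(frontier)
        if not open_set:
            return route  # every remaining synthon is buyable
        if len(route) >= max_depth:
            continue  # withdraw from a branch that has grown too long
        mol = min(open_set)  # expand one unsolved synthon (deterministic choice)
        for label, precursors in RULES.get(mol, []):
            new_open = (open_set - {mol}) | {p for p in precursors if p not in BUYABLE}
            new_route = route + [(mol, label, precursors)]
            heapq.heappush(
                frontier, (score(new_open, new_route), next(tie), new_open, new_route)
            )
    return None  # no route found within max_depth


print(plan("T"))
```

    Because the frontier is a priority queue, the search always returns to the most promising set of synthons, while the `max_depth` cutoff and rule-less dead ends implement the "withdraw" step in the simplest possible way.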

    Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions

    This work describes a method to vectorize and machine-learn (ML) the non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict the correct face of approach in ca. 90% of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those obtained with traditional ML descriptors, energetic calculations, or the intuition of experienced synthetic chemists. Our results also emphasize the importance of providing ML models with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
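
    A rough illustration of turning non-covalent interactions into fixed-length vectors and learning face selectivity from them follows. It is a hypothetical sketch: the interaction categories, the energy values, and the choice of a plain perceptron on the difference between the two faces' vectors are assumptions made for illustration, not the descriptors or model used in this work.

```python
# Hypothetical interaction categories; the paper's actual descriptors are not reproduced.
INTERACTION_TYPES = ("h_bond", "pi_stacking", "ch_pi", "steric_repulsion")


def vectorize(interactions):
    """Sum interaction energies (hypothetical units) per category, in a fixed order."""
    vec = [0.0] * len(INTERACTION_TYPES)
    for kind, energy in interactions:
        vec[INTERACTION_TYPES.index(kind)] += energy
    return vec


def face_difference(face1, face2):
    """Feature for one substrate: face-1 vector minus face-2 vector."""
    return [a - b for a, b in zip(vectorize(face1), vectorize(face2))]


def train_perceptron(X, y, epochs=20):
    """Classic perceptron; labels are +1 (face 1 attacked) or -1 (face 2)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if (1 if activation > 0 else -1) != yi:  # update only on mistakes
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Toy training set: (face-1 interactions, face-2 interactions, label).
train = [
    ([("h_bond", -2.0)], [], 1),
    ([], [("h_bond", -2.0)], -1),
    ([("h_bond", -1.0)], [("steric_repulsion", 1.0)], 1),
    ([("steric_repulsion", 1.0)], [("h_bond", -1.0)], -1),
]
X = [face_difference(f1, f2) for f1, f2, _ in train]
y = [label for _, _, label in train]
w, b = train_perceptron(X, y)
print(predict(w, b, face_difference([("h_bond", -1.5)], [("h_bond", -0.5)])))
```

    Subtracting the two faces' vectors lets a single linear score compare the faces directly; with the toy data above, the face carrying the more stabilizing (more negative) hydrogen-bond energy is predicted.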

    Algorithmic Discovery of Tactical Combinations for Advanced Organic Syntheses

    Whereas most organic molecules can be synthesized from progressively simpler substrates, syntheses of complex organic targets often involve a counterintuitive sequence of steps that first complexify the structure but, by doing so, open up possibilities for pronounced structural simplification in subsequent, downstream steps. Such complexifying/simplifying reaction sequences, called tactical combinations (TCs), can be quite powerful and elegant but are also inherently hard to spot; indeed, only some 500 TCs have so far been cataloged, and even fewer are routinely used in synthetic practice. This paper describes the computer-driven discovery of large numbers of viable TCs (over 46,000 combinations of reaction classes and approximately 4.85 million combinations of reaction variants), the vast majority of which have no prior literature precedent. Examples, including a concise wet-lab synthesis of a small natural product, are provided to illustrate how the use of these newly discovered TCs can streamline the design of syntheses leading to important drugs and/or natural products.
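
    The complexify-then-simplify pattern can be stated as a simple check on a sequence of complexity scores. The Python sketch assumes a hypothetical numeric complexity score per intermediate and an arbitrary `min_gain` threshold for "pronounced" simplification; it only flags the pattern along a given path and does not reproduce how TCs were enumerated in this paper.

```python
def find_tactical_pairs(complexity, min_gain=1.0):
    """Flag consecutive step pairs that first complexify, then simplify.

    complexity[k] is a (hypothetical) complexity score of the structure after
    k steps of an analysis path; complexity[0] is the starting structure.
    Step k turns complexity[k] into complexity[k + 1]. A pair (k, k + 1) is
    flagged when step k raises complexity and step k + 1 brings it at least
    min_gain below where step k started.
    """
    pairs = []
    for k in range(len(complexity) - 2):
        before, middle, after = complexity[k], complexity[k + 1], complexity[k + 2]
        if middle > before and before - after >= min_gain:
            pairs.append((k, k + 1))
    return pairs


path = [10.0, 12.0, 6.0, 5.0, 7.0, 4.0]
print(find_tactical_pairs(path))                # [(0, 1), (3, 4)]
print(find_tactical_pairs(path, min_gain=2.0))  # [(0, 1)]
```

    Raising `min_gain` keeps only the pairs whose second step pays back the temporary increase with a large net simplification.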

    Synthetic connectivity, emergence, and self-regeneration in the network of prebiotic chemistry

    The challenge of prebiotic chemistry is to trace the syntheses of life's key building blocks from a handful of primordial substrates. Here we report a forward-synthesis algorithm that generates a full network of prebiotic chemical reactions accessible from these substrates under generally accepted conditions. This network contains both reported and previously unidentified routes to biotic targets, as well as plausible syntheses of abiotic molecules. It also exhibits three forms of nontrivial chemical emergence: molecules within the network can act as catalysts of downstream reaction types; form functional chemical systems, including self-regenerating cycles; and produce surfactants relevant to primitive forms of biological compartmentalization. To support these claims, computer-predicted prebiotic syntheses of several biotic molecules, as well as a multistep, self-regenerative cycle of iminodiacetic acid, were validated by experiment.
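
    Forward-synthesis network generation can be pictured as repeatedly applying reaction rules to everything made so far, one generation at a time. The Python sketch uses hypothetical two-component rules over placeholder molecule names; it ignores reaction conditions, catalysis, and the detection of self-regenerating cycles discussed in the abstract.

```python
from itertools import combinations_with_replacement

# Hypothetical two-component rules: sorted reactant pair -> product.
# Placeholder names, not actual prebiotic chemistry.
RULES = {
    ("A", "A"): "C",
    ("A", "B"): "D",
    ("C", "D"): "E",
}


def forward_network(substrates, rules, generations):
    """Expand a reaction network generation by generation.

    Returns (first_generation, reactions): first_generation maps each molecule
    to the generation in which it first appears (substrates = 0); reactions
    maps each applied reactant pair to its product.
    """
    first_generation = {m: 0 for m in substrates}
    reactions = {}
    for g in range(1, generations + 1):
        new = {}
        # Only molecules known at the start of generation g may react in it.
        for pair in combinations_with_replacement(sorted(first_generation), 2):
            product = rules.get(pair)
            if product is None or pair in reactions:
                continue
            reactions[pair] = product
            if product not in first_generation:
                new.setdefault(product, g)
        if not new:
            break  # no new molecules: the network is saturated under these rules
        first_generation.update(new)
    return first_generation, reactions


generations, reactions = forward_network(["A", "B"], RULES, generations=5)
print(generations)
print(reactions)
```

    Sorting the known molecules and drawing pairs with `combinations_with_replacement` yields each unordered pair once, in the same sorted form used for the rule keys, so every reaction is considered exactly once per generation.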