    Controlling an organic synthesis robot with machine learning to search for new reactivity

    The discovery of chemical reactions is an inherently unpredictable and time-consuming process [1]. An attractive alternative is to predict reactivity, although relevant approaches, such as computer-aided reaction design, are still in their infancy [2]. Reaction prediction based on high-level quantum chemical methods is complex [3], even for simple molecules. Although machine learning is powerful for data analysis [4,5], its applications in chemistry are still being developed [6]. Inspired by strategies based on chemists’ intuition [7], we propose that a reaction system controlled by a machine learning algorithm may be able to explore the space of chemical reactions quickly, especially if trained by an expert [8]. Here we present an organic synthesis robot that can perform chemical reactions and analysis faster than they can be performed manually, as well as predict the reactivity of possible reagent combinations after conducting a small number of experiments, thus effectively navigating chemical reaction space. By using machine learning for decision making, enabled by binary encoding of the chemical inputs, the reactions can be assessed in real time using nuclear magnetic resonance and infrared spectroscopy. The machine learning system was able to predict the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset. This approach was also used to calculate the reactivity of published datasets. Further, by using real-time data from our robot, these predictions were followed up manually by a chemist, leading to the discovery of four reactions
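The abstract above mentions binary encoding of the chemical inputs but does not describe the scheme. As a minimal sketch, assuming a simple presence/absence vector over a fixed reagent pool (the pool, the reagent names, and the `encode` helper are all hypothetical and not taken from the paper), such an encoding could look like this:

```python
from itertools import combinations

# Hypothetical reagent pool; the names are placeholders, not from the paper.
REAGENTS = ["A", "B", "C", "D", "E"]


def encode(combo, pool=REAGENTS):
    """Return a binary presence vector: 1 if the reagent is in the combination, else 0."""
    present = set(combo)
    return [1 if reagent in present else 0 for reagent in pool]


# Enumerate every three-reagent combination and encode each one.
combos = list(combinations(REAGENTS, 3))
encoded = [encode(combo) for combo in combos]
```

Each vector in `encoded` has one position per reagent, so a classifier can take it directly as a fixed-length input. Whether the authors used this layout or a different one is not stated in the abstract.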

    Machine learning activation energies of chemical reactions

    Application of machine learning (ML) to the prediction of reaction activation barriers is a new and exciting field for these algorithms. The works covered here are specifically those in which ML is trained to predict the activation energies of homogeneous chemical reactions, where the activation energy is given by the energy difference between the reactants and transition state of a reaction. Particular attention is paid to works that have applied ML to directly predict reaction activation energies, the limitations that may be found in these studies, and where comparisons of different types of chemical features for ML models have been made. Also explored are models that have been able to obtain high predictive accuracies, but with reduced datasets, using the Gaussian process regression ML model. In these studies, the chemical reactions for which activation barriers are modeled include those involving small organic molecules, aromatic rings, and organometallic catalysts. Also provided are brief explanations of some of the most popular types of ML models used in chemistry, as a beginner's guide for those unfamiliar
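The definition of activation energy given in the abstract above (the energy difference between the reactants and the transition state) can be written as a one-line calculation. This is only an illustrative sketch: the function name and the example energies are hypothetical, and both inputs are assumed to be in the same units.

```python
def activation_energy(e_reactants, e_transition_state):
    """Activation energy as the transition-state energy minus the reactant energy.

    Both arguments are assumed to share the same units (for example kcal/mol).
    """
    return e_transition_state - e_reactants


# Hypothetical energies chosen only to illustrate the arithmetic.
barrier = activation_energy(e_reactants=-100.0, e_transition_state=-75.0)
```

In an ML setting, values computed this way from reference calculations would serve as the regression targets.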

    Shift-Robust Molecular Relational Learning with Causal Substructure

    Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in the molecular sciences owing to its wide range of applications. In this work, we propose CMRL, a method that is robust to distributional shift in molecular relational learning because it detects the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on domain knowledge from the molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated with chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL. Comment: KDD 202