45 research outputs found

    Barking up the right tree: An approach to search over molecule synthesis DAGs

    Get PDF
    When designing new molecules with particular properties, it is not only important what to make but crucially how to make it. These instructions form a synthesis directed acyclic graph (DAG), describing how a large vocabulary of simple building blocks can be recursively combined through chemical reactions to create more complicated molecules of interest. In contrast, many current deep generative models for molecules ignore synthesizability. We therefore propose a deep generative model that better represents the real world process, by directly outputting molecule synthesis DAGs. We argue that this provides sensible inductive biases, ensuring that our model searches over the same chemical space that chemists would also have access to, as well as interpretability. We show that our approach is able to model chemical space well, producing a wide range of diverse molecules, and allows for unconstrained optimization of an inherently constrained problem: maximize certain chemical properties such that discovered molecules are synthesizable

    A generative model for electron paths

    Get PDF
    Chemical reactions can be described as the stepwise redistribution of electrons in molecules. As such, reactions are often depicted using “arrow-pushing” diagrams which show this movement as a sequence of arrows. We propose an electron path prediction model (ELECTRO) to learn these sequences directly from raw reaction data. Instead of predicting product molecules directly from reactant molecules in one shot, learning a model of electron movement has the benefits of (a) being easy for chemists to interpret, (b) incorporating constraints of chemistry, such as balanced atom counts before and after the reaction, and (c) naturally encoding the sparsity of chemical reactions, which usually involve changes in only a small number of atoms in the reactants. We design a method to extract approximate reaction paths from any dataset of atom-mapped reaction SMILES strings. Our model achieves excellent performance on an important subset of the USPTO reaction dataset, comparing favorably to the strongest baselines. Furthermore, we show that our model recovers a basic knowledge of chemistry without being explicitly trained to do so.EPSR

    A New Integer Linear Programming Formulation to the Inverse QSAR/QSPR for Acyclic Chemical Compounds Using Skeleton Trees

    Get PDF
    33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Kitakyushu, Japan, September 22-25, 2020.Computer-aided drug design is one of important application areas of intelligent systems. Recently a novel method has been proposed for inverse QSAR/QSPR using both artificial neural networks (ANN) and mixed integer linear programming (MILP), where inverse QSAR/QSPR is a major approach for drug design. This method consists of two phases: In the first phase, a feature function f is defined so that each chemical compound G is converted into a vector f(G) of several descriptors of G, and a prediction function ψ is constructed with an ANN so that ψ(f(G)) takes a value nearly equal to a given chemical property π for many chemical compounds G in a data set. In the second phase, given a target value y∗ of the chemical property π , a chemical structure G∗ is inferred in the following way. An MILP M is formulated so that M admits a feasible solution (x∗, y∗) if and only if there exist vectors x∗, y∗ and a chemical compound G∗ such that ψ(x∗)=y∗ and f(G∗)=x∗. The method has been implemented for inferring acyclic chemical compounds. In this paper, we propose a new MILP for inferring acyclic chemical compounds by introducing a novel concept, skeleton tree, and conducted computational experiments. The results suggest that the proposed method outperforms the existing method when the diameter of graphs is up to around 6 to 8. For an instance for inferring acyclic chemical compounds with 38 non-hydrogen atoms from C, O and S and diameter 6, our method was 5×104 times faster

    Automatic mapping of atoms across both simple and complex chemical reactions

    Get PDF
    Mapping atoms across chemical reactions is important for substructure searches, automatic extraction of reaction rules, identification of metabolic pathways, and more. Unfortunately, the existing mapping algorithms can deal adequately only with relatively simple reactions but not those in which expert chemists would benefit from computer's help. Here we report how a combination of algorithmics and expert chemical knowledge significantly improves the performance of atom mapping, allowing the machine to deal with even the most mechanistically complex chemical and biochemical transformations. The key feature of our approach is the use of few but judiciously chosen reaction templates that are used to generate plausible "intermediate" atom assignments which then guide a graph-theoretical algorithm towards the chemically correct isomorphic mappings. The algorithm performs significantly better than the available state-of-the-art reaction mappers, suggesting its uses in database curation, mechanism assignments, and - above all - machine extraction of reaction rules underlying modern synthesis-planning programs

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

    Get PDF

    Pathways to cellular supremacy in biocomputing

    Get PDF
    Synthetic biology uses living cells as the substrate for performing human-defined computations. Many current implementations of cellular computing are based on the “genetic circuit” metaphor, an approximation of the operation of silicon-based computers. Although this conceptual mapping has been relatively successful, we argue that it fundamentally limits the types of computation that may be engineered inside the cell, and fails to exploit the rich and diverse functionality available in natural living systems. We propose the notion of “cellular supremacy” to focus attention on domains in which biocomputing might offer superior performance over traditional computers. We consider potential pathways toward cellular supremacy, and suggest application areas in which it may be found.A.G.-M. was supported by the SynBio3D project of the UK Engineering and Physical Sciences Research Council (EP/R019002/1) and the European CSA on biological standardization BIOROBOOST (EU grant number 820699). T.E.G. was supported by a Royal Society University Research Fellowship (grant UF160357) and BrisSynBio, a BBSRC/ EPSRC Synthetic Biology Research Centre (grant BB/L01386X/1). P.Z. was supported by the EPSRC Portabolomics project (grant EP/N031962/1). P.C. was supported by SynBioChem, a BBSRC/EPSRC Centre for Synthetic Biology of Fine and Specialty Chemicals (grant BB/M017702/1) and the ShikiFactory100 project of the European Union’s Horizon 2020 research and innovation programme under grant agreement 814408

    Streamlining bioactive molecular discovery through integration and automation

    Get PDF
    The discovery of bioactive small molecules is generally driven via iterative design–make–purify–test cycles. Automation is routinely harnessed at individual stages of these cycles to increase the productivity of drug discovery. Here, we describe recent progress to automate and integrate two or more adjacent stages within discovery workflows. Examples of such technologies include microfluidics, liquid-handling robotics and affinity-selection mass spectrometry. The value of integrated technologies is illustrated in the context of specific case studies in which modulators of targets, such as protein kinases, nuclear hormone receptors and protein–protein interactions, were discovered. We note that to maximize impact on the productivity of discovery, each of the integrated stages would need to have both high and matched throughput. We also consider the longer-term goal of realizing the fully autonomous discovery of bioactive small molecules through the integration and automation of all stages of discovery

    A model to search for synthesizable molecules

    Get PDF
    Deep generative models are able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models allow one to generate molecules with desirable properties, they give no guarantees that the molecules can actually be synthesized in practice. We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) reactants are selected, and (b) combined to form more complex molecules. More specifically, our generative model proposes a bag of initial reactants (selected from a pool of commercially-available molecules) and uses a reaction model to predict how they react together to generate new molecules. We first show that the model can generate diverse, valid and unique molecules due to the useful inductive biases of modeling reactions. Furthermore, our model allows chemists to interrogate not only the properties of the generated molecules but also the feasibility of the synthesis routes. We conclude by using our model to solve retrosynthesis problems, predicting a set of reactants that can produce a target product

    Generating molecules via chemical reactions

    Get PDF
    © Deep Generative Models for Highly Structured Data, DGS@ICLR 2019 Workshop.All right reserved. Over the last few years exciting work in deep generative models has produced models able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models are able to generate molecules with desirable properties, their utility in practice is limited due to the difficulty in knowing how to synthesize these molecules. We therefore propose a new molecule generation model, mirroring a more realistic real-world process, where reactants are selected and combined to form more complex molecules. More specifically, our generative model proposes a bag of initial reactants (selected from a pool of commercially-available molecules) and uses a reaction model to predict how they react together to generate new molecules. Modeling the entire process of constructing a molecule during generation offers a number of advantages. First, we show that such a model has the ability to generate a wide, diverse set of valid and unique molecules due to the useful inductive biases of modeling reactions. Second, modeling synthesis routes rather than final molecules offers practical advantages to chemists who are not only interested in new molecules but also suggestions on stable and safe synthetic routes. Third, we demonstrate the capabilities of our model to also solve one-step retrosynthesis problems, predicting a set of reactants that can produce a target product
    corecore