705 research outputs found

    Path finding methods accounting for stoichiometry in metabolic networks

    Graph-based methods have been widely used for the analysis of biological networks. Their application to metabolic networks has been much discussed, in particular because an important weakness of such methods is that they neglect reaction stoichiometry. In this study, we show that reaction stoichiometry can be incorporated into path-finding approaches via mixed-integer linear programming. This major advance at the modeling level results in improved prediction of topological and functional properties in metabolic networks.
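
    To illustrate why neglecting stoichiometry matters, here is a minimal sketch, not the method of the study above. It assumes a hypothetical two-reaction network (`REACTIONS`) and only checks whether a given assignment of integer reaction fluxes is mass-balanced; a mixed-integer linear program would instead search over such integer fluxes. The helper names `net_balance` and `is_balanced_path` are made up for this example.

```python
from collections import defaultdict

# Hypothetical toy network (an assumption for illustration only).
# Each reaction maps metabolite -> stoichiometric coefficient:
# negative = consumed, positive = produced.
REACTIONS = {
    "R1": {"A": -1, "B": 2},  # A -> 2 B
    "R2": {"B": -1, "C": 1},  # B -> C
}


def net_balance(fluxes):
    """Net production of every metabolite for integer reaction fluxes."""
    balance = defaultdict(int)
    for rxn, flux in fluxes.items():
        for met, coeff in REACTIONS[rxn].items():
            balance[met] += coeff * flux
    return dict(balance)


def is_balanced_path(fluxes, source, target):
    """True if the source is net consumed, the target is net produced,
    and every other metabolite is neither accumulated nor depleted."""
    bal = net_balance(fluxes)
    intermediates_ok = all(
        value == 0 for met, value in bal.items() if met not in (source, target)
    )
    return bal.get(source, 0) < 0 and bal.get(target, 0) > 0 and intermediates_ok


# A purely graph-based path A -> B -> C uses each reaction once,
# which leaves one unit of B unaccounted for.
graph_path = {"R1": 1, "R2": 1}
# Running R2 twice consumes both units of B produced by R1.
stoichiometric_path = {"R1": 1, "R2": 2}
```

    In this toy case the graph-only path fails the balance check, while the flux assignment that respects the coefficients passes it.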

    Learning the Language of Chemical Reactions – Atom by Atom. Linguistics-Inspired Machine Learning Methods for Chemical Reaction Tasks

    Over the last hundred years, not much has changed in how organic chemistry is conducted. In most laboratories, the current state is still trial-and-error experiments guided by human expertise acquired over decades. What if, given all the knowledge published, we could develop an artificial intelligence-based assistant to accelerate the discovery of novel molecules? Although many approaches were recently developed to generate novel molecules in silico, only a few studies complete the full design-make-test cycle, including the synthesis and the experimental assessment. One reason is that the synthesis part can be tedious and time-consuming, and requires years of experience to perform successfully. Hence, the synthesis is one of the critical limiting factors in molecular discovery. In this thesis, I take advantage of similarities between human language and organic chemistry to apply linguistic methods to chemical reactions, and develop artificial intelligence-based tools for accelerating chemical synthesis. First, I investigate reaction prediction models focusing on small data sets of challenging stereo- and regioselective carbohydrate reactions. Second, I develop a multi-step synthesis planning tool predicting reactants and suitable reagents (e.g. catalysts and solvents). Both forward prediction and retrosynthesis approaches use black-box models. Hence, I then study methods to provide more information about the models’ predictions. I develop a reaction classification model that labels chemical reactions and facilitates the communication of reaction concepts. As a side product of the classification models, I obtain reaction fingerprints that enable efficient similarity searches in chemical reaction space. Moreover, I study approaches for predicting reaction yields.
Lastly, having approached all chemical reaction tasks with atom-mapping independent models, I demonstrate the generation of accurate atom-mapping from the patterns my models have learned while being trained self-supervised on chemical reactions. My PhD thesis’s leitmotif is the application of the attention-based Transformer architecture to molecules and reactions represented with a text notation. It is as if atoms are my letters, molecules my words, and reactions my sentences. With this analogy, I teach my neural network models the language of chemical reactions - atom by atom. While exploring the link between organic chemistry and language, I make an essential step towards the automation of chemical synthesis, which could significantly reduce the costs and time required to discover and create new molecules and materials.
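
    The "atoms as letters" analogy can be made concrete with a small tokenizer. The sketch below assumes the text notation is reaction SMILES (the abstract says only "a text notation") and uses a simplified regular expression written for this example; it is not taken from the thesis and does not cover the full SMILES grammar. The names `SMILES_TOKEN` and `tokenize` are hypothetical.

```python
import re

# Simplified token pattern (an assumption, not the thesis's tokenizer):
# bracket atoms, two-letter halogens, the reaction arrow, ring-closure
# labels, single-letter atoms, digits, then bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|>>|%[0-9]{2}|[BCNOSPFIbcnosp]|[0-9]|[()\.=#\-+\\/:~@?*$>])"
)


def tokenize(reaction_smiles):
    """Split a reaction SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN.findall(reaction_smiles)
    # Refuse input containing characters the pattern does not cover,
    # instead of silently dropping them.
    if "".join(tokens) != reaction_smiles:
        raise ValueError("unrecognised characters in input")
    return tokens


# Ethanol oxidised to acetaldehyde, written as reactants>>products.
example_tokens = tokenize("CCO>>CC=O")
```

    Each token would then play the role of a "letter" in the input sequence of a sequence model such as a Transformer.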

    Molecular Similarity and Xenobiotic Metabolism

    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions. This data mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website. ----Boehringer-Ingelhie
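
    The idea of counting atom-centred environments can be sketched as follows. This is a simplified stand-in, not MetaPrint2D's actual fingerprint definition or statistics: the environment is just the elements found at each bond distance from a centre atom, and the score is the fraction of occurrences of that environment that are marked as reactive. The molecule encoding, the example site labels, and the names `environment` and `occurrence_ratios` are all assumptions made for illustration.

```python
from collections import Counter, deque


def environment(atoms, bonds, centre, radius=2):
    """Describe the neighbourhood of `centre` as a sorted tuple of
    (bond distance, element) pairs, out to `radius` bonds."""
    distance = {centre: 0}
    queue = deque([centre])
    while queue:
        current = queue.popleft()
        if distance[current] == radius:
            continue  # do not expand beyond the chosen radius
        for neighbour in bonds[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return tuple(sorted((d, atoms[i]) for i, d in distance.items()))


def occurrence_ratios(molecules, radius=2):
    """For each environment, the fraction of its occurrences that are
    labelled as reactive sites in the (hypothetical) training data."""
    total, reactive = Counter(), Counter()
    for atoms, bonds, sites in molecules:
        for index in range(len(atoms)):
            env = environment(atoms, bonds, index, radius)
            total[env] += 1
            if index in sites:
                reactive[env] += 1
    return {env: reactive[env] / total[env] for env in total}


# Made-up training data: heavy-atom graphs with invented site labels.
chain = {0: [1], 1: [0, 2], 2: [1]}
ethanol = (["C", "C", "O"], chain, {1})
propane = (["C", "C", "C"], chain, set())
ratios = occurrence_ratios([ethanol, propane], radius=1)
```

    A query atom would then be scored by looking up the ratio for its own environment; environments with few occurrences would give less reliable scores, which is the kind of data-availability issue the confidence score in the abstract refers to.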

    Kinetic model construction using chemoinformatics

    Kinetic models of chemical processes not only provide an alternative to costly experiments; they also have the potential to accelerate the pace of innovation in developing new chemical processes or in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this high level of detail is that the complexity and size of the models also steadily increase, such that the models eventually become too difficult to be manually constructed. Instead, computers are programmed to automate the construction of these models, and make use of graph theory to translate chemical entities such as molecules and reactions into computer-understandable representations. This work studies the use of automated methods to construct kinetic models. More particularly, the need to account for the three-dimensional arrangement of atoms in molecules and reactions of kinetic models is investigated and illustrated by two case studies. First of all, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied. A kinetic model that accounts for the differences in reactivity and selectivity of both pinanol diastereomers is proposed. Secondly, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed and highlights the use of state-of-the-art techniques for the automated estimation of thermochemistry of polycyclic molecules. A new code is developed for the automated construction of kinetic models and takes advantage of the advances made in the field of chemo-informatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated construction of kinetic models: the estimation of symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. 
Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.

    Analysis of Generative Chemistries

    For the modelling of chemistry we use undirected, labelled graphs as explicit models of molecules and graph transformation rules for modelling generalised chemical reactions. This is used to define artificial chemistries on the level of individual bonds and atoms, where formal graph grammars implicitly represent large spaces of chemical compounds. We use a graph rewriting formalism, rooted in category theory, called the Double Pushout approach, which directly expresses the transition state of chemical reactions. Using concurrency theory for transformation rules, we define algorithms for the composition of rewrite rules in a chemically intuitive manner that enable automatic abstraction of the level of detail in chemical pathways. Based on this rule composition we define an algorithmic framework for generation of vast reaction networks for specific spaces of a given chemistry, while still maintaining the level of detail of the model down to the atomic level. The framework also allows for computation with graphs and graph grammars, which is utilised to model non-trivial chemical systems. The graph generation relies on graph isomorphism testing, and we review the general individualisation-refinement paradigm used in the state-of-the-art algorithms for graph canonicalisation, isomorphism testing, and automorphism discovery. We present a model for chemical pathways based on a generalisation of network flows from ordinary directed graphs to directed hypergraphs. The model allows for reasoning about the flow of individual molecules in general pathways, and the introduction of chemically motivated routing constraints. It further provides the foundation for defining specialised pathway motifs, which is illustrated by defining necessary topological constraints for both catalytic and autocatalytic pathways. We also prove that central types of pathway questions are NP-complete, even for restricted classes of reaction networks. 
The complete pathway model, including constraints for catalytic and autocatalytic pathways, is implemented using integer linear programming. This implementation is used in a tree search method to enumerate both optimal and near-optimal pathway solutions. The formal methods are applied to multiple chemical systems: the enzyme-catalysed beta-lactamase reaction, variations of the glycolysis pathway, and the formose process. In each of these systems we use rule composition to abstract pathways and calculate traces for isotope-labelled carbon atoms. The pathway model is used to automatically enumerate alternative non-oxidative glycolysis pathways, and to enumerate thousands of candidates for autocatalytic pathways in the formose process.

    A critical evaluation of automatic atom mapping algorithms and tools

    Master's dissertation in Bioinformatics. Identifying the atoms that change position in chemical reactions is important knowledge within the field of Metabolic Engineering (ME). It can lead to new advances at different levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes inside a reaction. The Atom Mapping approach was initially developed in the 1960s, but it has recently seen important advances and is used in diverse biological and biotechnological studies. The main methodologies used for the atom mapping process are the Maximum Common Substructure (MCS) and the Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools. In this work, we assessed a number of previously implemented atom mapping frameworks, and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We also evaluated the admissibility of the atom maps calculated by different algorithms, assessing whether different algorithms, with different approaches, were capable of returning equivalent atom maps for the same chemical reaction.

    Computational Approaches for Screening Drugs for Bioactivation, Reactive Metabolite Formation, and Toxicity

    Cytochrome P450 enzymes aid in the elimination of a preponderance of small molecule drugs, but can generate reactive metabolites that may adversely conjugate to protein and DNA, in a process known as bioactivation, and prompt adverse reactions, drug candidate attrition, or market withdrawal. Experimental assays are low-throughput and expensive to perform, so they are often reserved until later stages of the drug development pipeline when the drug candidate pools are already significantly narrowed. Reactive metabolites also elude in vivo detection, as they are transitory and generally do not circulate. In contrast, computational methods are high-throughput and cheap enough to screen millions of potentially toxic molecules during early stages of the drug development pipeline. This work computationally models sequences of metabolic transformations, i.e., pathways, between an input molecule and its corresponding reactive metabolite(s), if any. Additionally, an accurate graph neural network model was developed to assess the importance of intermediate metabolites and extract connected subnetworks of relevance to bioactivation. Connecting multiple site-of-metabolism and structure inference models, we developed an integrated model of metabolism and reactivity to evaluate bioactivation risk driven by epoxidation, quinone formation, thiophene sulfur-oxidation, and nitroaromatic reduction. We applied this framework to an understudied substructure, the isoxazole ring, that is gaining traction in a class of drugs known as bromodomain inhibitors and that may potentially drive quinone formation. Finally, we attend to toxicity associated with drug-drug interactions, particularly with NSAID usage reported in electronic health records.