7,846 research outputs found
Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks
Organic reactions are usually assigned to classes containing reactions with
similar reagents and mechanisms. Reaction classes facilitate the communication
of complex concepts and efficient navigation through chemical reaction space.
However, the classification process is a tedious task. It requires the
identification of the corresponding reaction class template via annotation of
the number of molecules in the reactions, the reaction center, and the
distinction between reactants and reagents. This work shows that
transformer-based models can infer reaction classes from non-annotated, simple
text-based representations of chemical reactions. Our best model reaches a
classification accuracy of 98.2%. We also show that the learned representations
can be used as reaction fingerprints that capture fine-grained differences
between reaction classes better than traditional reaction fingerprints. The
insights into chemical reaction space enabled by our learned fingerprints are
illustrated by an interactive reaction atlas providing visual clustering and
similarity searching. Comment: https://rxn4chemistry.github.io/rxnfp
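As a rough illustration of the similarity searching mentioned in the abstract above, the sketch below ranks stored reactions by how close their fingerprint vectors are to a query vector. The abstract does not say which metric the authors use, so cosine similarity is an assumption here. The three-dimensional vectors, the reaction names, and the helper functions `cosine_similarity` and `most_similar` are all hypothetical and are not part of the rxnfp package.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, library, k=2):
    """Return the k (name, similarity) pairs from library closest to query."""
    scored = [(name, cosine_similarity(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy fingerprints (assumption): real learned fingerprints have far more dimensions.
library = {
    "amide_coupling_1": [0.9, 0.1, 0.0],
    "amide_coupling_2": [0.8, 0.2, 0.1],
    "suzuki_coupling_1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
top = most_similar(query, library, k=2)
print([name for name, _ in top])
```

With these toy vectors the two amide couplings rank ahead of the Suzuki coupling. This is the behaviour one would hope for if fingerprints of the same reaction class lie close together.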
Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model
Chemical reactions are the fundamental building blocks of drug design and
organic chemistry research. In recent years, there has been a growing need for
a large-scale deep-learning framework that can efficiently capture the basic
rules of chemical reactions. In this paper, we propose a unified
framework that addresses both the reaction representation learning and molecule
generation tasks, which allows for a more holistic approach. Inspired by the
organic chemistry mechanism, we develop a novel pretraining framework that
enables us to incorporate inductive biases into the model. Our framework
achieves state-of-the-art results on challenging downstream tasks. By
possessing chemical knowledge, our generative framework overcomes the
limitations of current molecule generation models that rely on a small number
of reaction templates. In extensive experiments, our model generates
synthesizable drug-like structures of high quality. Overall, our work presents
a significant step toward a large-scale deep-learning framework for a variety
of reaction-based applications.
Molecular Similarity and Xenobiotic Metabolism
MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions. This data-mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website. ----Boehringer-Ingelhie
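The occurrence-based idea in the abstract above can be sketched roughly as follows. For each atom environment, count how often it appears at a reported site of metabolism and how often it appears in substrates overall, then rank the atoms of a query molecule by the ratio of the two. This is a simplified sketch under stated assumptions, not MetaPrint2D's actual implementation. The environment labels stand in for real circular fingerprints, the counts are invented for illustration, and `occurrence_ratios` and `rank_sites` are hypothetical helper names.

```python
from collections import Counter


def occurrence_ratios(site_counts, substrate_counts):
    """Ratio of site-of-metabolism occurrences to all substrate occurrences per environment."""
    return {
        env: site_counts.get(env, 0) / total
        for env, total in substrate_counts.items()
        if total > 0
    }


def rank_sites(atom_envs, ratios):
    """Rank atom indices of a query molecule by the ratio of their environment, highest first.

    Environments never seen in the training counts score 0.0.
    """
    scored = [(idx, ratios.get(env, 0.0)) for idx, env in enumerate(atom_envs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented training counts (assumption): keys stand in for hashed circular fingerprints.
substrate_counts = Counter({"aromatic_CH": 200, "N_methyl": 50, "carbonyl_C": 100})
site_counts = Counter({"aromatic_CH": 20, "N_methyl": 30, "carbonyl_C": 1})

ratios = occurrence_ratios(site_counts, substrate_counts)

# Query molecule described by one environment label per atom (hypothetical).
ranking = rank_sites(["carbonyl_C", "N_methyl", "aromatic_CH"], ratios)
print(ranking)
```

Because scoring needs only dictionary lookups per atom, a scheme of this shape is cheap to evaluate, which is consistent with the fast, interactive predictions the abstract describes.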
- …