
    Comprehensive analysis of applicability domains of QSPR models for chemical reactions

    © 2020 by the authors. Licensee MDPI, Basel, Switzerland. Defining a model's applicability domain (AD) is currently an active research topic in chemoinformatics. Many AD definitions have been described in the literature for models that predict properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models), but none has been reported to date for models of chemical reactions (Quantitative Reaction-Property Relationships (QRPR)). The difficulty is that a chemical reaction is a much more complex object than an individual molecule: its yield and its thermodynamic and kinetic characteristics depend not only on the structures of the reactants and products but also on the experimental conditions. The performance of QRPR models also depends largely on how the chemical transformation is encoded. In this study, various AD definition methods widely used in QSAR/QSPR studies of individual molecules, as well as several novel approaches for reactions proposed in this work, were benchmarked on several reaction datasets. The methods were tested for their ability to exclude wrong reaction types, increase coverage, improve model performance and detect Y-outliers. As a result, several “best” AD definitions for QRPR models predicting reaction characteristics were identified and then tested on a previously published external dataset with a clear AD definition problem.
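For illustration, here is a minimal sketch of one common distance-based AD definition from QSAR/QSPR practice: a k-nearest-neighbour (k-NN) distance threshold. It is not necessarily one of the methods benchmarked in the paper. The descriptor vectors, the value of `k`, the percentile-based threshold rule and the function names are all assumptions made for this example. For reactions, the vectors would come from some encoding of the transformation, which this sketch does not compute.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_knn_distance(x: Vector, train: List[Vector], k: int, skip_self: bool = False) -> float:
    """Mean Euclidean distance from x to its k nearest training vectors."""
    dists = sorted(math.dist(x, t) for t in train)
    if skip_self:
        # x is itself a training vector: drop its zero distance to itself.
        dists = dists[1:]
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def fit_knn_threshold(train: List[Vector], k: int = 3, quantile: float = 0.95) -> float:
    """Hypothetical AD threshold: a quantile of the training set's own k-NN distances."""
    scores = sorted(mean_knn_distance(t, train, k, skip_self=True) for t in train)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def in_domain(x: Vector, train: List[Vector], threshold: float, k: int = 3) -> bool:
    """A query is inside the AD if its mean k-NN distance does not exceed the threshold."""
    return mean_knn_distance(x, train, k) <= threshold


if __name__ == "__main__":
    # Toy 2-D "descriptor" vectors, purely illustrative.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    thr = fit_knn_threshold(train, k=1)
    print(thr)                                      # 1.0
    print(in_domain((0.5, 0.5), train, thr, k=1))   # True
    print(in_domain((5.0, 5.0), train, thr, k=1))   # False
```

A query that lies farther from its nearest training neighbours than most training points lie from theirs is flagged as outside the domain. The quantile controls how strict that cut-off is.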

    Cross-validation strategies in QSPR modelling of chemical reactions

    In this article, we consider cross-validation of quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an optimistically biased assessment of prediction performance. To address this issue, we suggest two model cross-validation strategies: ‘transformation-out’ CV and ‘solvent-out’ CV. The conventional k-fold approach does not consider the nature of the objects. The proposed procedures, by contrast, provide an unbiased estimate of the models' predictive performance for novel types of structural transformations and for reactions run under new conditions. Both strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions and of Diels-Alder cycloadditions. All the suggested cross-validation methods, together with a tutorial, are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
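The idea behind ‘transformation-out’ and ‘solvent-out’ CV can be illustrated as grouped k-fold splitting. Each reaction is labelled with a group key, either its transformation type or its solvent. All reactions sharing a key land in the same test fold, so the model is always scored on transformations or solvents it never saw during training. The sketch below is a generic, hypothetical implementation for illustration only. It is not the CIMtools API. The `group_k_fold` name and the round-robin assignment of groups to folds are assumptions.

```python
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable], n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs such that no group appears in both sets."""
    unique = list(dict.fromkeys(groups))  # distinct groups in first-seen order
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Whole groups are assigned to folds round-robin (a simplifying assumption;
    # a real splitter might balance fold sizes instead).
    fold_of: Dict[Hashable, int] = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical solvent labels for seven reactions ("solvent-out" CV).
    solvents = ["water", "water", "DMSO", "MeCN", "DMSO", "MeCN", "MeOH"]
    for train_idx, test_idx in group_k_fold(solvents, n_splits=3):
        print(sorted({solvents[i] for i in test_idx}), test_idx)
```

For ‘transformation-out’ CV, the same function would receive transformation-type labels instead of solvents. In scikit-learn, `GroupKFold` and `LeaveOneGroupOut` produce the same kind of group-disjoint splits.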

    Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow

    The synthesis of a desired chemical compound is the main task of synthetic organic chemistry. Predicting reaction conditions and important quantitative characteristics of chemical reactions, such as yield and reaction rate, can substantially help in developing optimal synthetic routes and assessing synthesis cost. These parameters can be estimated theoretically with modern machine-learning approaches, which use available experimental data to build predictive models. This approach is known as quantitative or qualitative structure–reactivity relationship (QSRR) modelling. In this article, we review the state of the art in the QSRR area and give our opinion on emerging trends in the field.

    Reaction Data Curation I: Chemical Structures and Transformations Standardization

    The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that covers the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its Python 3 implementation, based on the CGRTools toolkit, was used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning.
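The abstract does not describe the endpoint step in detail. As a hedged illustration of what endpoint curation can involve, the sketch below merges duplicate records of the same reaction under the same conditions. It averages their values and flags groups whose reported values disagree by more than a tolerance. It does not use CGRTools and is not the authors' code. Several details are hypothetical: the `Record` layout, the assumption that earlier steps have already reduced each reaction and its conditions to canonical string keys, the toy keys, and the `max_spread` tolerance.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, str]


class Record(NamedTuple):
    reaction_key: str  # assumed canonical reaction string from earlier curation steps
    conditions: str    # assumed normalized conditions label, e.g. "water"
    value: float       # endpoint, e.g. yield in %


def merge_duplicates(records: List[Record], max_spread: float) -> Tuple[Dict[Key, float], List[Key]]:
    """Average duplicate endpoints; report keys whose values span more than max_spread."""
    grouped: Dict[Key, List[float]] = defaultdict(list)
    for r in records:
        grouped[(r.reaction_key, r.conditions)].append(r.value)
    merged: Dict[Key, float] = {}
    conflicts: List[Key] = []
    for key, values in grouped.items():
        if max(values) - min(values) > max_spread:
            conflicts.append(key)  # inconsistent duplicates: left out for manual review
        else:
            merged[key] = mean(values)
    return merged, conflicts


if __name__ == "__main__":
    records = [
        Record("A>>B", "water", 80.0),
        Record("A>>B", "water", 82.0),
        Record("C>>D", "water", 10.0),
        Record("C>>D", "water", 60.0),
    ]
    merged, conflicts = merge_duplicates(records, max_spread=5.0)
    print(merged, conflicts)
```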

    Atom-to-atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies

    In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper,[5] which implement different AAM algorithms. The open-source RDTool program was optimized, and its modified version (“new RDTool”) was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and to develop the “AAM fixer” algorithm for automated correction of erroneous mappings. The benchmarking calculations were performed on a Golden dataset of 1851 manually mapped and curated reactions. The best-performing program, RXNMapper, combined with the AAM fixer, was applied to map the USPTO database. The Golden dataset, the mapped USPTO database and the optimized RDTool are available on GitHub (https://github.com/Laboratoire-de-Chemoinformatique).
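The abstract mentions consensus mapping strategies without detailing them. One simple possibility is a per-atom vote across tools, shown here only as a hypothetical sketch. Each tool's mapping is a dict from reactant atom index to product atom index. A reactant atom keeps a mapping only if at least `min_votes` tools agree on it. Product atoms claimed by more than one reactant atom are then dropped so that the result stays one-to-one. The representation, the voting rule and `min_votes` are assumptions, not the strategies evaluated in the paper.

```python
from collections import Counter
from typing import Dict, List

Mapping = Dict[int, int]  # reactant atom index -> product atom index


def consensus_mapping(mappings: List[Mapping], min_votes: int) -> Mapping:
    """Keep, for each reactant atom, the product atom chosen by at least min_votes tools."""
    reactant_atoms = set().union(*mappings)  # every reactant atom mapped by any tool
    result: Mapping = {}
    for atom in sorted(reactant_atoms):
        votes = Counter(m[atom] for m in mappings if atom in m)
        target, count = votes.most_common(1)[0]
        if count >= min_votes:
            result[atom] = target
    # Drop product atoms claimed by more than one reactant atom (keeps the map one-to-one).
    claims = Counter(result.values())
    return {a: p for a, p in result.items() if claims[p] == 1}


if __name__ == "__main__":
    # Hypothetical outputs of three mapping tools for the same three-atom reaction.
    tool_a = {1: 1, 2: 2, 3: 3}
    tool_b = {1: 1, 2: 3, 3: 3}
    tool_c = {1: 1, 2: 2, 3: 2}
    print(consensus_mapping([tool_a, tool_b, tool_c], min_votes=2))  # {1: 1, 2: 2, 3: 3}
```

Setting `min_votes` to a strict majority of the tools avoids ties. Atoms left unmapped by the vote could then be passed to a correction step, which is the role the “AAM fixer” plays in the paper.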

    Modern Trends of Organic Chemistry in Russian Universities
