    Putting Chemical Knowledge to Work in Machine Learning for Reactivity

    Machine learning has been used to study chemical reactivity for a long time in fields such as physical organic chemistry, chemometrics and cheminformatics. Recent advances in computer science have resulted in deep neural networks that can learn directly from the molecular structure. Neural networks are a good choice when large amounts of data are available. However, many datasets in chemistry are small, and models utilizing chemical knowledge are required for good performance. Adding chemical knowledge can be achieved either by adding more information about the molecules or by adjusting the model architecture itself. The current method of choice for adding more information is descriptors based on computed quantum-chemical properties. Exciting new research directions show that it is possible to augment deep learning with such descriptors for better performance in the low-data regime. To modify the models, differentiable programming enables seamless merging of neural networks with mathematical models from chemistry and physics. The resulting methods are also more data-efficient and make better predictions for molecules that are different from the initial dataset on which they were trained. Application of these chemistry-informed machine learning methods promises to accelerate research in fields such as drug design, materials design, catalysis and reactivity.

    Perturbation-Theory and Machine Learning (PTML). Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies

    Machine Learning (ML) algorithms are gaining importance in the processing of chemical information and modelling of chemical reactivity problems. In this work, we have developed a PTML model combining Perturbation-Theory (PT) and ML algorithms for predicting the yield of a given reaction. For this purpose, we have selected Parham cyclization, which is a general and powerful tool for the synthesis of heterocyclic and carbocyclic compounds. This reaction has both structural (substitution pattern on the substrate, internal electrophile, ring size, etc.) and operational variables (organolithium reagent, solvent, temperature, time, etc.), so predicting the effect of changes in substrate design (internal electrophile, halide, etc.) or reaction conditions on the yield is an important task that could help to optimize the reaction design. The PTML model developed uses PT operators to account for perturbations in experimental conditions and/or structural variables of all the molecules involved in a query reaction compared to a reference reaction. Thus, a dataset of >100 reactions has been collected for different substrates and internal electrophiles, under different reaction conditions, with a wide range of yields (0 – 98%). The best PTML model found using General Linear Regression (GLR) has R = 0.88 in training and R = 0.83 in external validation series for 10000 pairs of query and reference reactions. The PTML model has a final R = 0.95 for all reactions using multiple reference reactions. We also report a comparative study of linear vs. non-linear PTML models based on Artificial Neural Networks (ANN) algorithms. PTML-ANN models (LNN, MLP, RBF) with R ≈ 0.1 - 0.8 do not outperform the first PTML model. This result confirms the validity of the linearity of the model. Next, we carried out an experimental and theoretical study of previously unreported Parham reactions to illustrate the practical use of the PTML model.
A 500000-point simulation and a Hammett analysis of the reactivity space of Parham reactions are also reported. Funding: Ministerio de Economía y Competitividad (CTQ2016-74881-P) / Ministerio de Economía y Competitividad (CTQ2013-41229-P) / Gobierno Vasco (IT1045-16)

    On the Predictive Power of Chemical Concepts

    Many chemical concepts can be well defined in the context of quantum chemical theories. Examples are the electronegativity scale of Mulliken and Jaffe and the hard and soft acids and bases concept of Pearson. The sound theoretical basis allows for a systematic definition of such concepts. However, while they are often used to describe and compare chemical processes in terms of reactivity, their predictive power remains unclear. In this work, we elaborate on the predictive potential of chemical reactivity concepts, which can be crucial for autonomous reaction exploration protocols to guide them by first-principles heuristics that exploit these concepts. Comment: 23 pages, 1 figure, 1 table

    Review of Data Sources, QSARs and Integrated Testing Strategies for Skin Sensitisation

    This review collects information on sources of skin sensitisation data and computational tools for the estimation of skin sensitisation potential, such as expert systems and (quantitative) structure-activity relationship (QSAR) models. The review also captures current thinking of what constitutes an integrated testing strategy (ITS) for this endpoint. The emphasis of the review is on the usefulness of the models for the regulatory assessment of chemicals, particularly for the purposes of the new European legislation for the Registration, Evaluation, Authorisation and Restriction of CHemicals (REACH), which entered into force on 1 June 2007. Since there are no specific databases for skin sensitisation currently available, a description of experimental data found in various literature sources is provided. General (global) models, models for specific chemical classes and mechanisms of action and expert systems are summarised. This review was prepared as a contribution to the EU funded Integrated Project, OSIRIS. JRC.I.3-Consumer products safety and quality

    Quantitative structure-property relationships for predicting chlorine demand and disinfection byproducts formation in drinking water

    Models are important tools for designing or redesigning water treatment processes and technologies to minimize disinfection byproducts (DBPs) formation without compromising disinfection efficiency. Empirical models, which are the most common, are based on bulk water quality parameters that vary with time and space. These parameters may not always have linear relationships with chlorine demand and DBPs formation, which makes structure-based models more attractive to study. In this dissertation, Quantitative Structure-Property Relationship (QSPR) models, which make use of structural properties of individual molecules, were developed using experimental data obtained from the literature. The amounts are reported in moles of chlorine (HOCl) consumed or DBP formed per mole of a compound (Cp). The QSPRs were derived by multiple linear regression of chlorine demand or DBPs on a set of significant constitutional descriptors. The QSPRs were also tested for predictive power using cross validation and external validation, for which the criteria were: Rc2 > 0.6, q2 > 0.5, 0.85 ≤ k ≤ 1.15 and Rt = (Ri2-Ro2)/Ri2 < 0.1. The eight-descriptor QSPR for HOCl demand had good statistics of fit (Rc2 = 0.86 and SDE = 1.24 mol-HOCl/mol-Cp, N = 159) and also showed high predictive power on cross validation data (q2LMO = 0.86, RMSELMO = 1.21 mol-Cl2/mol-Cp) and external validation data (q2ext = 0.88, RMSELMO = 1.17 mol-HOCl/mol-Cp). The QSPR also met all the criteria for QSPR predictive power and was robust. This model was integrated with the AlphaStep model of natural organic matter (NOM) so as to estimate the chlorine demand of surface waters. The predicted chlorine demand was 27.55 μmol-HOCl/mg-C, which is comparable to the 27-33 μmol-HOCl/mg-C reported for surface waters. The four-descriptor QSPR for total organic halide (TOX) formation had Rc2 = 0.72 and SDE = 0.43 mol-Cl/mol-Cp.
The leave-one-out validation of the QSPR (q2LOO = 0.60, RMSE = 0.5 mol-Cl/mol-Cp, N = 49) and the external validation (q2Ext = 0.67, RMSE = 0.48 mol-Cl/mol-Cp, N = 12) showed that the QSPR had high predictive power and was also robust. Results from integration of the QSPR with AlphaStep predicted TOX in surface water to be 183.6 μmol-Cl/mg-C, which is comparable to the 170-298 μg-Cl/mol-Cp measured experimentally for TOX formation from whole dissolved organic matter. Trichloromethane (TCM) and trichloroacetic acid (TCAA) were the two specific DBPs studied. The QSPR for TCM formation had three descriptors, with statistics of fit Rc2 = 0.97 and SDE = 0.08 mol-TCM/mol-Cp, and was validated by LMO data and external data. The results showed that LMO cross validation (q2LMO = 0.94, RMSE = 0.09 mol-TCM/mol-Cp, N = 90) and external validation (q2Ext = 0.94, RMSE = 0.08 mol-TCM/mol-Cp, N = 27) met the criteria for predictive power, and the model was therefore robust. The model prediction of 0.33 mol-TCM/mol-Cp was higher than the 0.13 mol-TCM/mol-Cp observed for tannic acid. QSPRs for predicting TCAA formation were developed, but none of them met all the criteria for predictive power and they were therefore not robust. The relationship between predicted TCAA and experimental data was too weak to be useful. This implies that TCAA formation has no significant linear relationship with constitutional descriptors and may be better predicted by QSPRs derived from non-linear algorithms. A major drawback of the constitutional descriptors is that they cannot explain electronic or steric effects. It is not easy to explain the differences in electron density and steric effects when the same number of substituents occupy different positions relative to each other on an aromatic ring (e.g., catechol vs. quinol).
Use of geometrical descriptors (e.g., molecular volume, solvent accessible area), quantum-chemical descriptors (e.g., dipole moment, polarizability) or electrostatic descriptors (e.g., partial charge, polarity index) is recommended.
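
The validation workflow described in this abstract (multiple linear regression, cross validation, and the four acceptance thresholds) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the data are synthetic rather than the dissertation's, q2 is computed against the overall mean of y (a common simplification), and k and Rt are passed in as precomputed numbers because the abstract does not spell out how they were obtained.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with the coefficients returned by fit_mlr."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out q2: refit without each sample, then predict that sample."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)  # predictive residual sum of squares
    return 1.0 - press / np.sum((y - np.mean(y)) ** 2)

def relative_r2_gap(ri2, ro2):
    """Rt = (Ri2 - Ro2) / Ri2, as written in the abstract."""
    return (ri2 - ro2) / ri2

def meets_criteria(rc2, q2, k, rt):
    """Thresholds quoted in the abstract: Rc2 > 0.6, q2 > 0.5, 0.85 <= k <= 1.15, Rt < 0.1."""
    return bool(rc2 > 0.6 and q2 > 0.5 and 0.85 <= k <= 1.15 and rt < 0.1)

# Synthetic stand-in for a descriptor matrix and a measured property (made-up values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = 3.0 + X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

rc2_demo = r_squared(y_demo, predict_mlr(fit_mlr(X_demo, y_demo), X_demo))
q2_demo = q2_loo(X_demo, y_demo)
```

For ordinary least squares the leave-one-out q2 can never exceed the fitted Rc2, so a large gap between the two is a quick warning sign of overfitting.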

    Machine learning activation energies of chemical reactions

    Application of machine learning (ML) to the prediction of reaction activation barriers is a new and exciting field for these algorithms. The works covered here are specifically those in which ML is trained to predict the activation energies of homogeneous chemical reactions, where the activation energy is given by the energy difference between the reactants and transition state of a reaction. Particular attention is paid to works that have applied ML to directly predict reaction activation energies, the limitations that may be found in these studies, and where comparisons of different types of chemical features for ML models have been made. Also explored are models that have been able to obtain high predictive accuracies, but with reduced datasets, using the Gaussian process regression ML model. In these studies, the chemical reactions for which activation barriers are modeled include those involving small organic molecules, aromatic rings, and organometallic catalysts. Also provided are brief explanations of some of the most popular types of ML models used in chemistry, as a beginner's guide for those unfamiliar.
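
To make the two ideas in this abstract concrete, the sketch below computes an activation energy as the transition-state energy minus the reactant energy and fits a bare-bones Gaussian process regression to a handful of points. It is a toy illustration, not the method of any reviewed work: the single descriptor, the barrier values, the RBF kernel with unit prior variance, and all function names are assumptions made for this example.

```python
import numpy as np

def activation_energy(e_reactants, e_transition_state):
    """Activation energy as the energy of the transition state minus that of the reactants."""
    return e_transition_state - e_reactants

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel with unit prior variance between rows of A and B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)

def gpr_predict(X_train, y_train, X_test, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP fitted to centred targets."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

# Made-up small dataset: one descriptor per reaction and a barrier in arbitrary energy units.
x_train = np.linspace(0.0, 4.0, 8).reshape(-1, 1)
barriers = 10.0 + 3.0 * np.sin(x_train[:, 0])

x_query = np.array([[1.3], [2.7]])
pred_mean, pred_std = gpr_predict(x_train, barriers, x_query, length_scale=0.5)
```

The predictive standard deviation is what makes this kind of model attractive for small datasets: it grows for queries far from the training reactions, flagging predictions that should not be trusted.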

    In silico prediction of pharmacokinetic properties and druglikeness of novel thiourea derivatives of naproxen

    Masking the carboxyl group of naproxen with other functional groups may be a promising strategy to decrease its gastrointestinal toxicity. The thiourea moiety has been described as an important pharmacophore in a variety of pharmacologically active compounds, including anti-inflammatory, antiviral, anticancer, hypoglycemic and antimicrobial agents. Our research group has previously designed twenty novel thiourea derivatives of naproxen, containing amino acids (glycine, L-alanine, β-alanine, L-valine and L-phenylalanine - compounds 1, 2, 3, 4 and 5, respectively), their methyl (6-10) and ethyl esters (11-15), as well as aromatic amines (16-20). Pharmacokinetic properties and druglikeness of these compounds were predicted using the SwissADME web tool (http://www.swissadme.ch/). Predicted pharmacokinetic properties include potential for gastrointestinal absorption, blood-brain barrier permeability, skin permeability, transport mediated by P-glycoproteins and enzyme inhibitory potential. Druglikeness was evaluated using Lipinski’s, Ghose’s, Veber’s, Egan’s and Muegge’s rules, as well as on the basis of bioavailability score. All tested compounds had high predicted gastrointestinal absorption and low predicted blood-brain barrier permeability. Also, derivatives 2, 4, 7, 9, 10, 12, 14, 15 and 18 were predicted to be substrates for P-glycoprotein. Derivatives with aromatic amines (16-20) showed inhibitory potential against all tested CYP isoforms. Derivative 19 had the highest, while derivative 13 demonstrated the lowest predicted skin permeability. Finally, derivatives 1-12, except 5 and 10, have druglike structures, since they obey all imposed rules.
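
As an illustration of what a rule-based druglikeness filter checks, here is a minimal sketch of the textbook form of Lipinski's rule of five (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The thresholds and logP definition used by the web tool may differ from this textbook variant, and the descriptor values below are invented for the example, not taken from the study.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Hypothetical container for the four descriptors used by the rule of five."""
    mol_weight: float   # g/mol
    log_p: float        # octanol-water partition coefficient (log scale)
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d):
    """Return the list of textbook rule-of-five criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations

def passes_lipinski(d, max_violations=0):
    """Strict by default (no violations allowed), mirroring 'obey all imposed rules'."""
    return len(lipinski_violations(d)) <= max_violations

# Invented descriptor values for two imaginary compounds.
small_polar = Descriptors(mol_weight=350.0, log_p=3.2, h_donors=2, h_acceptors=5)
large_lipophilic = Descriptors(mol_weight=620.0, log_p=6.1, h_donors=2, h_acceptors=5)
```

The `max_violations` parameter is there because a widely used convention tolerates a single violation; setting it to 1 reproduces that looser reading.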