23 research outputs found
Rapid estimation of catalytic efficiency by cumulative atomic multipole moments: application to ketosteroid isomerase mutants
We propose a simple atomic multipole electrostatic model to rapidly evaluate the effects of mutation on enzyme activity and test its performance on wild-type and mutant ketosteroid isomerase. The predictions of our atomic multipole model are similar to those obtained with symmetry-adapted perturbation theory at a fraction of the computational cost. We further show that this approach is relatively insensitive to the precise amino acid side chain conformation in mutants and may thus be useful in computational enzyme (re)design
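To make the idea of an atom-centered electrostatic estimate concrete, here is a minimal sketch that keeps only the first (monopole, i.e. point-charge) term of such an expansion. It is not the authors' cumulative atomic multipole model, which also includes higher moments; the function name, the `(charge, position)` site layout, and the kcal/mol unit choice are assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); a common unit convention,
# chosen here as an assumption (the abstract does not state units).
COULOMB_KCAL = 332.0637


def coulomb_energy(sites_a, sites_b):
    """Pairwise point-charge interaction energy between two fragments.

    Each fragment is a list of (charge_in_e, (x, y, z)_in_Angstrom) tuples.
    This is only the monopole-monopole term of a multipole expansion.
    """
    energy = 0.0
    for charge_a, pos_a in sites_a:
        for charge_b, pos_b in sites_b:
            energy += COULOMB_KCAL * charge_a * charge_b / math.dist(pos_a, pos_b)
    return energy


# Hypothetical two-site example: opposite unit charges 2 Angstrom apart.
cation = [(1.0, (0.0, 0.0, 0.0))]
anion = [(-1.0, (0.0, 0.0, 2.0))]
print(coulomb_energy(cation, anion))
```

In a mutation scan, a sum of this kind would presumably be evaluated for each residue variant against the reacting species; adding dipole and quadrupole terms is what distinguishes a full atomic multipole treatment from this sketch.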
Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki-Miyaura Coupling
Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated literature data may be insufficient for this purpose. Using an example of Suzuki-Miyaura coupling with heterocyclic building blocks (and a carefully selected database of >10,000 literature examples), we show that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases. This result holds irrespective of the ML model applied (from simple feed-forward to state-of-the-art graph-convolution neural networks) or the representation to describe the reaction partners (various fingerprints, chemical descriptors, latent representations, etc.). In all cases, the ML methods fail to perform significantly better than naive assignments based on the sheer frequency of certain reaction conditions reported in the literature. These unsatisfactory results likely reflect subjective preferences of various chemists to use certain protocols, other biasing factors as mundane as availability of certain solvents/reagents, and/or a lack of negative data. These findings highlight the likely importance of systematically generating reliable and standardized data sets for algorithm training
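The "naive assignment based on sheer frequency" mentioned in this abstract can be sketched as a popularity baseline: always propose the k most common conditions in the training data and measure how often the reported condition is among them. The function names and the solvent/base labels below are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter


def top_k_popular(train_conditions, k):
    """Return the k most frequently reported conditions in the training set."""
    return [condition for condition, _ in Counter(train_conditions).most_common(k)]


def popularity_baseline_accuracy(train_conditions, test_conditions, k):
    """Fraction of test reactions whose reported condition is among the
    k most popular training conditions (the same guess for every reaction)."""
    guess = top_k_popular(train_conditions, k)
    hits = sum(condition in guess for condition in test_conditions)
    return hits / len(test_conditions)


# Hypothetical solvent/base labels, for illustration only.
train = ["dioxane/K2CO3"] * 3 + ["DMF/Cs2CO3"] * 2 + ["toluene/K3PO4"]
test = ["dioxane/K2CO3", "DMF/Cs2CO3", "toluene/K3PO4", "dioxane/K2CO3"]

print(popularity_baseline_accuracy(train, test, k=1))  # 0.5
print(popularity_baseline_accuracy(train, test, k=2))  # 0.75
```

A trained model is only informative if its top-k accuracy clearly exceeds this structure-blind number on the same split.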
Computational planning of the synthesis of complex natural products
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years(1-7). However, the field has progressed greatly since the development of early programs such as LHASA(1,7), for which reaction choices at each step were made by human operators. Multiple software platforms(6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary(15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships(17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization. A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts
Rapid and Accurate Prediction of pK(a) Values of C-H Acids Using Graph Convolutional Neural Networks
The ability to estimate the acidity of C-H groups within organic molecules in non-aqueous solvents is important in synthetic planning to correctly predict which protons will be abstracted in reactions such as alkylations, Michael additions, or aldol condensations. This Article describes the use of graph convolutional neural networks (GCNNs) to perform such predictions on time scales of milliseconds and with accuracy comparing favorably with state-of-the-art solutions, including commercial ones. The crux of the method is to train GCNNs using descriptors that reflect not only topological but also chemical properties of atomic environments. The model is validated against adversarial controls, supplemented by a discussion of realistic synthetic problems (on which it correctly predicts the most acidic protons in >90% of cases), and accompanied by a Web application intended to aid the community in everyday synthetic planning
Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions
This work describes a method to vectorize and machine-learn (ML) the non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict the correct face of approach in ca. 90% of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or the intuition of experienced synthetic chemists. Our results also emphasize the importance of ML models being provided with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms
Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions using machine-learning: The importance of physically meaningful descriptors
Machine learning can predict the major regio-, site-, and diastereoselective outcomes of Diels-Alder reactions better than standard quantum-mechanical methods and with accuracies exceeding 90% provided that (i) the diene/dienophile substrates are represented by “physical-organic” descriptors reflecting the electronic and steric characteristics of their substituents and (ii) the positions of such substituents relative to the reaction core are encoded (“vectorized”) in an informative way. (c) 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
Bottom-Up Nonempirical Approach To Reducing Search Space in Enzyme Design Guided by Catalytic Fields
Robust Predictive Power of the Electrostatic Term at Shortened Intermolecular Distances
At distances shorter than equilibrium, electrostatic interactions seem to be a more robust indicator of relative molecular dimer stability than more accurate electronic structure approaches. We arrive at this conclusion by investigating the nonparametric correlation between reference interaction energies at equilibrium geometries (coupled cluster with singles, doubles, and perturbative triples at the complete basis set limit, $\Delta E_{\mathrm{CCSD(T)}}^{\mathrm{CBS,ref}}$) and its various approximate values obtained at a range of distances for a training set of 22 biologically relevant dimers. The reference and other costly methods start to fail to reproduce the equilibrium ranking of dimer stabilities when the intermolecular distance is shortened by more than 0.2 Å, but the full electrostatic component (includes penetration) maintains a high success rate. Such trends provide a new perspective for any applications where inaccurate structures are used out of necessity, such as the scoring of ligands docked to enzyme active sites
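The abstract does not say which nonparametric correlation was used. As one common choice, a Spearman rank correlation between reference energies and an approximate energy term can be sketched as follows; the function names are assumptions, and the sketch uses average ranks for ties.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied entries the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A value near 1 means the approximate term reproduces the reference ranking of dimer stabilities, which is the property the abstract tracks as the intermolecular distance is shortened.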
Atomic polarization justified Fukui indices and the affinity indicators in aromatic heterocycles and nucleobases
Atomic Fukui indices have been calculated by integration of the polarization justified Fukui functions over the atomic basins. The resulting indices have been explored in the definition of the atomic and group affinity indicators and softnesses on the basis of a formal analysis of the polarization effect. These indicators combine the effect of the atomic charge and the atomic Fukui index. They are potentially applicable in testing a sensing effect on a molecule induced by an approaching point agent, nucleophilic (−) or electrophilic (+), at a distance on the order of van der Waals radii. Calculated atomic and group affinity and softness indicators have been shown to be consistent with the well-established trends of reactivity for a control group of five-membered-ring heterocycles (imidazole, oxazole, thiazole). The indices have been applied to the set of 5 nucleobases (adenine, guanine, cytosine, thymine, uracil), whose diverse reactivity towards electrophiles has been recognized as a key factor determining the sensitivity of DNA to cytotoxic agents. The pairing effect of the nucleobases in the DNA chain and the experimental trends of the site reactivity of these molecules have been properly accounted for by the calculated indicators