    Transferable atomic multipole machine learning models for small organic molecules

    Accurate representation of the molecular electrostatic potential, which is often expanded in distributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any molecular conformation. The model is trained on quantum chemical results for atoms in varying chemical environments drawn from thousands of organic molecules. Multipoles in systems with neutral, cationic, and anionic molecular charge states are treated with individual models. The models' predictive accuracy and applicability are illustrated by evaluating intermolecular interaction energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal. Comment: 11 pages, 6 figures

    Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning

    Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically-relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (with a significant error reduction compared to previously reported models), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions: electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: all local atomic properties are predicted from ML, leaving only eight global parameters, optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically-relevant molecules. We further focus on hydrogen-bonded complexes (essential but challenging due to their directional nature), where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML in denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal. Comment: 15 pages, 9 figures

    Extension of the B3LYP - Dispersion-Correcting Potential Approach to the Accurate Treatment of both Inter- and Intramolecular Interactions

    We recently showed that dispersion-correcting potentials (DCPs), atom-centered Gaussian-type functions developed for use with B3LYP (J. Phys. Chem. Lett. 2012, 3, 1738-1744), greatly improved the ability of the underlying functional to predict non-covalent interactions. However, the application of B3LYP-DCP to the β-scission of the cumyloxyl radical led to a calculated barrier height that was overestimated by ca. 8 kcal/mol. We show in the present work that this error arises from the previously developed carbon atom DCPs, which erroneously alter the electron density in the C-C covalent-bonding region. In this work, we present a new C-DCP with a form that was expected to influence the electron density farther from the nucleus. Tests of the new C-DCP, with previously published H-, N- and O-DCPs, with B3LYP-DCP/6-31+G(2d,2p) on the S66, S22B, HSG-A, and HC12 databases of non-covalently interacting dimers showed that it is one of the most accurate methods available for treating intermolecular interactions, giving mean absolute errors (MAEs) of 0.19, 0.27, 0.16, and 0.18 kcal/mol, respectively. Additional testing on the S12L database of complexation systems gave an MAE of 2.6 kcal/mol, showing that the B3LYP-DCP/6-31+G(2d,2p) approach is one of the best-performing and feasible methods for treating large systems dominated by non-covalent interactions. Finally, we showed that C-C making/breaking chemistry is well-predicted using the newly developed DCPs. In addition to predicting a barrier height for the β-scission of the cumyloxyl radical that is within 1.7 kcal/mol of the high-level value, application of B3LYP-DCP/6-31+G(2d,2p) to 10 databases that include reaction barrier heights and energies, isomerization energies and relative conformation energies gives performance that is amongst the best of all available dispersion-corrected density-functional theory approaches.

    Kernel-based machine learning for molecular crystal structure prediction


    Problems, successes and challenges for the application of dispersion-corrected density-functional theory combined with dispersion-based implicit solvent models to large-scale hydrophobic self-assembly and polymorphism

    © 2015 Taylor & Francis. The recent advent of dispersion-corrected density-functional theory (DFT) methods allows for quantitative modelling of molecular self-assembly processes, and we consider what is required to develop applications to the formation of large self-assembled monolayers (SAMs) on hydrophobic surfaces from organic solution. The focus is on application of the D3 dispersion correction of Grimme combined with the solvent dispersion model of Floris, Tomasi and Pascual-Ahuir to simulate observed scanning-tunnelling microscopy (STM) images of various polymorphs of tetraalkylporphyrin SAMs on highly oriented pyrolytic graphite surfaces. The most significant problem is identified as the need to treat SAM structures that are incommensurate with those of the substrate, providing a challenge to the use of traditional periodic-imaging boundary techniques. Using nearby commensurate lattices introduces non-systematic errors into calculated lattice constants and free energies of SAM formation that are larger than experimental uncertainties and polymorph differences. Developing non-periodic methods for polymorph interface simulation also remains a challenge. Despite these problems, existing methods can be used to interpret STM images and SAM atomic structures, distinguishing between multiple feasible polymorph types. They also provide critical insight into the factors controlling polymorphism. All this stems from a delicate balance that the intermolecular D3 and solvent Floris, Tomasi and Pascual-Ahuir corrections provide. Combined optimised treatments should yield fully quantitative approaches in the future.