
    Transferable atomic multipole machine learning models for small organic molecules

    Accurate representation of the molecular electrostatic potential, which is often expanded in distributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any molecular conformation. The model is trained on quantum chemical results for atoms in varying chemical environments drawn from thousands of organic molecules. Multipoles in systems with neutral, cationic, and anionic molecular charge states are treated with individual models. The models' predictive accuracy and applicability are illustrated by evaluating intermolecular interaction energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal. Comment: 11 pages, 6 figures
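    To make concrete how distributed multipoles feed into an intermolecular interaction energy, here is a minimal sketch of standard point-multipole electrostatics, truncated at dipole order and in atomic units. It is not the authors' model: the site representation (a tuple of charge, dipole vector, position) and the function names are hypothetical, and the model described above may also predict higher-order moments that this sketch omits.

```python
import numpy as np

def multipole_interaction(q_a, mu_a, r_a, q_b, mu_b, r_b):
    """Electrostatic energy (atomic units) between two sites, each carrying a
    point charge q and a point dipole mu; terms beyond dipole-dipole are omitted."""
    R = np.asarray(r_b, float) - np.asarray(r_a, float)  # vector from site a to site b
    d = np.linalg.norm(R)
    n = R / d
    mu_a = np.asarray(mu_a, float)
    mu_b = np.asarray(mu_b, float)
    # charge-charge term
    e_qq = q_a * q_b / d
    # charge b in the potential of dipole a, plus charge a in the potential of dipole b
    e_qmu = (q_b * mu_a.dot(R) - q_a * mu_b.dot(R)) / d**3
    # dipole-dipole term
    e_mumu = (mu_a.dot(mu_b) - 3.0 * mu_a.dot(n) * mu_b.dot(n)) / d**3
    return e_qq + e_qmu + e_mumu

def intermolecular_energy(sites_A, sites_B):
    """Sum over all site pairs between molecule A and molecule B.
    Each site is a hypothetical (q, mu, r) tuple."""
    return sum(multipole_interaction(*a, *b) for a in sites_A for b in sites_B)
```

    Summing over every cross-molecular site pair is what lets a per-atom multipole model produce dimer interaction energies; intramolecular pairs never enter the sum.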

    Accurate molecular polarizabilities with coupled-cluster theory and machine learning

    The molecular polarizability describes the tendency of a molecule to deform or polarize in response to an applied electric field. As such, this quantity governs key intra- and inter-molecular interactions such as induction and dispersion, plays a key role in determining the spectroscopic signatures of molecules, and is an essential ingredient in polarizable force fields and other empirical models for collective interactions. Compared to other ground-state properties, an accurate and reliable prediction of the molecular polarizability is considerably more difficult, as this response quantity is quite sensitive to the description of the underlying molecular electronic structure. In this work, we present state-of-the-art quantum mechanical calculations of the static dipole polarizability tensors of 7,211 small organic molecules computed using linear-response coupled-cluster singles and doubles theory (LR-CCSD). Using a symmetry-adapted machine-learning based approach, we demonstrate that it is possible to predict the molecular polarizability with LR-CCSD accuracy at a negligible computational cost. The employed model is quite robust and transferable, yielding molecular polarizabilities for a diverse set of 52 larger molecules (which includes challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) at an accuracy that exceeds that of hybrid density functional theory (DFT). The atom-centered decomposition implicit in our machine-learning approach offers some insight into the shortcomings of DFT in the prediction of this fundamental quantity of interest.
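    The atom-centered decomposition mentioned above can be illustrated with a small sketch in which the molecular tensor is the sum of per-atom 3x3 contributions, together with the rotational behaviour a symmetry-adapted model has to respect. The per-atom tensors are placeholders for a model's output and the function names are hypothetical; the isotropic average and the anisotropy are the standard rotational invariants of a symmetric rank-2 tensor.

```python
import numpy as np

def molecular_polarizability(atomic_tensors):
    """Sum hypothetical per-atom 3x3 contributions into a molecular tensor."""
    alpha = np.sum(np.asarray(atomic_tensors, float), axis=0)
    # the static dipole polarizability tensor is symmetric; enforce it explicitly
    return 0.5 * (alpha + alpha.T)

def isotropic(alpha):
    """Isotropic (mean) polarizability: one third of the trace."""
    return np.trace(alpha) / 3.0

def anisotropy(alpha):
    """Polarizability anisotropy, invariant under rotations of the molecule."""
    a = alpha
    return np.sqrt(0.5 * ((a[0, 0] - a[1, 1]) ** 2 + (a[1, 1] - a[2, 2]) ** 2
                          + (a[2, 2] - a[0, 0]) ** 2
                          + 6.0 * (a[0, 1] ** 2 + a[1, 2] ** 2 + a[2, 0] ** 2)))

def rotation_z(theta):
    """Rotation matrix about the z axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotate(alpha, R):
    """A rank-2 Cartesian tensor transforms as R @ alpha @ R.T under rotation R."""
    return R @ alpha @ R.T
```

    Rotating a molecule changes the individual tensor components but leaves `isotropic` and `anisotropy` unchanged, which is the covariance a symmetry-adapted tensor model is built to reproduce.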

    Genetic optimization of training sets for improved machine learning models of molecular properties

    The training of molecular models of quantum mechanical properties based on statistical machine learning requires large datasets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve, as prior knowledge may be sparse or unavailable. Ordinarily, representative training molecules are selected from such datasets by random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate than models trained on small randomly selected training sets: mean absolute errors for out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat capacity, electron spread, and polarizability, and by more than ~20% for electronic properties such as frontier orbital eigenvalues or dipole moments. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets. Comment: 9 pages, 6 figures
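    A generic genetic algorithm for choosing a fixed-size training subset can be sketched as follows, assuming a Gaussian-kernel ridge regression model whose validation MAE serves as the fitness. The truncation selection, union-based crossover, and swap mutation are illustrative choices rather than the operators used in the paper, and `krr_mae`, `ga_select`, and their parameters are hypothetical. The candidate pool must be larger than the training-set size.

```python
import numpy as np

def krr_mae(X, y, train_idx, val_idx, sigma=1.0, lam=1e-6):
    """Validation MAE of Gaussian-kernel ridge regression trained on train_idx."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Xt, yt = X[train_idx], y[train_idx]
    K = kernel(Xt, Xt) + lam * np.eye(len(train_idx))  # regularized kernel matrix
    coef = np.linalg.solve(K, yt)
    pred = kernel(X[val_idx], Xt) @ coef
    return np.mean(np.abs(pred - y[val_idx]))

def ga_select(X, y, pool_idx, val_idx, n_train, pop_size=20, n_gen=30,
              mut_rate=0.1, seed=0):
    """Evolve training subsets of size n_train drawn from pool_idx."""
    rng = np.random.default_rng(seed)
    pool_idx = np.asarray(pool_idx)

    def fitness(ind):
        return krr_mae(X, y, ind, val_idx)

    # initial population: random subsets of the candidate pool
    pop = [rng.choice(pool_idx, n_train, replace=False) for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    history = [min(scores)]
    for _ in range(n_gen):
        order = np.argsort(scores)
        n_elite = pop_size // 2
        elite = [pop[i] for i in order[:n_elite]]  # keep the better half unchanged
        children = []
        while n_elite + len(children) < pop_size:
            pa, pb = rng.choice(n_elite, 2, replace=False)
            # crossover: draw n_train distinct molecules from the union of both parents
            union = np.union1d(elite[pa], elite[pb])
            child = rng.choice(union, n_train, replace=False)
            # mutation: occasionally swap a member for a molecule not in the child
            for k in range(n_train):
                if rng.random() < mut_rate:
                    outside = np.setdiff1d(pool_idx, child)
                    child[k] = rng.choice(outside)
            children.append(child)
        scores = [scores[i] for i in order[:n_elite]] + [fitness(c) for c in children]
        pop = elite + children
        history.append(min(scores))
    best = int(np.argmin(scores))
    return pop[best], scores[best], history
```

    Because the better half of each generation is carried over unchanged, the best validation error never increases between generations. Since the subset is tuned against the validation molecules, a separate held-out test set is still needed to measure out-of-sample error fairly.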

    Electronic Descriptors for Supervised Spectroscopic Predictions

    Spectroscopic properties of molecules are of great importance for describing the molecular response to UV/Vis electromagnetic radiation. Computationally expensive ab initio methods (e.g. multiconfigurational SCF, coupled cluster) or TDDFT are commonly used by the quantum chemistry community to compute these properties. In this work, we propose a supervised machine learning (ML) approach to model the absorption spectra of organic molecules. Several supervised ML methods have been tested, including kernel ridge regression (KRR), multilayer perceptron neural networks (MLP), and convolutional neural networks. Geometrical descriptors alone (e.g. the Coulomb matrix) proved insufficient for accurate training. Inspired by TDDFT, we propose a set of electronic descriptors obtained from low-cost DFT methods: orbital energy differences, transition dipole moments between occupied and unoccupied Kohn-Sham orbitals, and the charge-transfer character of mono-excitations. We demonstrate that with these electronic descriptors and neural networks we can predict not only the density of excited states but also obtain very good estimates of the absorption spectrum and of the charge-transfer character of the electronic excited states, reaching results close to chemical accuracy (~2 kcal/mol or ~0.1 eV).
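    As a rough illustration of how the low-cost descriptors named above relate to an absorption spectrum, the sketch below combines Kohn-Sham orbital energy differences and occupied-to-unoccupied transition dipoles into zeroth-order oscillator strengths, f = (2/3) ΔE |μ|² in atomic units, and broadens them with Gaussians. This is an uncorrelated stick-spectrum baseline, not the paper's neural-network model; the function names and the 0.2 eV broadening width are assumptions.

```python
import numpy as np

HARTREE_TO_EV = 27.211386

def oscillator_strengths(delta_e, trans_dipoles):
    """f = (2/3) * dE * |mu|^2 in atomic units (dE in hartree, mu in e*bohr)."""
    delta_e = np.asarray(delta_e, float)
    mu2 = np.sum(np.asarray(trans_dipoles, float) ** 2, axis=1)  # |mu|^2 per excitation
    return 2.0 / 3.0 * delta_e * mu2

def broadened_spectrum(delta_e, f, grid_ev, width_ev=0.2):
    """Sum of Gaussians centred at each excitation energy (eV), weighted by f."""
    grid_ev = np.asarray(grid_ev, float)
    centres = np.asarray(delta_e, float) * HARTREE_TO_EV
    g = np.exp(-((grid_ev[:, None] - centres[None, :]) ** 2) / (2.0 * width_ev ** 2))
    return g @ np.asarray(f, float)
```

    A learned model replaces this crude orbital-difference picture, but the same inputs (energy differences and transition dipoles per occupied/unoccupied pair) are what the descriptor set in the abstract provides.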