Transferable atomic multipole machine learning models for small organic molecules
Accurate representation of the molecular electrostatic potential, which is
often expanded in distributed multipole moments, is crucial for an efficient
evaluation of intermolecular interactions. Here we introduce a machine learning
model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any
molecular conformation. The model is trained on quantum chemical results for
atoms in varying chemical environments drawn from thousands of organic
molecules. Multipoles in systems with neutral, cationic, and anionic molecular
charge states are treated with individual models. The models' predictive
accuracy and applicability are illustrated by evaluating intermolecular
interaction energies of nearly 1,000 dimers and the cohesive energy of the
benzene crystal. Comment: 11 pages, 6 figures
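As a rough illustration of what a distributed multipole expansion of the electrostatic potential means, the sketch below evaluates the potential at a point from atom-centered charges and dipoles only, in atomic units. The function name `multipole_potential` and the truncation at dipole order are assumptions made for this example; the abstract does not specify the multipole order or the implementation used in the paper.

```python
import numpy as np

def multipole_potential(r, sites, charges, dipoles):
    """Electrostatic potential at point r (atomic units) from atom-centered
    charges q_i and dipoles mu_i, truncated at dipole order:
        V(r) = sum_i [ q_i / d_i + mu_i . d_vec_i / d_i**3 ],
    where d_vec_i = r - R_i and d_i = |d_vec_i|.
    Hypothetical helper for illustration, not the paper's code."""
    r = np.asarray(r, dtype=float)
    sites = np.asarray(sites, dtype=float)
    charges = np.asarray(charges, dtype=float)
    dipoles = np.asarray(dipoles, dtype=float)
    d_vec = r - sites                          # shape (n_sites, 3)
    d = np.linalg.norm(d_vec, axis=1)          # shape (n_sites,)
    monopole = charges / d
    dipole = np.einsum("ij,ij->i", dipoles, d_vec) / d ** 3
    return float(np.sum(monopole + dipole))
```

In a machine learning setting of the kind the abstract describes, the `charges` and `dipoles` arrays would be model predictions for each atom rather than values taken directly from a quantum chemical calculation.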
Machine Learning, Quantum Mechanics, and Chemical Compound Space
We review recent studies dealing with the generation of machine learning
models of molecular and solid properties. The models are trained and validated
using standard quantum chemistry results obtained for organic molecules and
materials selected from chemical space at random.
Accurate molecular polarizabilities with coupled-cluster theory and machine learning
The molecular polarizability describes the tendency of a molecule to deform
or polarize in response to an applied electric field. As such, this quantity
governs key intra- and inter-molecular interactions such as induction and
dispersion, plays a key role in determining the spectroscopic signatures of
molecules, and is an essential ingredient in polarizable force fields and other
empirical models for collective interactions. Compared to other ground-state
properties, an accurate and reliable prediction of the molecular polarizability
is considerably more difficult as this response quantity is quite sensitive to
the description of the underlying molecular electronic structure. In this work,
we present state-of-the-art quantum mechanical calculations of the static
dipole polarizability tensors of 7,211 small organic molecules computed using
linear-response coupled-cluster singles and doubles theory (LR-CCSD). Using a
symmetry-adapted machine-learning based approach, we demonstrate that it is
possible to predict the molecular polarizability with LR-CCSD accuracy at a
negligible computational cost. The employed model is quite robust and
transferable, yielding molecular polarizabilities for a diverse set of 52
larger molecules (which includes challenging conjugated systems, carbohydrates,
small drugs, amino acids, nucleobases, and hydrocarbon isomers) at an accuracy
that exceeds that of hybrid density functional theory (DFT). The atom-centered
decomposition implicit in our machine-learning approach offers some insight
into the shortcomings of DFT in the prediction of this fundamental quantity of
interest.
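For readers unfamiliar with the polarizability tensor, the sketch below computes two standard rotational invariants of a symmetric 3x3 tensor: the isotropic polarizability (one third of the trace) and the anisotropy. The function name is hypothetical and the example says nothing about how the paper's symmetry-adapted model is built; it only shows which scalar quantities are commonly extracted from such a tensor.

```python
import numpy as np

def polarizability_invariants(alpha):
    """Return (isotropic, anisotropy) for a symmetric 3x3 polarizability tensor.
    isotropic  = tr(alpha) / 3
    anisotropy = sqrt(3/2 * sum of squared elements of the traceless part)
    Hypothetical helper for illustration."""
    alpha = np.asarray(alpha, dtype=float)
    iso = np.trace(alpha) / 3.0
    dev = alpha - iso * np.eye(3)              # traceless (deviatoric) part
    aniso = np.sqrt(1.5 * np.sum(dev * dev))
    return iso, aniso
```

Both quantities are unchanged when the molecule is rotated, which is why they are convenient targets for comparing methods.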
Genetic optimization of training sets for improved machine learning models of molecular properties
The training of molecular models of quantum mechanical properties based on
statistical machine learning requires large datasets which exemplify the map
from chemical structure to molecular property. Intelligent a priori selection
of training examples is often difficult or impossible to achieve as prior
knowledge may be sparse or unavailable. Ordinarily, representative selection of
training molecules from such datasets is achieved through random sampling. We
use genetic algorithms for the optimization of training set composition
consisting of tens of thousands of small organic molecules. The resulting
machine learning models are considerably more accurate than those trained on small
randomly selected training sets: mean absolute errors for out-of-sample
predictions are reduced to ~25% for enthalpies, free energies, and zero-point
vibrational energy, to ~50% for heat-capacity, electron-spread, and
polarizability, and by more than ~20% for electronic properties such as
frontier orbital eigenvalues or dipole-moments. We discuss and present
optimized training sets consisting of 10 molecular classes for all molecular
properties studied. We show that these classes can be used to design improved
training sets for the generation of machine learning models of the same
properties in similar but unrelated molecular sets. Comment: 9 pages, 6 figures
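The idea of optimizing training set composition with a genetic algorithm can be sketched as follows. This is a minimal toy version under stated assumptions, not the procedure from the paper: the fitness is the validation mean absolute error of a small Gaussian-kernel ridge regression, and the names `krr_mae` and `ga_select`, the elitist selection, the union-based crossover, and the single-swap mutation are all choices made for this example.

```python
import numpy as np

def krr_mae(X_tr, y_tr, X_val, y_val, sigma=1.0, lam=1e-3):
    """Fit Gaussian-kernel ridge regression; return mean absolute error on validation data."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K = kernel(X_tr, X_tr) + lam * np.eye(len(X_tr))
    coeffs = np.linalg.solve(K, y_tr)
    return float(np.abs(kernel(X_val, X_tr) @ coeffs - y_val).mean())

def ga_select(X, y, X_val, y_val, k=20, pop_size=12, generations=15, seed=0):
    """Search for a k-member training subset with low validation MAE.
    Returns (best_indices, best_mae, best_mae_at_start_of_each_generation)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each individual is an array of k distinct row indices into X.
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    score = lambda idx: krr_mae(X[idx], y[idx], X_val, y_val)
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=score)
        history.append(score(ranked[0]))
        parents = ranked[: pop_size // 2]      # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw k distinct members from the union of two parents.
            pool = np.union1d(parents[a], parents[b])
            child = rng.choice(pool, size=k, replace=False)
            # Mutation: replace one member with an index not yet in the child.
            outside = np.setdiff1d(np.arange(n), child)
            child[rng.integers(k)] = rng.choice(outside)
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    return best, score(best), history
```

Because the best half of each generation is carried over unchanged, the best validation error never increases from one generation to the next. In practice the validation data used for fitness must be kept separate from the final test data, otherwise the reported out-of-sample error would be optimistically biased.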
Electronic Descriptors for Supervised Spectroscopic Predictions
Spectroscopic properties of molecules are of great importance for describing the molecular response to UV/Vis electromagnetic radiation. Computationally expensive ab initio (e.g. multiconfigurational SCF, coupled cluster) or TDDFT methods are commonly used by the quantum chemistry community to compute these properties. In this work, we propose a supervised machine learning approach to model the absorption spectra of organic molecules. Several supervised ML methods have been tested, such as Kernel Ridge Regression (KRR), Multilayer Perceptron neural networks (MLP), and Convolutional Neural Networks. Geometrical descriptors alone (e.g. the Coulomb matrix) proved insufficient for accurate training. Inspired by TDDFT, we propose a set of electronic descriptors obtained from low-cost DFT methods: orbital energy differences, transition dipole moments between occupied and unoccupied Kohn-Sham orbitals, and the charge-transfer character of mono-excitations. We demonstrate that with these electronic descriptors and neural networks we can predict not only the density of excited states but also obtain a very good estimate of the absorption spectrum and of the charge-transfer character of the electronic excited states, reaching results close to chemical accuracy (~2 kcal/mol or ~0.1 eV).
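To make the contrast between geometrical and electronic descriptors concrete, the sketch below builds one of each: the Coulomb matrix in its commonly used form (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i - R_j|, atomic units) and a table of occupied-to-unoccupied orbital energy differences. The function names and the input layout are assumptions for this example; the abstract does not describe how the authors assemble their feature vectors, and the transition dipole and charge-transfer descriptors are not covered here.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix from nuclear charges Z (n,) and positions R (n, 3) in bohr.
    Diagonal: 0.5 * Z_i**2.4; off-diagonal: Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)                # placeholder to avoid dividing by zero
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)        # overwrite the placeholder diagonal
    return M

def orbital_energy_gaps(eps, n_occ):
    """Energy differences eps_a - eps_i for every occupied orbital i and
    unoccupied orbital a, given orbital energies sorted in ascending order.
    Returns an array of shape (n_occ, n_unoccupied). Hypothetical helper."""
    eps = np.asarray(eps, dtype=float)
    return eps[None, n_occ:] - eps[:n_occ, None]
```

The Coulomb matrix depends only on nuclear charges and geometry, while the gap table requires orbital energies from an electronic structure calculation, which is the kind of extra information the abstract argues is needed for spectra.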
- …