Constant Size Molecular Descriptors For Use With Machine Learning
A set of molecular descriptors whose length is independent of molecular size
is developed for machine learning models that target thermodynamic and
electronic properties of molecules. These features are evaluated by monitoring
performance of kernel ridge regression models on well-studied data sets of
small organic molecules. The features include connectivity counts, which
require only the bonding pattern of the molecule, and encoded distances, which
summarize distances between both bonded and non-bonded atoms and so require the
full molecular geometry. In addition to having constant size, these features
summarize information regarding the local environment of atoms and bonds, such
that models can take advantage of similarities resulting from the presence of
similar chemical fragments across molecules. Combining these two types of
features leads to models whose performance is comparable to or better than the
current state of the art. The features introduced here have the advantage of
leading to models that may be trained on smaller molecules and then used
successfully on larger molecules.
Comment: 18 pages, 5 figures
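The idea of a fixed-length connectivity feature fed to kernel ridge regression can be sketched as follows. This is a minimal illustration, not the authors' descriptor set: the bond-type vocabulary `BOND_TYPES`, the Gaussian kernel, the hyperparameters, and the target values are all assumptions chosen for the example.

```python
import numpy as np
from collections import Counter

# Hypothetical fixed vocabulary of bond types (element pairs, sorted).
# The feature length is set by this list, not by the number of atoms.
BOND_TYPES = [("C", "C"), ("C", "H"), ("C", "O"), ("H", "O")]

def connectivity_counts(atoms, bonds):
    """Count bonds by the sorted pair of element symbols they join."""
    counts = Counter(tuple(sorted((atoms[i], atoms[j]))) for i, j in bonds)
    return np.array([counts[b] for b in BOND_TYPES], dtype=float)

def gaussian_kernel(A, B, sigma=2.0):
    """Gaussian kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(X, y, lam=1e-6, sigma=2.0):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma=2.0):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Three small molecules given as element lists and bonded index pairs.
methane = (["C", "H", "H", "H", "H"], [(0, 1), (0, 2), (0, 3), (0, 4)])
ethane = (["C", "C"] + ["H"] * 6,
          [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)])
methanol = (["C", "O", "H", "H", "H", "H"],
            [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)])

X = np.stack([connectivity_counts(*m) for m in (methane, ethane, methanol)])
y = np.array([1.0, 2.0, 3.0])  # made-up placeholder targets, not real data

alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
```

Every row of `X` has the same length even though the molecules differ in atom count, which is the property that lets one model be applied across molecule sizes.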
An SVD and Derivative Kernel Approach to Learning from Geometric Data
Motivated by problems such as molecular energy prediction, we derive an (improper) kernel between geometric inputs that captures the relevant rotational and translational invariances in geometric data. Since many physical simulations based on geometric data produce derivatives of the output quantity with respect to the input positions, we derive an approach that incorporates derivative information into our kernel learning. We further show how to exploit the low-rank structure of the resulting kernel matrices to speed up learning. Finally, we evaluate the method in the context of molecular energy prediction, showing good performance for modeling previously unseen molecular configurations. Integrating the approach into a Bayesian optimization procedure, we show substantial improvement over the state of the art in molecular energy optimization.
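The abstract does not spell out how the kernel is constructed, so the following is only an illustration of the kind of invariance it refers to, not the authors' method. It relies on a standard fact: the singular values of a centered coordinate matrix do not change when the point set is translated or transformed by an orthogonal matrix. The function name `invariant_signature` and the random test geometry are assumptions made for the example.

```python
import numpy as np

def invariant_signature(coords):
    """Singular values of the centered (n_points, 3) coordinate matrix."""
    centered = coords - coords.mean(axis=0)  # subtracting the centroid removes translation
    # Singular values are unchanged by orthogonal transforms of the columns.
    return np.linalg.svd(centered, compute_uv=False)

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # arbitrary stand-in for atomic positions

# Random orthogonal matrix from a QR decomposition (a rotation, possibly
# combined with a reflection), followed by an arbitrary translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = coords @ Q.T + np.array([1.0, -2.0, 0.5])

same = np.allclose(invariant_signature(coords), invariant_signature(moved))
```

Any kernel evaluated on such a signature inherits the invariance, at the cost of discarding information that the singular values alone do not carry.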