2 research outputs found

    Constant Size Molecular Descriptors For Use With Machine Learning

    A set of molecular descriptors whose length is independent of molecular size is developed for machine learning models that target thermodynamic and electronic properties of molecules. These features are evaluated by monitoring the performance of kernel ridge regression models on well-studied data sets of small organic molecules. The features include connectivity counts, which require only the bonding pattern of the molecule, and encoded distances, which summarize distances between both bonded and non-bonded atoms and so require the full molecular geometry. In addition to having constant size, these features summarize information regarding the local environment of atoms and bonds, such that models can take advantage of similarities resulting from the presence of similar chemical fragments across molecules. Combining these two types of features leads to models whose performance is comparable to or better than the current state of the art. The features introduced here have the advantage of leading to models that may be trained on smaller molecules and then used successfully on larger molecules. Comment: 18 pages, 5 figures
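
    The idea of a constant-size connectivity feature fed to kernel ridge regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the descriptors from the paper: the element vocabulary `ELEMENTS`, the helper names `connectivity_counts`, `gaussian_kernel`, `krr_fit` and `krr_predict`, and the choice of a Gaussian kernel are all hypothetical. Molecules are assumed to be given as a list of element symbols plus a list of bonded index pairs.

```python
import numpy as np
from itertools import combinations_with_replacement

# Assumed element vocabulary; the paper's actual feature definition may differ.
ELEMENTS = ("C", "H", "N", "O")
# All unordered element pairs, in vocabulary order: 10 possible bond types.
BOND_TYPES = tuple(combinations_with_replacement(ELEMENTS, 2))


def connectivity_counts(atoms, bonds):
    """Return a fixed-length vector: atom counts per element, then bond
    counts per element pair. Only the bonding pattern is needed."""
    atom_counts = [atoms.count(e) for e in ELEMENTS]
    bond_counts = dict.fromkeys(BOND_TYPES, 0)
    for i, j in bonds:
        # Order the pair by vocabulary position so (H, C) and (C, H) coincide.
        pair = tuple(sorted((atoms[i], atoms[j]), key=ELEMENTS.index))
        bond_counts[pair] += 1
    return np.array(atom_counts + [bond_counts[b] for b in BOND_TYPES], dtype=float)


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma=5.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma=5.0):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Toy molecules of different sizes map to vectors of the same length.
methane = (["C", "H", "H", "H", "H"], [(0, 1), (0, 2), (0, 3), (0, 4)])
ethane = (["C", "C"] + ["H"] * 6,
          [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)])
water = (["O", "H", "H"], [(0, 1), (0, 2)])

X = np.stack([connectivity_counts(*m) for m in (methane, ethane, water)])
y = np.array([1.0, 2.0, 3.0])  # arbitrary placeholder targets, not real data
alpha = krr_fit(X, y)
```

    Because every molecule maps to a vector of length `len(ELEMENTS) + len(BOND_TYPES)`, a model fitted on small molecules can be applied to larger ones without changing the feature layout.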

    An SVD and Derivative Kernel Approach to Learning from Geometric Data

    Motivated by problems such as molecular energy prediction, we derive an (improper) kernel between geometric inputs that is able to capture the relevant rotational and translational invariances in geometric data. Since many physical simulations based upon geometric data produce derivatives of the output quantity with respect to the input positions, we derive an approach that incorporates derivative information into our kernel learning. We further show how to exploit the low-rank structure of the resulting kernel matrices to speed up learning. Finally, we evaluate the method in the context of molecular energy prediction, showing good performance for modeling previously unseen molecular configurations. Integrating the approach into Bayesian optimization, we show substantial improvement over the state of the art in molecular energy optimization.
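
    To make the notion of rotational and translational invariance concrete, here is a small sketch. It is not the kernel derived in the paper, whose construction the abstract does not spell out. It relies only on a standard fact: the singular values of a mean-centered coordinate matrix do not change when the point set is rotated or shifted. The function names `invariant_descriptor` and `random_rotation` are hypothetical.

```python
import numpy as np


def invariant_descriptor(coords):
    """Singular values of the mean-centered (N, 3) coordinate matrix.

    Centering removes translations; singular values are unchanged by
    multiplying with an orthogonal matrix, which removes rotations.
    """
    centered = coords - coords.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)


def random_rotation(rng):
    """Draw a proper 3x3 rotation matrix from the QR factorization of a
    Gaussian random matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs
    if np.linalg.det(q) < 0:     # force determinant +1 (no reflection)
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))             # toy geometry, not real data
rotation = random_rotation(rng)
shift = rng.normal(size=3)
moved = coords @ rotation.T + shift          # same shape, new pose

same = np.allclose(invariant_descriptor(coords), invariant_descriptor(moved))
```

    Any kernel evaluated on such a descriptor inherits the invariance, although this particular descriptor discards much of the geometry and is meant only as an illustration.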