Prediction of the Atomization Energy of Molecules Using Coulomb Matrix and Atomic Composition in a Bayesian Regularized Neural Networks
Exact calculation of electronic properties of molecules is a fundamental step
for intelligent and rational compound and materials design. The intrinsically
graph-like and non-vectorial nature of molecular data poses a unique and
challenging machine learning problem. In this paper we embrace a
learning-from-scratch approach in which the quantum mechanical electronic
properties of molecules are predicted directly from the raw molecular
geometry, similar to some recent works. Unlike these previous endeavors,
however, our study suggests a benefit from combining the molecular geometry
embedded in the Coulomb matrix with the atomic composition of molecules. Using
the new combined features in a Bayesian regularized neural network, our
results improve well-known results from the literature on the QM7 dataset,
from a mean absolute error of 3.51 kcal/mol down to 3.0 kcal/mol.
Comment: Under review, ICANN 201
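The combined features described above pair the Coulomb matrix with an element-count composition vector. A minimal sketch of that construction, assuming the standard Coulomb matrix definition (diagonal 0.5*Z_i**2.4, off-diagonal Z_i*Z_j/|R_i - R_j|); function names, the example geometry, and the element list are illustrative (QM7 molecules contain H, C, N, O, S):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """M[i,i] = 0.5 * Z_i**2.4; M[i,j] = Z_i * Z_j / |R_i - R_j| for i != j."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, i] = 0.5 * Z[i] ** 2.4
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def composition_vector(Z, elements=(1, 6, 7, 8, 16)):
    """Counts of each element present (default: H, C, N, O, S as in QM7)."""
    return np.array([sum(1 for z in Z if z == e) for e in elements])

# H2 with a 0.74 Angstrom bond length (illustrative geometry and units)
Z = [1, 1]
R = [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]]
M = coulomb_matrix(Z, R)
# one possible combined feature vector: upper triangle + composition counts
features = np.concatenate([M[np.triu_indices(2)], composition_vector(Z)])
```

The Coulomb matrix is symmetric, so only the upper triangle carries independent information; concatenating the composition counts is one simple way to realize the paper's combined-feature idea.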
First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties
A well-defined notion of chemical compound space (CCS) is essential for
gaining rigorous control of properties through variation of elemental
composition and atomic configurations. Here, we review an atomistic first
principles perspective on CCS. First, CCS is discussed in terms of variational
nuclear charges in the context of conceptual density functional and molecular
grand-canonical ensemble theory. Thereafter, we revisit the notion of compound
pairs, related to each other via "alchemical" interpolations involving
fractional nuclear charges in the electronic Hamiltonian. We address Taylor
expansions in CCS, property non-linearity, improved predictions using reference
compound pairs, and the ounce-of-gold prize challenge to linearize CCS.
Finally, we turn to machine learning of analytical structure-property
relationships in CCS. These relationships correspond to solutions of the
electronic Schrödinger equation that are inferred rather than derived through
the variational principle.
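The Taylor expansions in CCS referred to above treat the energy as a function of the vector of nuclear charges at fixed geometry. A schematic form, under the assumption of fixed nuclear positions (the review itself should be consulted for the precise setting), is

```latex
E(\mathbf{Z} + \Delta\mathbf{Z}) \approx E(\mathbf{Z})
  + \sum_{I} \frac{\partial E}{\partial Z_I}\,\Delta Z_I
  + \frac{1}{2} \sum_{I,J} \frac{\partial^2 E}{\partial Z_I\,\partial Z_J}\,
    \Delta Z_I\,\Delta Z_J + \cdots,
\qquad
\frac{\partial E}{\partial Z_I}
  = -\int \frac{\rho(\mathbf{r})}{|\mathbf{r}-\mathbf{R}_I|}\,\mathrm{d}\mathbf{r}
  + \sum_{J \neq I} \frac{Z_J}{|\mathbf{R}_I - \mathbf{R}_J|},
```

where the first-order derivative follows from the Hellmann-Feynman theorem: the integral is the electrostatic attraction of the ground-state electron density to nucleus I, and the sum is the derivative of the nuclear repulsion energy.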
Towards the Quantum Machine: Using Scalable Machine Learning Methods to Predict Photovoltaic Efficacy of Organic Molecules
Recent advances in machine learning have resulted in an upsurge of interest in developing a "quantum machine", a technique for simulating and predicting quantum-chemical properties at the molecular level. This paper explores the development of a large-scale quantum machine in the context of accurately and rapidly classifying molecules to determine photovoltaic efficacy through machine learning. Specifically, it proposes several novel representations of molecules that are amenable to learning, in addition to extending and improving existing representations. It also proposes and implements extensions to scalable distributed learning algorithms in order to perform large-scale molecular regression. The study leverages Harvard's Odyssey supercomputer to train various kinds of predictive algorithms over millions of molecules, and assesses the cross-validated test performance of these models for predicting photovoltaic efficacy. The results suggest combinations of representations and learning models that may be most desirable in constructing a large-scale system designed to classify molecules by photovoltaic efficacy.
Ab initio machine learning in chemical compound space
Chemical compound space (CCS), the set of all theoretically conceivable
combinations of chemical elements and (meta-)stable geometries that make up
matter, is colossal. The first principles based virtual sampling of this space,
for example in search of novel molecules or materials which exhibit desirable
properties, is therefore prohibitive for all but the smallest sub-sets and
simplest properties. We review studies aimed at tackling this challenge using
modern machine learning techniques based on (i) synthetic data, typically
generated using quantum mechanics based methods, and (ii) model architectures
inspired by quantum mechanics. Such Quantum mechanics based Machine Learning
(QML) approaches combine the numerical efficiency of statistical surrogate
models with an {\em ab initio} view on matter. They rigorously reflect the
underlying physics in order to reach universality and transferability across
CCS. While state-of-the-art approximations to quantum problems impose severe
computational bottlenecks, recent QML based developments indicate the
possibility of substantial acceleration without sacrificing the predictive
power of quantum mechanics.
Quantum machine learning in chemical space
This thesis focuses on the overlap of first-principles quantum methods and machine learning in computational chemistry and materials science, commonly referred to as Quantum Machine Learning (QML).
Assessing and benchmarking the performance of existing machine learning models on various classes of compounds and chemical properties is a substantial part of this thesis. These results are used to better understand which machine learning models are best suited for a given combination of properties and compounds. For example, thirteen electronic ground-state properties of 131k organic molecules, calculated at the hybrid-DFT level of theory, were used to gauge the predictive accuracy of combinations of representations and regressors. The out-of-sample prediction errors of the models on the hybrid-DFT-quality data are on par with, or close to, the CCSD(T) error relative to experimental values, indicating that reference data need to go beyond hybrid-DFT if QML predictions are to surpass chemical accuracy.
Another area of focus is the development of new and accurate QML models. A new representation of atoms in their chemical environments is introduced by rethinking the way structural and chemical compound information is encoded into training data. The representation interpolates elemental properties across both atoms and compounds, making it well suited for datasets with high compositional and structural degrees of freedom. Numerical results show that, compared to current benchmarks, this representation yields superior predictive power in combination with kernel ridge regression on a diverse set of systems, including organic molecules, non-covalently bonded protein side-chains, water clusters, and crystalline solids. Furthermore, the role of response operators when learning response properties of the energy is discussed, leading to a formalism in which the corresponding response operator is applied directly to the quantum machine learning model. Training QML models with this formalism results in lower out-of-sample errors than learning the corresponding properties directly. The formalism can also be used to reproduce accurate normal modes and IR spectra of molecules.
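Kernel ridge regression, the regressor used throughout, is generic. A minimal sketch with a Gaussian kernel on arbitrary feature vectors; the thesis's own representation is not reproduced here, and all names, data, and hyperparameters are illustrative:

```python
import numpy as np

def krr_train(X, y, sigma=0.5, lam=1e-8):
    """Fit kernel ridge regression: solve (K + lam*I) alpha = y, Gaussian kernel."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(X_train, alpha, X_test, sigma=0.5):
    """Predict as k(x_test, x_train) @ alpha."""
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # stand-in feature vectors, one per compound
y = np.sin(X[:, 0])            # toy scalar property
alpha = krr_train(X, y)
y_hat = krr_predict(X, alpha, X)  # in-sample predictions
```

With a small regularizer the model nearly interpolates its training data; in practice sigma and lam are tuned by cross-validation, which is how the out-of-sample errors quoted above are obtained.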
Finally, the applicability of QML models is explored. A machine learning model which encodes the elemental identities of the atoms placed in each site is used to exhaustively screen the formation energy of 2 million Elpasolite crystals. The resulting model's accuracy improves systematically with additional training data, reaching an accuracy of 0.1 eV/atom when trained on 10k crystals. Out of the 2 million crystals, we identify 90 unique structures which span the convex hull of stability, among which NFAlCa, with uncommon stoichiometry and a negative atomic oxidation state for Al.
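The systematic improvement with training-set size described above is a learning curve. A toy sketch on synthetic data, with a 1-nearest-neighbour regressor standing in for the actual crystal-site model (everything here is illustrative, not the thesis's method):

```python
import numpy as np

rng = np.random.default_rng(1)

def mae_for_train_size(n_train, X_pool, y_pool, X_test, y_test):
    """Test MAE of a 1-nearest-neighbour regressor on n_train random points."""
    idx = rng.choice(len(X_pool), size=n_train, replace=False)
    Xt, yt = X_pool[idx], y_pool[idx]
    d2 = np.sum((X_test[:, None, :] - Xt[None, :, :]) ** 2, axis=-1)
    pred = yt[np.argmin(d2, axis=1)]  # value of the nearest training point
    return float(np.mean(np.abs(pred - y_test)))

# synthetic "formation energy" over a 2D feature space
X = rng.uniform(size=(4000, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
X_pool, y_pool = X[:3000], y[:3000]
X_test, y_test = X[3000:], y[3000:]
maes = [mae_for_train_size(n, X_pool, y_pool, X_test, y_test)
        for n in (100, 400, 1600)]
```

Plotting such MAEs against training-set size on log-log axes gives the (typically linear) learning curve used to report statements like "0.1 eV/atom at 10k training crystals".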
Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
By combining metal nodes with organic linkers we can potentially synthesize
millions of possible metal organic frameworks (MOFs). At present, we have
libraries of over ten thousand synthesized materials and millions of in-silico
predicted materials. The fact that we have so many materials opens many
exciting avenues to tailor make a material that is optimal for a given
application. However, from an experimental and computational point of view we
simply have too many materials to screen using brute-force techniques. In this
review, we show that having so many materials allows us to use big-data methods
as a powerful technique to study these materials and to discover complex
correlations. The first part of the review gives an introduction to the
principles of big-data science. We emphasize the importance of data collection,
methods to augment small data sets, and how to select appropriate training
sets. An important part of this review is the discussion of the different
approaches used to represent these materials in feature space. The review also
includes a general
overview of the different ML techniques, but as most applications in porous
materials use supervised ML our review is focused on the different approaches
for supervised ML. In particular, we review the different methods to optimize
the ML process and how to quantify the performance of the different methods. In
the second part, we review how the different approaches of ML have been applied
to porous materials. In particular, we discuss applications in the field of gas
storage and separation, the stability of these materials, their electronic
properties, and their synthesis. The range of topics illustrates the large
variety of topics that can be studied with big-data science. Given the
increasing interest of the scientific community in ML, we expect this list to
rapidly expand in the coming years.
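Quantifying the performance of the different methods, as discussed above, usually reduces to a few standard regression metrics. A minimal sketch; the choice of metrics (MAE, RMSE, R^2) is a common convention, not taken from this review:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                      # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # root mean squared error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return mae, rmse, r2

# toy example: four predicted vs. reference property values
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

These metrics are meaningful only when computed on held-out data (a test set or cross-validation folds), never on the training set itself.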