Prediction of the Atomization Energy of Molecules Using Coulomb Matrix and Atomic Composition in a Bayesian Regularized Neural Networks
Exact calculation of electronic properties of molecules is a fundamental step
for intelligent and rational compound and materials design. The intrinsically
graph-like and non-vectorial nature of molecular data poses a unique and
challenging machine learning problem. In this paper we embrace a
learning-from-scratch approach in which the quantum mechanical electronic
properties of molecules are predicted directly from the raw molecular
geometry, similar to some recent works. Unlike these previous endeavors,
however, our study suggests a benefit from combining the molecular geometry
embedded in the Coulomb matrix with the atomic composition of molecules. Using
the new combined features in a Bayesian regularized neural network, our
results improve well-known results from the literature on the QM7 dataset,
from a mean absolute error of 3.51 kcal/mol down to 3.0 kcal/mol.
Comment: Under review, ICANN 201
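The combined features described above pair the Coulomb matrix with an element-count composition vector. A minimal sketch of that construction, assuming the standard Coulomb matrix definition (diagonal 0.5*Z_i**2.4, off-diagonal Z_i*Z_j/|R_i - R_j|); function names, the example geometry, and the element list are illustrative (QM7 molecules contain H, C, N, O, S):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """M[i,i] = 0.5 * Z_i**2.4; M[i,j] = Z_i * Z_j / |R_i - R_j| for i != j."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, i] = 0.5 * Z[i] ** 2.4
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def composition_vector(Z, elements=(1, 6, 7, 8, 16)):
    """Counts of each element present (default: H, C, N, O, S as in QM7)."""
    return np.array([sum(1 for z in Z if z == e) for e in elements])

# H2 with a 0.74 Angstrom bond length (illustrative geometry and units)
Z = [1, 1]
R = [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]]
M = coulomb_matrix(Z, R)
# one possible combined feature vector: upper triangle + composition counts
features = np.concatenate([M[np.triu_indices(2)], composition_vector(Z)])
```

The Coulomb matrix is symmetric, so only the upper triangle carries independent information; concatenating the composition counts is one simple way to realize the paper's combined-feature idea.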
First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties
A well-defined notion of chemical compound space (CCS) is essential for
gaining rigorous control of properties through variation of elemental
composition and atomic configurations. Here, we review an atomistic first
principles perspective on CCS. First, CCS is discussed in terms of variational
nuclear charges in the context of conceptual density functional and molecular
grand-canonical ensemble theory. Thereafter, we revisit the notion of compound
pairs, related to each other via "alchemical" interpolations involving
fractional nuclear charges in the electronic Hamiltonian. We address Taylor
expansions in CCS, property non-linearity, improved predictions using reference
compound pairs, and the ounce-of-gold prize challenge to linearize CCS.
Finally, we turn to machine learning of analytical structure-property
relationships in CCS. These relationships correspond to solutions of the
electronic Schrödinger equation that are inferred rather than derived through
the variational principle.
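The Taylor expansions in CCS referred to above treat the energy as a function of the vector of nuclear charges at fixed geometry. A schematic form, under the assumption of fixed nuclear positions (the review itself should be consulted for the precise setting), is

```latex
E(\mathbf{Z} + \Delta\mathbf{Z}) \approx E(\mathbf{Z})
  + \sum_{I} \frac{\partial E}{\partial Z_I}\,\Delta Z_I
  + \frac{1}{2} \sum_{I,J} \frac{\partial^2 E}{\partial Z_I\,\partial Z_J}\,
    \Delta Z_I\,\Delta Z_J + \cdots,
\qquad
\frac{\partial E}{\partial Z_I}
  = -\int \frac{\rho(\mathbf{r})}{|\mathbf{r}-\mathbf{R}_I|}\,\mathrm{d}\mathbf{r}
  + \sum_{J \neq I} \frac{Z_J}{|\mathbf{R}_I - \mathbf{R}_J|},
```

where the first-order derivative follows from the Hellmann-Feynman theorem: the integral is the electrostatic attraction of the ground-state electron density to nucleus I, and the sum is the derivative of the nuclear repulsion energy.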
Towards the Quantum Machine: Using Scalable Machine Learning Methods to Predict Photovoltaic Efficacy of Organic Molecules
Recent advances in machine learning have resulted in an upsurge of interest in developing a "quantum machine", a technique for simulating and predicting quantum-chemical properties at the molecular level. This paper explores the development of a large-scale quantum machine in the context of accurately and rapidly classifying molecules to determine photovoltaic efficacy through machine learning. Specifically, it proposes several novel representations of molecules that are amenable to learning, in addition to extending and improving existing representations. It also proposes and implements extensions to scalable distributed learning algorithms in order to perform large-scale molecular regression. The study leverages Harvard's Odyssey supercomputer to train various kinds of predictive algorithms over millions of molecules, and assesses the cross-validated test performance of these models for predicting photovoltaic efficacy. The results suggest combinations of representations and learning models that may be most desirable in constructing a large-scale system designed to classify molecules by photovoltaic efficacy.
Ab initio machine learning in chemical compound space
Chemical compound space (CCS), the set of all theoretically conceivable
combinations of chemical elements and (meta-)stable geometries that make up
matter, is colossal. The first principles based virtual sampling of this space,
for example in search of novel molecules or materials which exhibit desirable
properties, is therefore prohibitive for all but the smallest sub-sets and
simplest properties. We review studies aimed at tackling this challenge using
modern machine learning techniques based on (i) synthetic data, typically
generated using quantum mechanics based methods, and (ii) model architectures
inspired by quantum mechanics. Such Quantum mechanics based Machine Learning
(QML) approaches combine the numerical efficiency of statistical surrogate
models with an {\em ab initio} view on matter. They rigorously reflect the
underlying physics in order to reach universality and transferability across
CCS. While state-of-the-art approximations to quantum problems impose severe
computational bottlenecks, recent QML based developments indicate the
possibility of substantial acceleration without sacrificing the predictive
power of quantum mechanics.
Quantum machine learning in chemical space
This thesis focuses on the overlap of first-principles quantum methods and machine learning in computational chemistry and materials science, commonly referred to as Quantum Machine Learning (QML).
Assessing and benchmarking the performance of existing machine learning models on various classes of compounds and chemical properties is a substantial part of this thesis. These results are used to better understand which machine learning models are best suited for a given combination of properties and compounds. For example, thirteen electronic ground-state properties of 131k organic molecules, calculated at the hybrid-DFT level of theory, were used to gauge the predictive accuracy of combinations of representations and regressors. The out-of-sample prediction errors of the models on the hybrid-DFT-quality data are on par with, or close to, the CCSD(T) error relative to experimental values, indicating that reference data need to go beyond hybrid-DFT if QML predictions are to surpass chemical accuracy.
Another area of focus is the development of new and accurate QML models. A new representation of atoms in their chemical environments is introduced by rethinking the way structural and chemical compound information is encoded into training data. The representation interpolates elemental properties across both atoms and compounds, making it well suited for datasets with high compositional and structural degrees of freedom. Numerical results show that, compared to current benchmarks, this representation yields superior predictive power in combination with kernel ridge regression on a diverse set of systems, including organic molecules, non-covalently bonded protein side-chains, water clusters, and crystalline solids. Furthermore, the role of response operators when learning response properties of the energy is discussed, leading to a formalism in which the corresponding response operator is applied directly to the quantum machine learning model. Training QML models with this formalism results in lower out-of-sample errors than learning the corresponding properties directly. The formalism can also be used to reproduce accurate normal modes and IR spectra of molecules.
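Kernel ridge regression, the regressor used throughout, is generic. A minimal sketch with a Gaussian kernel on arbitrary feature vectors; the thesis's own representation is not reproduced here, and all names, data, and hyperparameters are illustrative:

```python
import numpy as np

def krr_train(X, y, sigma=0.5, lam=1e-8):
    """Fit kernel ridge regression: solve (K + lam*I) alpha = y, Gaussian kernel."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(X_train, alpha, X_test, sigma=0.5):
    """Predict as k(x_test, x_train) @ alpha."""
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # stand-in feature vectors, one per compound
y = np.sin(X[:, 0])            # toy scalar property
alpha = krr_train(X, y)
y_hat = krr_predict(X, alpha, X)  # in-sample predictions
```

With a small regularizer the model nearly interpolates its training data; in practice sigma and lam are tuned by cross-validation, which is how the out-of-sample errors quoted above are obtained.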
Finally, the applicability of QML models is explored. A machine learning model which encodes the elemental identities of the atoms placed in each site is used to exhaustively screen the formation energy of 2 million Elpasolite crystals. The resulting model's accuracy improves systematically with additional training data, reaching an accuracy of 0.1 eV/atom when trained on 10k crystals. Out of the 2 million crystals, we identify 90 unique structures which span the convex hull of stability, among which NFAlCa, with uncommon stoichiometry and a negative atomic oxidation state for Al.
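The systematic improvement with training-set size described above is a learning curve. A toy sketch on synthetic data, with a 1-nearest-neighbour regressor standing in for the actual crystal-site model (everything here is illustrative, not the thesis's method):

```python
import numpy as np

rng = np.random.default_rng(1)

def mae_for_train_size(n_train, X_pool, y_pool, X_test, y_test):
    """Test MAE of a 1-nearest-neighbour regressor on n_train random points."""
    idx = rng.choice(len(X_pool), size=n_train, replace=False)
    Xt, yt = X_pool[idx], y_pool[idx]
    d2 = np.sum((X_test[:, None, :] - Xt[None, :, :]) ** 2, axis=-1)
    pred = yt[np.argmin(d2, axis=1)]  # value of the nearest training point
    return float(np.mean(np.abs(pred - y_test)))

# synthetic "formation energy" over a 2D feature space
X = rng.uniform(size=(4000, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
X_pool, y_pool = X[:3000], y[:3000]
X_test, y_test = X[3000:], y[3000:]
maes = [mae_for_train_size(n, X_pool, y_pool, X_test, y_test)
        for n in (100, 400, 1600)]
```

Plotting such MAEs against training-set size on log-log axes gives the (typically linear) learning curve used to report statements like "0.1 eV/atom at 10k training crystals".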
Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
By combining metal nodes with organic linkers we can potentially synthesize
millions of possible metal organic frameworks (MOFs). At present, we have
libraries of over ten thousand synthesized materials and millions of in-silico
predicted materials. The fact that we have so many materials opens many
exciting avenues to tailor make a material that is optimal for a given
application. However, from an experimental and computational point of view we
simply have too many materials to screen using brute-force techniques. In this
review, we show that having so many materials allows us to use big-data methods
as a powerful technique to study these materials and to discover complex
correlations. The first part of the review gives an introduction to the
principles of big-data science. We emphasize the importance of data collection,
methods to augment small data sets, and how to select appropriate training
sets. An important part of this review is the discussion of the different
approaches used to represent these materials in feature space. The review also
includes a general
overview of the different ML techniques, but as most applications in porous
materials use supervised ML our review is focused on the different approaches
for supervised ML. In particular, we review the different methods to optimize
the ML process and how to quantify the performance of the different methods. In
the second part, we review how the different approaches of ML have been applied
to porous materials. In particular, we discuss applications in the field of gas
storage and separation, the stability of these materials, their electronic
properties, and their synthesis. The range of topics illustrates the large
variety of topics that can be studied with big-data science. Given the
increasing interest of the scientific community in ML, we expect this list to
rapidly expand in the coming years.
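Quantifying the performance of the different methods, as discussed above, usually reduces to a few standard regression metrics. A minimal sketch; the choice of metrics (MAE, RMSE, R^2) is a common convention, not taken from this review:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                      # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # root mean squared error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return mae, rmse, r2

# toy example: four predicted vs. reference property values
mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

These metrics are meaningful only when computed on held-out data (a test set or cross-validation folds), never on the training set itself.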