
    Prediction of the Atomization Energy of Molecules Using Coulomb Matrix and Atomic Composition in a Bayesian Regularized Neural Networks

    Exact calculation of electronic properties of molecules is a fundamental step for intelligent and rational compound and materials design. The intrinsically graph-like and non-vectorial nature of molecular data poses a unique and challenging machine learning problem. In this paper we embrace a learning-from-scratch approach in which the quantum mechanical electronic properties of molecules are predicted directly from the raw molecular geometry, similar to some recent works. Unlike these previous endeavors, however, our study suggests a benefit from combining the molecular geometry embedded in the Coulomb matrix with the atomic composition of molecules. Using the new combined features in a Bayesian regularized neural network, our results improve well-known results from the literature on the QM7 dataset, from a mean absolute error of 3.51 kcal/mol down to 3.0 kcal/mol. Comment: Under review, ICANN 201
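The Coulomb matrix referenced in the abstract has a standard closed form; the sketch below builds it and concatenates sorted eigenvalues with atom counts, in the spirit of the combined features described above. The element list, padding size, and eigenvalue sorting are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: diagonal 0.5*Z_i^2.4, off-diagonal Z_i*Z_j/|R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def features(Z, R, max_atoms, elements=(1, 6, 7, 8, 16)):
    """Combined feature vector: padded, sorted Coulomb-matrix eigenvalues
    plus per-element atom counts (the atomic composition)."""
    eigs = np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, R)))[::-1]
    padded = np.zeros(max_atoms)
    padded[: len(eigs)] = eigs
    counts = np.array([(np.asarray(Z) == e).sum() for e in elements], dtype=float)
    return np.concatenate([padded, counts])

# Water as a toy example: O at origin, two H atoms
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [0.0, 0.757, 0.587], [0.0, -0.757, 0.587]]
x = features(Z, R, max_atoms=5)
```

Sorting the eigenvalues (rather than using raw matrix entries) is one common way to make the representation invariant to atom indexing.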

    First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties

    A well-defined notion of chemical compound space (CCS) is essential for gaining rigorous control of properties through variation of elemental composition and atomic configurations. Here, we review an atomistic first principles perspective on CCS. First, CCS is discussed in terms of variational nuclear charges in the context of conceptual density functional theory and molecular grand-canonical ensemble theory. Thereafter, we revisit the notion of compound pairs, related to each other via "alchemical" interpolations involving fractional nuclear charges in the electronic Hamiltonian. We address Taylor expansions in CCS, property non-linearity, improved predictions using reference compound pairs, and the ounce-of-gold prize challenge to linearize CCS. Finally, we turn to machine learning of analytical structure-property relationships in CCS. These relationships correspond to solutions of the electronic Schr\"odinger equation that are inferred, rather than derived through the variational principle.
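The Taylor expansions in CCS mentioned above can be written explicitly by treating the nuclear charges as continuous variables at fixed geometry; a sketch (notation mine, not necessarily the review's):

```latex
E(\{Z_I + \Delta Z_I\}) \approx E(\{Z_I\})
  + \sum_I \frac{\partial E}{\partial Z_I}\,\Delta Z_I
  + \frac{1}{2}\sum_{I,J} \frac{\partial^2 E}{\partial Z_I\,\partial Z_J}\,\Delta Z_I\,\Delta Z_J
  + \cdots
```

The first-order derivatives $\partial E/\partial Z_I$ are the "alchemical" potentials that connect compound pairs along fractional-charge interpolation paths; truncating the series at low order is one route toward the linearization challenge the abstract mentions.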

    Ab initio machine learning in chemical compound space

    Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles-based virtual sampling of this space, for example in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an {\em ab initio} view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.

    Quantum machine learning in chemical space

    This thesis focuses on the overlap of first-principles quantum methods and machine learning in computational chemistry and materials science, commonly referred to as Quantum Machine Learning (QML). Assessing and benchmarking the performance of existing machine learning models on various classes of compounds and chemical properties is a substantial part of this thesis. These results are used to better understand which machine learning models are best suited for a given combination of properties and compounds. For example, thirteen electronic ground state properties of ~131k organic molecules, calculated at the hybrid-DFT level of theory, were used to gauge the predictive accuracy of combinations of representations and regressors. The out-of-sample prediction errors of the models on the hybrid-DFT quality data are on par with, or close to, the CCSD(T) error relative to experimental values, indicating that reference data need to go beyond hybrid-DFT if QML predictions are to surpass chemical accuracy. Another area of focus is the development of new and accurate QML models. A new representation of an atom in its chemical environment is introduced by rethinking the way structural and chemical compound information is encoded into training data. The representation interpolates elemental properties across both atoms and compounds, making it well suited for datasets with high compositional and structural degrees of freedom. Numerical results show that, compared to current benchmarks, this representation yields superior predictive power in combination with kernel ridge regression on a diverse set of systems, including organic molecules, non-covalently bonded protein side-chains, water clusters, and crystalline solids.
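Kernel ridge regression, the regressor used throughout the work described above, reduces to a single regularized linear solve once a kernel is chosen. A minimal sketch with a Gaussian kernel on synthetic stand-in data (the features, property, and hyperparameter values are illustrative, not the thesis's):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))        # stand-in feature vectors
y_train = np.sin(X_train.sum(axis=1))      # stand-in molecular property
X_test = rng.normal(size=(50, 8))
y_test = np.sin(X_test.sum(axis=1))

def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-|a - b|^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

sigma, lam = 4.0, 1e-6                     # illustrative hyperparameters
K = gaussian_kernel(X_train, X_train, sigma)
# regression coefficients: (K + lam*I) alpha = y
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

y_pred = gaussian_kernel(X_test, X_train, sigma) @ alpha
mae = np.abs(y_pred - y_test).mean()
```

In practice the kernel width and regularization strength are tuned by cross-validation, and the learned coefficients `alpha` weight each training compound's contribution to a new prediction.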
Furthermore, the role of response operators when learning response properties of the energy is discussed, leading to a formalism in which such properties are learned by applying the corresponding response operator directly to the quantum machine learning model. Training QML models with this formalism results in lower out-of-sample errors than learning the corresponding properties directly. The formalism can also be used to reproduce accurate normal modes and IR spectra of molecules. Finally, the applicability of QML models is explored. A machine learning model which encodes the elemental identities of the atoms placed at each site is used to exhaustively screen the formation energies of ~2 million Elpasolite crystals. The resulting model's accuracy improves systematically with additional training data, reaching an accuracy of 0.1 eV/atom when trained on 10k crystals. Out of the ~2 million crystals, we identify 90 unique structures which span the convex hull of stability, among which is NFAl2Ca6, with uncommon stoichiometry and a negative atomic oxidation state for Al.
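The response-operator idea above can be sketched for the dipole moment, a first-order response of the energy to an electric field; notation mine, and the thesis's exact formalism may differ. Applying the operator to a kernel expansion of the energy yields a consistent model for the response property:

```latex
\mu_\alpha = -\frac{\partial E}{\partial \varepsilon_\alpha}, \qquad
E^{\mathrm{ML}}(x) = \sum_i \alpha_i\, k(x, x_i)
\;\Longrightarrow\;
\mu_\alpha^{\mathrm{ML}}(x) = -\sum_i \alpha_i\, \frac{\partial k(x, x_i)}{\partial \varepsilon_\alpha}
```

Because the same coefficients $\alpha_i$ enter the energy and its response, the learned property is by construction the derivative of the learned energy, rather than an independently fitted quantity.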

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review is focused on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years. Comment: Editorial changes (typos fixed, minor adjustments to figures)
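Quantifying model performance as the review describes is commonly done by reporting a held-out error over growing training-set sizes, i.e. a learning curve. A minimal sketch on synthetic placeholder data (the descriptors, target property, and ridge model are illustrative, not from the review):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))                  # stand-in material descriptors
w_true = rng.normal(size=6)
y = X @ w_true + 0.1 * rng.normal(size=500)    # stand-in target property + noise

X_test, y_test = X[400:], y[400:]              # held-out evaluation set
for n in (25, 50, 100, 200, 400):
    Xtr, ytr = X[:n], y[:n]
    # ridge regression via regularized normal equations
    w = np.linalg.solve(Xtr.T @ Xtr + 1e-6 * np.eye(6), Xtr.T @ ytr)
    mae = np.abs(X_test @ w - y_test).mean()
    print(f"n={n:4d}  test MAE={mae:.3f}")
```

For a well-posed model the held-out MAE decreases systematically with training-set size until it saturates at the noise floor of the data, which is the behavior one checks when comparing representations or regressors.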

    Development and testing of new exchange correlation functionals
