166 research outputs found

    Fast, accurate, and transferable many-body interatomic potentials by symbolic regression

    The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient many-body potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton-Chen embedded atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy, and speed.
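For readers unfamiliar with the two reference models named in this abstract, the following is a minimal sketch of their standard textbook forms: the Lennard-Jones pair potential, 4ε[(σ/r)^12 − (σ/r)^6], and the Sutton-Chen embedded atom form, ε Σ_i [½ Σ_j (a/r_ij)^n − c √ρ_i] with ρ_i = Σ_j (a/r_ij)^m. The function names, default parameter values, and the absence of a cutoff or periodic boundaries are assumptions made for illustration; this is not the authors' code.

```python
import math
from itertools import combinations


def pair_distances(positions):
    """Return {(i, j): r_ij} for all unordered atom pairs (no periodic images)."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    }


def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy: sum over pairs of 4*eps*[(s/r)^12 - (s/r)^6]."""
    total = 0.0
    for r in pair_distances(positions).values():
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


def sutton_chen_energy(positions, epsilon=1.0, a=1.0, c=1.0, n=9, m=6):
    """Total Sutton-Chen energy; parameter defaults are illustrative, not fitted."""
    n_atoms = len(positions)
    repulsion = [0.0] * n_atoms  # per-atom (1/2) * sum_j (a / r_ij)^n
    density = [0.0] * n_atoms    # per-atom rho_i = sum_j (a / r_ij)^m
    for (i, j), r in pair_distances(positions).items():
        for k in (i, j):
            repulsion[k] += 0.5 * (a / r) ** n
            density[k] += (a / r) ** m
    # Many-body term: each atom is bound by -c * sqrt(rho_i).
    return epsilon * sum(
        rep - c * math.sqrt(rho) for rep, rho in zip(repulsion, density)
    )
```

The square root of the local density is what makes the Sutton-Chen form a many-body model: the energy of an atom is not a sum of independent pair terms, which is the kind of structure a symbolic regression search would have to recover.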

    Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data

    Particle-based modeling of materials at the atomic scale plays an important role in the development of new materials and the understanding of their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and potentially other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational costs of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results. A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and associated potential energy) is presented and empirically validated on ab initio electronic structure data. Comment: Submitted to the GPTP XIX Workshop, June 2-4 2022, University of Michigan, Ann Arbor, Michigan.

    Generalizability of Functional Forms for Interatomic Potential Models Discovered by Symbolic Regression

    In recent years there has been great progress in the use of machine learning algorithms to develop interatomic potential models. Machine-learned potential models are typically orders of magnitude faster than density functional theory but also orders of magnitude slower than physics-derived models such as the embedded atom method. In our previous work, we used symbolic regression to develop fast, accurate, and transferable interatomic potential models for copper with novel functional forms that resemble those of the embedded atom method. To determine the extent to which the success of these forms was specific to copper, here we explore the generalizability of these models to other face-centered cubic transition metals and analyze their out-of-sample performance on several material properties. We found that these forms work particularly well on elements that are chemically similar to copper. When compared to optimized Sutton-Chen models, which have similar complexity, the functional forms discovered using symbolic regression perform better across all elements considered except gold, where they have a similar performance. They perform similarly to a moderately more complex embedded atom form on properties on which they were trained, and they are more accurate on average on other properties. We attribute this improved generalized accuracy to the relative simplicity of the models discovered using symbolic regression. The genetic programming models are found to outperform other models from the literature about 50% of the time in a variety of property predictions, with about 1/10th the model complexity on average. We discuss the implications of these results for the broader application of symbolic regression to the development of new potentials and highlight how models discovered for one element can be used to seed new searches for different elements.

    LEVERAGING INFORMATICS FOR ACCELERATING THE DISCOVERY OF MATERIALS

    The application of materials informatics to the rational design of materials has been inspired by the growing number of successes of machine learning in many fields, and it has been facilitated by greater access to computational resources, advances in algorithms, and the growing open-source code community. This thesis presents two ways in which we have advanced the field of computational materials science through materials informatics. A promising application of materials informatics to materials science is the development of machine-learned interatomic potential models that are orders of magnitude faster than ab initio methods such as density functional theory and can be nearly as accurate. However, these models are typically orders of magnitude slower than physics-derived models such as the embedded atom method (EAM), and they usually do not generalize well. We present a supervised machine learning approach for developing interatomic potential models to simulate atomic systems at large time and length scales from ab initio data. The models developed with our symbolic regression algorithm are computationally fast, simple (and interpretable), accurate, and transferable. A reason for the success of our algorithm is that it learns models using a physics-informed hypothesis space. Another important component of our algorithm is the minimization of a multi-objective cost function to search for simple, accurate, and fast interatomic potential models. We first demonstrate our approach for elemental Cu, and then show how the models discovered for Cu transfer well to other fcc transition metals close to Cu on the periodic table. Then, we demonstrate how our algorithm can be used to discover new functional forms for the fcc transition metals close to Cu on the periodic table, benefiting from the information encoded in known models as a seed to the search.
    The machine learning interatomic potential models developed with our approach are 2-3 orders of magnitude faster than other machine-learned potentials, they are on average one order of magnitude simpler than EAM-type models, and their transferability is at least as good as that of other EAM-type models. In addition, their simplicity opens the door to studying their functional forms to possibly gain insights into the atomic systems. This thesis also addresses the need for a database of atomically precise nanoclusters at the density functional theory level of accuracy. Our approach used a genetic algorithm to identify low-energy clusters, and to our knowledge, it constitutes the largest database of atomically precise nanoclusters at the level of accuracy of density functional theory. This database can inform studies that aim to design clusters for a variety of applications, it can be used to train machine learning models, or it can be used as a benchmark for other studies.
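The abstract above describes searching for models that are simultaneously simple, accurate, and fast. One common way to handle several objectives at once is to keep only the non-dominated (Pareto-optimal) candidates; whether the thesis uses this exact scheme is not stated here, so the sketch below is an assumption. The candidate names and objective values are hypothetical, and every objective is treated as "lower is better".

```python
def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates.

    candidates maps a model name to (fitting error, evaluation cost, complexity).
    """
    return [
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidates: (fitting error, evaluation cost, complexity).
models = {
    "A": (0.10, 1.0, 5),
    "B": (0.05, 2.0, 9),
    "C": (0.12, 1.5, 7),   # worse than A on every objective
    "D": (0.05, 2.0, 12),  # same error and cost as B, but more complex
}
print(pareto_front(models))  # prints ['A', 'B']
```

Keeping a front rather than a single weighted score leaves the trade-off between accuracy and simplicity visible, so a simpler model with slightly higher error is not discarded automatically.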

    Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

    Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, training deep-learning models is time-consuming, and the regression process is a black box that is hard to interpret. In addition, the preprocessing step that converts a crystal structure into ML input, called a descriptor, needs to be designed carefully. To efficiently predict important properties of materials, we propose an approach based on ensemble learning with regression trees to predict formation energy and elastic constants from small datasets, using carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from classical interatomic potentials, and ensemble learning can capture the relatively accurate properties from the 9 classical potentials and use them as criteria for predicting the final properties.

    Efficient Gaussian Process Regression for prediction of molecular crystals harmonic free energies

    We present a method to accurately predict the harmonic Helmholtz free energies of molecular crystals in high-throughput settings. This is achieved by devising a computationally efficient framework that employs a Gaussian Process Regression model based on local atomic environments. The cost to train the model with ab initio potentials is reduced by starting the optimisation of the framework parameters, as well as the training and validation sets, with an empirical potential. This is then transferred to train the model based on density-functional theory potentials, including dispersion corrections. We benchmarked our framework on a set of 444 hydrocarbon crystal structures, comprising 38 polymorphs and 406 crystal structures either measured in different conditions or derived from them. Superior performance and high prediction accuracy, with mean absolute deviation below 0.04 kJ/mol/atom at 300 K, is achieved by training on as few as 60 crystal structures. Furthermore, we demonstrate the predictive efficiency and accuracy of the developed framework by successfully calculating the thermal lattice expansion of aromatic hydrocarbon crystals within the quasi-harmonic approximation, and predicting how lattice expansion affects the polymorph stability ranking. Comment: 12 pages, 5 figures.
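As background for the abstract above, here is a minimal sketch of the Gaussian Process Regression predictive mean, k(x*)^T (K + σ²I)^(-1) y, written with the standard library only. It assumes a plain squared-exponential kernel on generic feature vectors; the paper's kernel is built on local atomic environments, which this sketch does not attempt to reproduce. All function names, the noise level, and the toy data are hypothetical.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors (an assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def cholesky_solve(lower, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def gpr_fit(train_x, train_y, noise=1e-6, length_scale=1.0):
    """Return weights alpha = (K + noise * I)^(-1) y."""
    n = len(train_x)
    gram = [
        [
            rbf_kernel(train_x[i], train_x[j], length_scale)
            + (noise if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return cholesky_solve(cholesky(gram), train_y)


def gpr_predict(train_x, weights, query, length_scale=1.0):
    """Predictive mean at query: sum_i alpha_i * k(x_i, query)."""
    return sum(w * rbf_kernel(x, query, length_scale) for x, w in zip(train_x, weights))


# Toy data (hypothetical): three one-dimensional feature vectors and targets.
features = [(0.0,), (1.0,), (2.0,)]
targets = [0.0, 1.0, 4.0]
alpha = gpr_fit(features, targets)
```

With a small noise term, the predictive mean nearly interpolates the training targets, so `gpr_predict(features, alpha, (1.0,))` is close to 1.0. The cubic cost of the Cholesky step in the number of training structures is one reason a small training set, such as the 60 structures mentioned above, keeps the approach cheap.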

    Computational characterization and prediction of metal-organic framework properties

    In this introductory review, we give an overview of the computational chemistry methods commonly used in the field of metal-organic frameworks (MOFs) to describe or predict the structures themselves and to characterize their various properties, either at the quantum chemical level or through classical molecular simulation. We discuss the methods for the prediction of crystal structures, geometrical properties, and large-scale screening of hypothetical MOFs, as well as their thermal and mechanical properties. A separate section deals with the simulation of adsorption of fluids and fluid mixtures in MOFs.