Fast, accurate, and transferable many-body interatomic potentials by symbolic regression
The length and time scales of atomistic simulations are limited by the
computational cost of the methods used to predict material properties. In
recent years there has been great progress in the use of machine learning
algorithms to develop fast and accurate interatomic potential models, but it
remains a challenge to develop models that generalize well and are fast enough
to be used at extreme time and length scales. To address this challenge, we
have developed a machine learning algorithm based on symbolic regression in the
form of genetic programming that is capable of discovering accurate,
computationally efficient many-body potential models. The key to our approach is
to explore a hypothesis space of models based on fundamental physical
principles and select models within this hypothesis space based on their
accuracy, speed, and simplicity. The focus on simplicity reduces the risk of
overfitting the training data and increases the chances of discovering a model
that generalizes well. Our algorithm was validated by rediscovering an exact
Lennard-Jones potential and a Sutton-Chen embedded atom method potential from
training data generated using these models. By using training data generated
from density functional theory calculations, we found potential models for
elemental copper that are simple, as fast as embedded atom models, and capable
of accurately predicting properties outside of their training set. Our approach
requires relatively small sets of training data, making it possible to generate
training data using highly accurate methods at a reasonable computational cost.
We present our approach, the forms of the discovered models, and assessments of
their transferability, accuracy, and speed.
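The two reference models used for validation in the abstract above have standard closed forms. The following Python sketch evaluates them for a small, non-periodic cluster so that the rediscovery target is concrete. It assumes reduced units, no cutoff radius, and no periodic images; the parameter defaults (epsilon, sigma, a, c, n, m) are placeholders, not the fitted values used to generate the training data in the cited work.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two 3D points (no periodic images)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy: sum over unique pairs of
    4 * epsilon * [(sigma / r)^12 - (sigma / r)^6]."""
    energy = 0.0
    for p, q in combinations(positions, 2):
        sr6 = (sigma / distance(p, q)) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def sutton_chen_energy(positions, epsilon=1.0, a=1.0, c=1.0, n=12, m=6):
    """Sutton-Chen energy (placeholder exponents, not fitted values):
    E = epsilon * sum_i [0.5 * sum_{j != i} (a / r_ij)^n - c * sqrt(rho_i)],
    where rho_i = sum_{j != i} (a / r_ij)^m is the many-body density term."""
    energy = 0.0
    for i, p in enumerate(positions):
        repulsion, rho = 0.0, 0.0
        for j, q in enumerate(positions):
            if i == j:
                continue
            ratio = a / distance(p, q)
            repulsion += ratio ** n
            rho += ratio ** m
        energy += epsilon * (0.5 * repulsion - c * math.sqrt(rho))
    return energy
```

For a dimer at the Lennard-Jones minimum r = 2^(1/6) sigma, `lennard_jones_energy` returns approximately -epsilon; for a dimer at r = a with c = 1, `sutton_chen_energy` returns -epsilon. The square root in the Sutton-Chen embedding term is what makes it many-body: the energy of atom i cannot be split into independent pair contributions.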
Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data
Particle-based modeling of materials at atomic scale plays an important role
in the development of new materials and understanding of their properties. The
accuracy of particle simulations is determined by interatomic potentials, which
allow one to calculate the potential energy of an atomic system as a function of
atomic coordinates and potentially other properties. First-principles-based ab
initio potentials can reach arbitrary levels of accuracy; however, their
applicability is limited by their high computational cost.
Machine learning (ML) has recently emerged as an effective way to offset the
high computational costs of ab initio atomic potentials by replacing expensive
models with highly efficient surrogates trained on electronic structure data.
Among a plethora of current methods, symbolic regression (SR) is gaining
traction as a powerful "white-box" approach for discovering functional forms of
interatomic potentials.
This contribution discusses the role of symbolic regression in Materials
Science (MS) and offers a comprehensive overview of current methodological
challenges and state-of-the-art results. A genetic programming-based approach
for modeling atomic potentials from raw data (consisting of snapshots of atomic
positions and associated potential energy) is presented and empirically
validated on ab initio electronic structure data.
Comment: Submitted to the GPTP XIX Workshop, June 2-4 2022, University of
Michigan, Ann Arbor, Michigan
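To make the genetic-programming idea concrete, here is a minimal sketch of evolving a pair function f(r) represented as an expression tree. It is a simplified stand-in, not the method of the cited work: it uses mutation only (no crossover), a single (1 + lambda) parent, a hypothetical operator set {+, -, *, protected /}, and a hypothetical parsimony weight that adds tree size to the mean squared error to favour simpler expressions. The physics-informed hypothesis space described in these abstracts is not reproduced, and all function names are illustrative.

```python
import math
import random

# Hypothetical primitive set; the operators used in the cited work may differ.


def protected_div(x, y):
    """Divide, returning 1.0 instead of failing when the denominator is near zero."""
    return x / y if abs(y) > 1e-12 else 1.0


OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": protected_div,
}


def random_tree(rng, depth):
    """Grow a random expression tree over the variable r and numeric constants."""
    if depth == 0 or rng.random() < 0.3:
        return ("r",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, r):
    """Evaluate an expression tree at interatomic distance r."""
    if tree[0] == "r":
        return r
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], r), evaluate(tree[2], r))


def size(tree):
    """Node count, used as a simple complexity measure."""
    if tree[0] in ("r", "const"):
        return 1
    return 1 + size(tree[1]) + size(tree[2])


def mutate(tree, rng, depth=3):
    """Replace one randomly chosen subtree with a freshly grown tree."""
    if tree[0] in ("r", "const") or rng.random() < 0.3:
        return random_tree(rng, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng, depth), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng, depth))


def cost(tree, data, parsimony=0.01):
    """Mean squared error on (r, energy) pairs plus a parsimony penalty on size."""
    total = 0.0
    for r, e in data:
        diff = evaluate(tree, r) - e
        total += diff * diff  # float multiplication overflows to inf instead of raising
    mse = total / len(data)
    if not math.isfinite(mse):
        return math.inf
    return mse + parsimony * size(tree)


def evolve(data, generations=200, offspring=20, seed=0):
    """(1 + lambda) search: keep the parent unless a mutant has strictly lower cost."""
    rng = random.Random(seed)
    best = random_tree(rng, 3)
    best_cost = cost(best, data)
    for _ in range(generations):
        for _ in range(offspring):
            child = mutate(best, rng)
            child_cost = cost(child, data)
            if child_cost < best_cost:
                best, best_cost = child, child_cost
    return best, best_cost
```

Trees are nested tuples such as ("/", ("const", 1.0), ("r",)), which reads as 1/r. Because the parent is replaced only by a strictly cheaper child, the returned cost never exceeds the cost of the initial random tree.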
Generalizability of Functional Forms for Interatomic Potential Models Discovered by Symbolic Regression
In recent years there has been great progress in the use of machine learning
algorithms to develop interatomic potential models. Machine-learned potential
models are typically orders of magnitude faster than density functional theory
but also orders of magnitude slower than physics-derived models such as the
embedded atom method. In our previous work, we used symbolic regression to
develop fast, accurate, and transferable interatomic potential models for
copper with novel functional forms that resemble those of the embedded atom
method. To determine the extent to which the success of these forms was
specific to copper, here we explore the generalizability of these models to
other face-centered cubic transition metals and analyze their out-of-sample
performance on several material properties. We found that these forms work
particularly well on elements that are chemically similar to copper. When
compared to optimized Sutton-Chen models, which have similar complexity, the
functional forms discovered using symbolic regression perform better across all
elements considered except gold, where their performance is similar. They
perform similarly to a moderately more complex embedded atom form on properties
on which they were trained, and they are more accurate on average on other
properties. We attribute this improved generalized accuracy to the relative
simplicity of the models discovered using symbolic regression. The genetic
programming models are found to outperform other models from the literature
about 50% of the time in a variety of property predictions, with about 1/10th
the model complexity on average. We discuss the implications of these results
for the broader application of symbolic regression to the development of new
potentials and highlight how models discovered for one element can be used to
seed new searches for different elements.
LEVERAGING INFORMATICS FOR ACCELERATING THE DISCOVERY OF MATERIALS
The application of materials informatics to the rational design of materials has been inspired by the growing number of successful applications of machine learning in many fields, and it has been facilitated by greater access to computational resources, advances in algorithms, and the growing open-source code community. This thesis presents two ways in which we have advanced the field of computational materials science through materials informatics. A promising application of materials informatics to materials science is the development of machine-learned interatomic potential models that are orders of magnitude faster than ab initio methods such as density functional theory and can be nearly as accurate. However, these models are typically orders of magnitude slower than physics-derived models such as the embedded atom method (EAM), and they usually do not generalize well. We present a supervised machine learning approach for developing interatomic potential models from ab initio data to simulate atomic systems at large time and length scales. The models developed with our symbolic regression algorithm are computationally fast, simple (and interpretable), accurate, and transferable. One reason for the success of our algorithm is that it learns models using a physics-informed hypothesis space. Another important component of our algorithm is the minimization of a multi-objective cost function to search for simple, accurate, and fast interatomic potential models. We first demonstrate our approach for elemental Cu, and then show how the models discovered for Cu transfer well to other fcc transition metals close to Cu on the periodic table. Then, we demonstrate how our algorithm can be used to discover new functional forms for the fcc transition metals close to Cu on the periodic table, using known models as seeds for the search to benefit from the information they encode.
The machine learning interatomic potential models developed with our approach are 2-3 orders of magnitude faster than other machine-learned potentials, they are on average one order of magnitude simpler than EAM-type models, and their transferability is at least as good as that of other EAM-type models. In addition, their simplicity opens the door to studying their functional forms to possibly gain insights into the atomic systems. This thesis also addresses the need for a database of atomically precise nanoclusters at the density functional theory level of accuracy. Our approach used a genetic algorithm to identify low-energy clusters, and, to our knowledge, the resulting database is the largest database of atomically precise nanoclusters at the density functional theory level of accuracy. This database can inform studies that aim to design clusters for a variety of applications, be used to train machine learning models, or serve as a benchmark for other studies.
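The thesis abstract above mentions minimizing a multi-objective cost function over accuracy, speed, and simplicity, but does not say how the objectives are combined. One common option, shown here purely as an assumption, is Pareto dominance: keep every candidate that no other candidate beats on all objectives at once. The `Candidate` fields and the scores below are made up for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A candidate potential scored on three objectives (lower is better for all)."""
    name: str
    error: float      # e.g. RMSE against reference data
    time: float       # e.g. evaluation cost per atom
    complexity: int   # e.g. node count of the expression


def objectives(c: Candidate):
    return (c.error, c.time, c.complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    pa, pb = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(pa, pb)) and any(x < y for x, y in zip(pa, pb))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up scores for illustration only.
pool = [
    Candidate("A", 0.05, 1.0, 10),
    Candidate("B", 0.03, 2.0, 25),
    Candidate("C", 0.06, 1.5, 12),
    Candidate("D", 0.02, 50.0, 300),
]
print([c.name for c in pareto_front(pool)])  # ['A', 'B', 'D']
```

Here C is dropped because A is strictly better on all three objectives. A weighted sum of the objectives is the usual alternative when a single best model must be chosen; it requires fixing the weights in advance.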
Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example
Machine learning (ML) is widely used to explore crystal materials and predict
their properties. However, the training is time-consuming for deep-learning
models, and the regression process is a black box that is hard to interpret.
Also, the preprocessing step that transforms a crystal structure into the input
of an ML model, called a descriptor, needs to be designed carefully. To
efficiently predict important properties of materials, we propose an approach
based on ensemble learning with regression trees to predict formation energy and
elastic constants, using small datasets of carbon allotropes as an example.
Without using any descriptor, the inputs are the properties calculated by
molecular dynamics with 9 different classical interatomic potentials. Overall,
the results from ensemble learning are more accurate than those from classical
interatomic potentials, and ensemble learning can capture which of the 9
classical potentials give relatively accurate properties and use them as
criteria for predicting the final properties.
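The setup described above, where the model inputs are properties computed with each of the 9 classical potentials and the target is the reference property, can be sketched as follows. The abstract only says "ensemble learning consisting of regression trees"; this sketch assumes bagging of depth-1 regression trees (stumps) in plain Python, which is simpler than the random forests or gradient-boosted trees one would normally use. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-1 regression tree: one feature, one threshold, two leaf means.

    Each row of X holds one property value predicted by each classical potential.
    """
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split (all rows identical): always predict the mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_bagged_stumps(X, y, n_estimators=50, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict(ensemble, row):
    """Average the stump predictions."""
    return mean(predict_stump(s, row) for s in ensemble)
```

Each row of X would hold, for one structure, the property value predicted by each classical potential, and y the corresponding reference value. Counting how often each feature index appears in the fitted stumps gives a rough indication of which potentials the ensemble relies on, one simple route toward the interpretability the abstract emphasizes.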
Efficient Gaussian Process Regression for prediction of molecular crystals harmonic free energies
We present a method to accurately predict the Helmholtz harmonic free
energies of molecular crystals in high-throughput settings. This is achieved by
devising a computationally efficient framework that employs a Gaussian Process
Regression model based on local atomic environments. The cost to train the
model with ab initio potentials is reduced by starting the optimisation of the
framework parameters, as well as the training and validation sets, with an
empirical potential. This is then transferred to train the model based on
density-functional theory potentials, including dispersion-corrections. We
benchmarked our framework on a set of 444 hydrocarbon crystal structures,
comprising 38 polymorphs, and 406 crystal structures either measured in
different conditions or derived from them. Superior performance and high
prediction accuracy, with a mean absolute deviation below 0.04 kJ/mol/atom at
300 K, are achieved by training on as few as 60 crystal structures. Furthermore,
we demonstrate the predictive efficiency and accuracy of the developed
framework by successfully calculating the thermal lattice expansion of aromatic
hydrocarbon crystals within the quasi-harmonic approximation, and we predict how
lattice expansion affects the polymorph stability ranking.
Comment: 12 pages, 5 figures
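Gaussian Process Regression, as used in the framework above, predicts a target as a kernel-weighted combination of training targets. The following pure-Python sketch computes the posterior mean with a squared-exponential kernel on generic feature vectors. It is a simplification under stated assumptions: the abstract's model works on local atomic environments, and the descriptor, kernel, and hyperparameters are not specified there. The `GaussianProcess` class and its parameters are hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GaussianProcess:
    """GP posterior mean with a constant prior mean equal to the training mean."""

    def __init__(self, length_scale=1.0, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise  # variance added to the kernel diagonal

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.y_mean = sum(y) / len(y)
        K = [[rbf_kernel(a, b, self.length_scale) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        centered = [t - self.y_mean for t in y]
        self.alpha = cholesky_solve(cholesky(K), centered)
        return self

    def predict(self, x):
        k_star = [rbf_kernel(x, xi, self.length_scale) for xi in self.X]
        return self.y_mean + sum(k * a for k, a in zip(k_star, self.alpha))
```

Far from the training data the kernel values vanish and the prediction falls back to the training mean. In the spirit of the transfer strategy described in the abstract, one could tune `length_scale` and `noise` against cheap labels from an empirical potential and then refit with density-functional-theory labels using the same settings; this sketch does not implement that tuning.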
Computational characterization and prediction of metal-organic framework properties
In this introductory review, we give an overview of the computational
chemistry methods commonly used in the field of metal-organic frameworks
(MOFs), to describe or predict the structures themselves and characterize their
various properties, either at the quantum chemical level or through classical
molecular simulation. We discuss the methods for the prediction of crystal
structures, geometrical properties and large-scale screening of hypothetical
MOFs, as well as their thermal and mechanical properties. A separate section
deals with the simulation of adsorption of fluids and fluid mixtures in MOFs
- …