Chemical Properties from Graph Neural Network-Predicted Electron Densities
According to density functional theory, any chemical property can be inferred
from the electron density, making it the most informative attribute of an
atomic structure. In this work, we demonstrate the use of established physical
methods to obtain important chemical properties from model-predicted electron
densities. We introduce graph neural network architectural choices that provide
physically relevant and useful electron density predictions. Despite not
training to predict atomic charges, the model is able to predict atomic charges
with an order of magnitude lower error than a sum of atomic charge densities.
Similarly, the model predicts dipole moments with half the error of the sum of
atomic charge densities method. We demonstrate that larger data sets lead to
more useful predictions in these tasks. These results pave the way for an
alternative path in atomistic machine learning, where data-driven approaches
and existing physical methods are used in tandem to obtain a variety of
chemical properties in an explainable and self-consistent manner.
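As an illustration of how a property can be read off a density, the dipole moment of a system is the nuclear contribution minus the first moment of the electron density, mu = sum_i Z_i R_i - integral of r rho(r) dr. The sketch below is not the paper's code. It assumes a hypothetical density sampled on a uniform grid and passed in as plain Python lists, with all quantities in atomic units; the function and variable names are invented for this example.

```python
def dipole_moment(nuclei, grid_points, density, voxel_volume):
    """Dipole moment (atomic units) from point nuclei and a gridded electron density.

    nuclei:       list of (charge, (x, y, z)) pairs
    grid_points:  list of (x, y, z) grid coordinates
    density:      electron density value at each grid point
    voxel_volume: volume represented by each grid point (uniform grid assumed)
    """
    mu = [0.0, 0.0, 0.0]
    # Positive contribution from the nuclei: sum_i Z_i * R_i
    for charge, position in nuclei:
        for k in range(3):
            mu[k] += charge * position[k]
    # Negative contribution from the electrons: -sum_g rho_g * dV * r_g
    for point, rho in zip(grid_points, density):
        for k in range(3):
            mu[k] -= rho * voxel_volume * point[k]
    return mu


# Toy input (made up): a unit nuclear charge at the origin and one electron's
# worth of density concentrated in a single voxel one bohr along z.
toy_dipole = dipole_moment(
    nuclei=[(1.0, (0.0, 0.0, 0.0))],
    grid_points=[(0.0, 0.0, 1.0)],
    density=[2.0],
    voxel_volume=0.5,
)
```

A predicted density would be passed in place of the toy lists; the accuracy of the resulting dipole then depends on the grid resolution as well as on the model.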
GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets
Recent years have seen the advent of molecular simulation datasets that are
orders of magnitude larger and more diverse. These new datasets differ
substantially in four aspects of complexity: (1) chemical diversity (number of
different elements), (2) system size (number of atoms per sample), (3) dataset
size (number of data samples), and (4) domain shift (similarity of the training
and test set). Despite these large differences, benchmarks on small and narrow
datasets remain the predominant method of demonstrating progress in graph
neural networks (GNNs) for molecular simulation, likely due to cheaper training
compute requirements. This raises the question: does GNN progress on small
and narrow datasets translate to these more complex datasets? This work
investigates this question by first developing the GemNet-OC model based on the
large Open Catalyst 2020 (OC20) dataset. GemNet-OC outperforms the previous
state-of-the-art on OC20 by 16% while reducing training time by a factor of 10.
We then compare the impact of 18 model components and hyperparameter choices on
performance in multiple datasets. We find that the resulting model would be
drastically different depending on the dataset used for making model choices.
To isolate the source of this discrepancy we study six subsets of the OC20
dataset that individually test each of the above-mentioned four dataset
aspects. We find that results on the OC-2M subset correlate well with the full
OC20 dataset while being substantially cheaper to train on. Our findings
challenge the common practice of developing GNNs solely on small datasets, but
highlight ways of achieving fast development cycles and generalizable results
via moderately-sized, representative datasets such as OC-2M and efficient
models such as GemNet-OC. Our code and pretrained model weights are
open-sourced.
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials
Computational catalysis is playing an increasingly significant role in the
design of catalysts across a wide range of applications. A common task for many
computational methods is the need to accurately compute the adsorption energy
for an adsorbate and a catalyst surface of interest. Traditionally, the
identification of low energy adsorbate-surface configurations relies on
heuristic methods and researcher intuition. As the desire to perform
high-throughput screening increases, it becomes challenging to use heuristics
and intuition alone. In this paper, we demonstrate that machine learning potentials
can be leveraged to identify low energy adsorbate-surface configurations more
accurately and efficiently. Our algorithm provides a spectrum of trade-offs
between accuracy and efficiency, with one balanced option finding the lowest
energy configuration 87.36% of the time, while achieving a 2000x speedup in
computation. To standardize benchmarking, we introduce the Open Catalyst Dense
dataset containing nearly 1,000 diverse surfaces and 100,000 unique
configurations.
Comment: 26 pages, 7 figures. Submitted to npj Computational Materials
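The adsorption energy discussed above is conventionally the energy of the combined adsorbate-surface system minus the energies of the clean surface and the isolated adsorbate, and the target is its minimum over candidate configurations. The abstract does not spell out the algorithm, so the following is only a sketch under an assumed scheme: rank all configurations with a cheap ML estimate, then re-evaluate the best k with a more expensive reference method. All function names and energies here are hypothetical toy values, not data from the paper.

```python
def adsorption_energy(e_system, e_slab, e_adsorbate):
    """Conventional adsorption energy: E(system) - E(slab) - E(adsorbate)."""
    return e_system - e_slab - e_adsorbate


def lowest_energy_config(ml_energies, reference_energy, k):
    """Shortlist the k lowest ML-ranked configurations, then pick the best by reference.

    ml_energies:      dict mapping configuration id -> ML-estimated energy
    reference_energy: callable returning the expensive reference energy for an id
    k:                number of shortlisted configurations to re-evaluate
    """
    # Cheap step: sort configuration ids by their ML-estimated energy.
    shortlist = sorted(ml_energies, key=ml_energies.get)[:k]
    # Expensive step: only the shortlist is evaluated with the reference method.
    refined = {config: reference_energy(config) for config in shortlist}
    best = min(refined, key=refined.get)
    return best, refined[best]


# Made-up energies in eV for four hypothetical configurations.
ml_estimates = {"a": -1.20, "b": -1.35, "c": -0.40, "d": -1.30}
reference_values = {"a": -1.28, "b": -1.31, "c": -0.45, "d": -1.38}

best_config, best_energy = lowest_energy_config(
    ml_estimates, reference_values.get, k=2
)
```

In this toy case the ML ranking alone would pick "b", while re-evaluating the top two recovers "d". A larger k costs more reference calculations but is less likely to miss the true minimum, which is one way to read the accuracy-efficiency trade-off described in the abstract.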
Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery
The development of machine learned potentials for catalyst discovery has
predominantly been focused on very specific chemistries and material
compositions. While effective in interpolating between available materials,
these approaches struggle to generalize across chemical space. The recent
curation of large-scale catalyst datasets has offered the opportunity to build
a universal machine learning potential, spanning chemical and composition
space. If accomplished, said potential could accelerate the catalyst discovery
process across a variety of applications (CO2 reduction, NH3 production, etc.)
without additional specialized training efforts that are currently required.
The release of the Open Catalyst 2020 (OC20) dataset has begun just that, pushing the
heterogeneous catalysis and machine learning communities towards building more
accurate and robust models. In this perspective, we discuss some of the
challenges and findings of recent developments on OC20. We examine the
performance of current models across different materials and adsorbates to
identify notably underperforming subsets. We then discuss some of the modeling
efforts surrounding energy-conservation, approaches to finding and evaluating
the local minima, and augmentation of off-equilibrium data. To complement the
community's ongoing developments, we end with an outlook to some of the
important challenges that have yet to be thoroughly explored for large-scale
catalyst discovery.
Comment: submitted to ACS Catalysis
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
The development of machine learning models for electrocatalysts requires a
broad set of training data to enable their use across a wide variety of
materials. One class of materials that currently lacks sufficient training data
is oxides, which are critical for the development of Oxygen Evolution Reaction
(OER) catalysts. To address this, we developed the Open Catalyst 2022 (OC22)
dataset, consisting of 62,331 Density Functional Theory (DFT) relaxations
(~9,854,504 single point calculations) across a range of oxide materials,
coverages, and adsorbates. We define generalized total energy tasks that enable
property prediction beyond adsorption energies; we test baseline performance of
several graph neural networks; and we provide pre-defined dataset splits to
establish clear benchmarks for future efforts. In the most general task,
GemNet-OC sees a ~32% improvement in energy predictions when combining the
chemically dissimilar Open Catalyst 2020 Dataset (OC20) and OC22 datasets via
fine-tuning. Similarly, we achieved a ~19% improvement in total energy
predictions on OC20 and a ~9% improvement in force predictions in OC22 when
using joint training. We demonstrate the practical utility of a top performing
model by capturing literature adsorption energies and important OER scaling
relationships. We expect OC22 to provide an important benchmark for models
seeking to incorporate intricate long-range electrostatic and magnetic
interactions in oxide surfaces. The dataset and baseline models are
open-sourced, and a public leaderboard has been made available to encourage
continued community developments on the total energy tasks and data.
Comment: 48 pages, 14 figures