Atomistic Line Graph Neural Network for Improved Materials Property Predictions
Graph neural networks (GNNs) have been shown to provide substantial
performance improvements for representing and modeling atomistic materials
compared with descriptor-based machine-learning models. While most existing GNN
models for atomistic predictions are based on atomic distance information, they
do not explicitly incorporate bond angles, which are critical for
distinguishing many atomic structures. Furthermore, many material properties
are known to be sensitive to slight changes in bond angles. We present an
Atomistic Line Graph Neural Network (ALIGNN), a GNN architecture that performs
message passing on both the interatomic bond graph and its line graph
corresponding to bond angles. We demonstrate that angle information can be
explicitly and efficiently included, leading to improved performance on
multiple atomistic prediction tasks. We use ALIGNN models for predicting 52
solid-state and molecular properties available in the JARVIS-DFT, Materials
Project, and QM9 databases. ALIGNN can outperform some previously reported GNN
models on atomistic prediction tasks by up to 85 % in accuracy, with better or
comparable model training speed.
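To make the line-graph construction concrete, here is a minimal Python sketch (an
illustration only, not the ALIGNN implementation; the toy coordinates, cutoff, and
update rule are hypothetical). It builds the bond graph from interatomic distances,
derives its line graph, whose nodes are bonds and whose edges connect bonds that
share an atom and carry the bond angle, and runs one alternating round of
angle-to-bond and bond-to-atom updates.

import math
from itertools import combinations

# Toy structure: atom index -> Cartesian position (hypothetical coordinates).
positions = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (0.0, 1.5, 0.0)}
cutoff = 2.0  # atom pairs closer than this are treated as bonds (assumed cutoff)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(positions[a], positions[b])))

# Bond graph: nodes = atoms, edges = bonds, edge feature = interatomic distance.
bonds = {(i, j): dist(i, j)
         for i, j in combinations(positions, 2) if dist(i, j) < cutoff}

def angle(b1, b2, shared):
    # Angle (degrees) between two bonds meeting at the shared atom.
    def vec(b):
        other = b[0] if b[1] == shared else b[1]
        return [p - q for p, q in zip(positions[other], positions[shared])]
    u, v = vec(b1), vec(b2)
    cosang = sum(x * y for x, y in zip(u, v)) / (
        math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

# Line graph: nodes = bonds, edges = bond pairs sharing an atom,
# edge feature = bond angle.
line_edges = {}
for b1, b2 in combinations(bonds, 2):
    shared = set(b1) & set(b2)
    if shared:
        line_edges[(b1, b2)] = angle(b1, b2, shared.pop())

# One alternating update: angle messages refine bond features,
# then bond features are aggregated onto atoms.
bond_feat = dict(bonds)
for (b1, b2), ang in line_edges.items():
    for b in (b1, b2):
        bond_feat[b] += 0.01 * math.cos(math.radians(ang))  # toy angle message
atom_feat = {a: sum(f for b, f in bond_feat.items() if a in b) for a in positions}
print(atom_feat)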
On the redundancy in large material datasets: efficient and robust learning with less data
Extensive efforts to gather materials data have largely overlooked potential
data redundancy. In this study, we present evidence of a significant degree of
redundancy across multiple large datasets for various material properties, by
revealing that up to 95 % of data can be safely removed from machine learning
training with little impact on in-distribution prediction performance. The
redundant data is related to over-represented material types and does not
mitigate the severe performance degradation on out-of-distribution samples. In
addition, we show that uncertainty-based active learning algorithms can
construct much smaller but equally informative datasets. We discuss the
effectiveness of informative data in improving prediction performance and
robustness and provide insights into efficient data acquisition and machine
learning training. This work challenges the "bigger is better" mentality and
calls for attention to the information richness of materials data rather than a
narrow emphasis on data volume.
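The uncertainty-based active-learning idea mentioned above can be sketched as
follows; this minimal example uses synthetic data and the spread across a
random-forest ensemble as the acquisition score, which is one common choice and
not necessarily the exact algorithm used in the study.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for a materials dataset: features X, property y.
X = rng.normal(size=(2000, 10))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=2000)

labeled = list(rng.choice(len(X), size=50, replace=False))   # small seed set
pool = [i for i in range(len(X)) if i not in set(labeled)]   # unlabeled candidates

for step in range(10):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[labeled], y[labeled])
    # Uncertainty = variance of the predictions across the ensemble's trees.
    tree_preds = np.stack([t.predict(X[pool]) for t in model.estimators_])
    uncertainty = tree_preds.var(axis=0)
    # Query the most uncertain candidates instead of adding data at random.
    picked = np.array(pool)[np.argsort(uncertainty)[-25:]].tolist()
    labeled.extend(picked)
    pool = [i for i in pool if i not in set(picked)]

print("final training-set size:", len(labeled))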
Recent progress in the JARVIS infrastructure for next-generation data-driven materials design
The Joint Automated Repository for Various Integrated Simulations (JARVIS)
infrastructure at the National Institute of Standards and Technology (NIST) is
a large-scale collection of curated datasets and tools with more than 80,000
materials and millions of properties. JARVIS uses a combination of electronic
structure, artificial intelligence (AI), advanced computation and experimental
methods to accelerate materials design. Here we report some of the new features
that were recently included in the infrastructure such as: 1) doubling the
number of materials in the database since its first release, 2) including more
accurate electronic structure methods such as Quantum Monte Carlo, 3) including
graph neural network-based materials design, 4) development of a unified
force field, 5) development of a universal tight-binding model, 6) addition of
computer-vision tools for advanced microscopy applications, 7) development of a
natural language processing tool for text generation and analysis, 8) debuting
a large-scale benchmarking endeavor, 9) including quantum computing algorithms
for solids, 10) integrating several experimental datasets and 11) staging
several community engagement and outreach events. New classes of materials,
properties, and workflows added to the database include superconductors,
two-dimensional (2D) magnets, magnetic topological materials, metal-organic
frameworks, defects, and interface systems. The rich and reliable datasets,
tools, documentation, and tutorials make JARVIS a unique platform for modern
materials design. JARVIS ensures openness of data and tools to enhance
reproducibility and transparency and to promote a healthy and collaborative
scientific environment.
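For readers who want to pull these datasets programmatically, a minimal sketch
using the jarvis-tools Python package is shown below; the dataset name, record
keys, and attribute names are quoted from memory and should be checked against
the JARVIS documentation.

# pip install jarvis-tools
from jarvis.db.figshare import data
from jarvis.core.atoms import Atoms

entries = data("dft_3d")                  # downloads and caches the 3D DFT dataset
print(len(entries), "materials")

first = entries[0]
atoms = Atoms.from_dict(first["atoms"])   # crystal structure of the first record
print(first.get("jid"), atoms.num_atoms)  # key/attribute names are assumptions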
Accelerating Defect Predictions in Semiconductors Using Graph Neural Networks
Here, we develop a framework for the prediction and screening of native
defects and functional impurities in a chemical space of Group IV, III-V, and
II-VI zinc blende (ZB) semiconductors, powered by crystal graph-based neural
networks (GNNs) trained on high-throughput density functional theory (DFT)
data. Using an innovative approach of sampling partially optimized defect
configurations from DFT calculations, we generate one of the largest
computational defect datasets to date, containing many types of vacancies,
self-interstitials, anti-site substitutions, impurity interstitials and
substitutions, as well as some defect complexes. We apply three established
GNN techniques, namely the Crystal Graph Convolutional Neural Network (CGCNN),
Materials Graph Network (MEGNet), and Atomistic Line Graph Neural Network
(ALIGNN), to rigorously train models for predicting defect formation
energy (DFE) in multiple charge states and chemical potential conditions. We
find that ALIGNN yields the best DFE predictions with root mean square errors
around 0.3 eV, which represents a prediction accuracy of 98 % given the range
of values within the dataset, improving significantly on the state-of-the-art.
Models are tested for different defect types as well as for defect charge
transition levels. We further show that GNN-based defective structure
optimization can take us close to DFT-optimized geometries at a fraction of the
cost of full DFT. DFT-GNN models enable prediction and screening across
thousands of hypothetical defects based on both unoptimized and
partially optimized defective structures, helping identify electronically
active defects in technologically important semiconductors.
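For context, the target quantity here is the standard defect formation energy.
The sketch below spells out the textbook expression and the charge-transition
level that follows from it, with purely hypothetical numbers; it is not the
authors' code.

# E_f(q, E_F) = E_def(q) - E_bulk - sum_i n_i*mu_i + q*(E_VBM + E_F) + E_corr
def formation_energy(e_def, e_bulk, dn_mu, q, e_vbm, e_fermi, e_corr=0.0):
    # dn_mu: list of (n_i, mu_i); n_i > 0 for atoms added, < 0 for atoms removed.
    return (e_def - e_bulk
            - sum(n * mu for n, mu in dn_mu)
            + q * (e_vbm + e_fermi)
            + e_corr)

# Hypothetical anion vacancy in a III-V compound (one atom removed, n = -1).
e_bulk, e_vbm, mu_anion = -345.2, 2.1, -4.5          # eV, toy values
charged = {0: -339.9, 1: -342.6, 2: -345.0}          # E_def(q), toy values
ef0 = {q: formation_energy(e, e_bulk, [(-1, mu_anion)], q, e_vbm, 0.0)
       for q, e in charged.items()}

# Charge transition level eps(q1/q2): Fermi level (relative to the VBM) at which
# the two charge states have equal formation energy.
def transition_level(q1, q2):
    return (ef0[q1] - ef0[q2]) / (q2 - q1)

print(ef0)
print("eps(+2/+1) =", round(transition_level(2, 1), 2), "eV above the VBM")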
Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks
X-ray diffraction (XRD) data acquisition and analysis are among the most
time-consuming steps in the development cycle of novel thin-film materials. We
propose a machine-learning-enabled approach to predict crystallographic
dimensionality and space group from a limited number of thin-film XRD patterns.
We overcome the scarce-data problem intrinsic to novel materials development by
coupling a supervised machine learning approach with a model-agnostic,
physics-informed data augmentation strategy using simulated data from the
Inorganic Crystal Structure Database (ICSD) and experimental data. As a test
case, 115 thin-film metal halides spanning 3 dimensionalities and 7
space groups are synthesized and classified. After testing various algorithms,
we develop and implement an all-convolutional neural network, with
cross-validated accuracies for dimensionality and space-group classification of 93%
and 89%, respectively. We propose average class activation maps, computed from
a global average pooling layer, to allow high model interpretability by human
experimentalists, elucidating the root causes of misclassification. Finally, we
systematically evaluate the maximum XRD pattern step size (data acquisition
rate) before loss of predictive accuracy occurs, and determine it to be
0.16°, which enables an XRD pattern to be obtained and classified in 5.5
minutes or less.
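As an illustration of the all-convolutional design with global average pooling
and class activation maps, here is a small 1D PyTorch sketch; the layer sizes,
pattern length, and class count are hypothetical, and this is not the authors'
network.

import torch
import torch.nn as nn

N_CLASSES, PATTERN_LEN = 7, 2048      # e.g. 7 space groups, 2048-point pattern

class AllConvXRD(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=9, stride=2, padding=4), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, N_CLASSES)   # applied after pooling

    def forward(self, x):
        fmap = self.features(x)                        # (batch, 128, length)
        pooled = fmap.mean(dim=-1)                     # global average pooling
        return self.classifier(pooled), fmap

model = AllConvXRD()
pattern = torch.randn(1, 1, PATTERN_LEN)               # stand-in for a measured pattern
logits, fmap = model(pattern)
pred = logits.argmax(dim=1).item()

# Class activation map: weight each feature channel by the classifier weight of
# the predicted class and sum over channels to see which 2-theta regions decided
# the classification.
cam = torch.einsum("c,bcl->bl", model.classifier.weight[pred], fmap)
print(logits.shape, cam.shape)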