Atomistic Line Graph Neural Network for Improved Materials Property Predictions
Graph neural networks (GNNs) have been shown to provide substantial
performance improvements for representing and modeling atomistic materials
compared with descriptor-based machine-learning models. While most existing GNN
models for atomistic predictions are based on atomic distance information, they
do not explicitly incorporate bond angles, which are critical for
distinguishing many atomic structures. Furthermore, many material properties
are known to be sensitive to slight changes in bond angles. We present an
Atomistic Line Graph Neural Network (ALIGNN), a GNN architecture that performs
message passing on both the interatomic bond graph and its line graph
corresponding to bond angles. We demonstrate that angle information can be
explicitly and efficiently included, leading to improved performance on
multiple atomistic prediction tasks. We use ALIGNN models for predicting 52
solid-state and molecular properties available in the JARVIS-DFT, Materials
Project, and QM9 databases. ALIGNN can outperform some previously reported GNN
models on atomistic prediction tasks by up to 85 % in accuracy, with better or
comparable model training speed.
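To make the line-graph construction concrete, here is a minimal Python sketch (an
illustration only, not the ALIGNN implementation; the toy coordinates, cutoff, and
update rule are hypothetical). It builds the bond graph from interatomic distances,
derives its line graph, whose nodes are bonds and whose edges connect bonds that
share an atom and carry the bond angle, and runs one alternating round of
angle-to-bond and bond-to-atom updates.

import math
from itertools import combinations

# Toy structure: atom index -> Cartesian position (hypothetical coordinates).
positions = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (0.0, 1.5, 0.0)}
cutoff = 2.0  # atom pairs closer than this are treated as bonds (assumed cutoff)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(positions[a], positions[b])))

# Bond graph: nodes = atoms, edges = bonds, edge feature = interatomic distance.
bonds = {(i, j): dist(i, j)
         for i, j in combinations(positions, 2) if dist(i, j) < cutoff}

def angle(b1, b2, shared):
    # Angle (degrees) between two bonds meeting at the shared atom.
    def vec(b):
        other = b[0] if b[1] == shared else b[1]
        return [p - q for p, q in zip(positions[other], positions[shared])]
    u, v = vec(b1), vec(b2)
    cosang = sum(x * y for x, y in zip(u, v)) / (
        math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

# Line graph: nodes = bonds, edges = bond pairs sharing an atom,
# edge feature = bond angle.
line_edges = {}
for b1, b2 in combinations(bonds, 2):
    shared = set(b1) & set(b2)
    if shared:
        line_edges[(b1, b2)] = angle(b1, b2, shared.pop())

# One alternating update: angle messages refine bond features,
# then bond features are aggregated onto atoms.
bond_feat = dict(bonds)
for (b1, b2), ang in line_edges.items():
    for b in (b1, b2):
        bond_feat[b] += 0.01 * math.cos(math.radians(ang))  # toy angle message
atom_feat = {a: sum(f for b, f in bond_feat.items() if a in b) for a in positions}
print(atom_feat)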
On the redundancy in large material datasets: efficient and robust learning with less data
Extensive efforts to gather materials data have largely overlooked potential
data redundancy. In this study, we present evidence of a significant degree of
redundancy across multiple large datasets for various material properties, by
revealing that up to 95 % of data can be safely removed from machine learning
training with little impact on in-distribution prediction performance. The
redundant data is related to over-represented material types and does not
mitigate the severe performance degradation on out-of-distribution samples. In
addition, we show that uncertainty-based active learning algorithms can
construct much smaller but equally informative datasets. We discuss the
effectiveness of informative data in improving prediction performance and
robustness and provide insights into efficient data acquisition and machine
learning training. This work challenges the "bigger is better" mentality and
calls for attention to the information richness of materials data rather than a
narrow emphasis on data volume.
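The uncertainty-based active-learning idea mentioned above can be sketched as
follows; this minimal example uses synthetic data and the spread across a
random-forest ensemble as the acquisition score, which is one common choice and
not necessarily the exact algorithm used in the study.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for a materials dataset: features X, property y.
X = rng.normal(size=(2000, 10))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=2000)

labeled = list(rng.choice(len(X), size=50, replace=False))   # small seed set
pool = [i for i in range(len(X)) if i not in set(labeled)]   # unlabeled candidates

for step in range(10):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[labeled], y[labeled])
    # Uncertainty = variance of the predictions across the ensemble's trees.
    tree_preds = np.stack([t.predict(X[pool]) for t in model.estimators_])
    uncertainty = tree_preds.var(axis=0)
    # Query the most uncertain candidates instead of adding data at random.
    picked = np.array(pool)[np.argsort(uncertainty)[-25:]].tolist()
    labeled.extend(picked)
    pool = [i for i in pool if i not in set(picked)]

print("final training-set size:", len(labeled))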
Recent progress in the JARVIS infrastructure for next-generation data-driven materials design
The Joint Automated Repository for Various Integrated Simulations (JARVIS)
infrastructure at the National Institute of Standards and Technology (NIST) is
a large-scale collection of curated datasets and tools with more than 80,000
materials and millions of properties. JARVIS uses a combination of electronic
structure, artificial intelligence (AI), advanced computation and experimental
methods to accelerate materials design. Here we report some of the new features
that were recently included in the infrastructure such as: 1) doubling the
number of materials in the database since its first release, 2) including more
accurate electronic structure methods such as Quantum Monte Carlo, 3) including
graph neural network-based materials design, 4) development of a unified
force field, 5) development of a universal tight-binding model, 6) addition of
computer-vision tools for advanced microscopy applications, 7) development of a
natural language processing tool for text generation and analysis, 8) debuting
a large-scale benchmarking endeavor, 9) including quantum computing algorithms
for solids, 10) integrating several experimental datasets and 11) staging
several community engagement and outreach events. New classes of materials,
properties, and workflows added to the database include superconductors,
two-dimensional (2D) magnets, magnetic topological materials, metal-organic
frameworks, defects, and interface systems. The rich and reliable datasets,
tools, documentation, and tutorials make JARVIS a unique platform for modern
materials design. JARVIS ensures openness of data and tools to enhance
reproducibility and transparency and to promote a healthy and collaborative
scientific environment.
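For readers who want to pull these datasets programmatically, a minimal sketch
using the jarvis-tools Python package is shown below; the dataset name, record
keys, and attribute names are quoted from memory and should be checked against
the JARVIS documentation.

# pip install jarvis-tools
from jarvis.db.figshare import data
from jarvis.core.atoms import Atoms

entries = data("dft_3d")                  # downloads and caches the 3D DFT dataset
print(len(entries), "materials")

first = entries[0]
atoms = Atoms.from_dict(first["atoms"])   # crystal structure of the first record
print(first.get("jid"), atoms.num_atoms)  # key/attribute names are assumptions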
Accelerating Defect Predictions in Semiconductors Using Graph Neural Networks
Here, we develop a framework for the prediction and screening of native
defects and functional impurities in a chemical space of Group IV, III-V, and
II-VI zinc blende (ZB) semiconductors, powered by crystal graph-based neural
networks (GNNs) trained on high-throughput density functional theory (DFT)
data. Using an innovative approach of sampling partially optimized defect
configurations from DFT calculations, we generate one of the largest
computational defect datasets to date, containing many types of vacancies,
self-interstitials, anti-site substitutions, impurity interstitials and
substitutions, as well as some defect complexes. We apply three established
GNN techniques, namely the Crystal Graph Convolutional Neural Network (CGCNN),
Materials Graph Network (MEGNet), and Atomistic Line Graph Neural Network
(ALIGNN), to rigorously train models for predicting defect formation
energy (DFE) in multiple charge states and chemical potential conditions. We
find that ALIGNN yields the best DFE predictions with root mean square errors
around 0.3 eV, which represents a prediction accuracy of 98 % given the range
of values within the dataset, improving significantly on the state-of-the-art.
Models are tested for different defect types as well as for defect charge
transition levels. We further show that GNN-based defective structure
optimization can take us close to DFT-optimized geometries at a fraction of the
cost of full DFT. DFT-GNN models enable prediction and screening across
thousands of hypothetical defects based on both unoptimized and
partially optimized defective structures, helping identify electronically
active defects in technologically important semiconductors.
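For context, the target quantity here is the standard defect formation energy.
The sketch below spells out the textbook expression and the charge-transition
level that follows from it, with purely hypothetical numbers; it is not the
authors' code.

# E_f(q, E_F) = E_def(q) - E_bulk - sum_i n_i*mu_i + q*(E_VBM + E_F) + E_corr
def formation_energy(e_def, e_bulk, dn_mu, q, e_vbm, e_fermi, e_corr=0.0):
    # dn_mu: list of (n_i, mu_i); n_i > 0 for atoms added, < 0 for atoms removed.
    return (e_def - e_bulk
            - sum(n * mu for n, mu in dn_mu)
            + q * (e_vbm + e_fermi)
            + e_corr)

# Hypothetical anion vacancy in a III-V compound (one atom removed, n = -1).
e_bulk, e_vbm, mu_anion = -345.2, 2.1, -4.5          # eV, toy values
charged = {0: -339.9, 1: -342.6, 2: -345.0}          # E_def(q), toy values
ef0 = {q: formation_energy(e, e_bulk, [(-1, mu_anion)], q, e_vbm, 0.0)
       for q, e in charged.items()}

# Charge transition level eps(q1/q2): Fermi level (relative to the VBM) at which
# the two charge states have equal formation energy.
def transition_level(q1, q2):
    return (ef0[q1] - ef0[q2]) / (q2 - q1)

print(ef0)
print("eps(+2/+1) =", round(transition_level(2, 1), 2), "eV above the VBM")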
Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks
X-ray diffraction (XRD) data acquisition and analysis are among the most
time-consuming steps in the development cycle of novel thin-film materials. We
propose a machine-learning-enabled approach to predict crystallographic
dimensionality and space group from a limited number of thin-film XRD patterns.
We overcome the scarce-data problem intrinsic to novel materials development by
coupling a supervised machine learning approach with a model-agnostic,
physics-informed data augmentation strategy using simulated data from the
Inorganic Crystal Structure Database (ICSD) and experimental data. As a test
case, 115 thin-film metal halides spanning 3 dimensionalities and 7
space groups are synthesized and classified. After testing various algorithms,
we develop and implement an all-convolutional neural network, with
cross-validated accuracies for dimensionality and space-group classification of 93%
and 89%, respectively. We propose average class activation maps, computed from
a global average pooling layer, to allow high model interpretability by human
experimentalists, elucidating the root causes of misclassification. Finally, we
systematically evaluate the maximum XRD pattern step size (data acquisition
rate) before loss of predictive accuracy occurs, and determine it to be
0.16°, which enables an XRD pattern to be obtained and classified in 5.5
minutes or less.
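As an illustration of the all-convolutional design with global average pooling
and class activation maps, here is a small 1D PyTorch sketch; the layer sizes,
pattern length, and class count are hypothetical, and this is not the authors'
network.

import torch
import torch.nn as nn

N_CLASSES, PATTERN_LEN = 7, 2048      # e.g. 7 space groups, 2048-point pattern

class AllConvXRD(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=9, stride=2, padding=4), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, N_CLASSES)   # applied after pooling

    def forward(self, x):
        fmap = self.features(x)                        # (batch, 128, length)
        pooled = fmap.mean(dim=-1)                     # global average pooling
        return self.classifier(pooled), fmap

model = AllConvXRD()
pattern = torch.randn(1, 1, PATTERN_LEN)               # stand-in for a measured pattern
logits, fmap = model(pattern)
pred = logits.argmax(dim=1).item()

# Class activation map: weight each feature channel by the classifier weight of
# the predicted class and sum over channels to see which 2-theta regions decided
# the classification.
cam = torch.einsum("c,bcl->bl", model.classifier.weight[pred], fmap)
print(logits.shape, cam.shape)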