We present the Open MatSci ML Toolkit: a flexible, self-contained, and
scalable Python-based framework to apply deep learning models and methods on
scientific data with a specific focus on materials science and the OpenCatalyst
Dataset. Our toolkit provides: 1. A scalable machine learning workflow for
materials science leveraging PyTorch Lightning, which enables seamless scaling
across different computation capabilities (laptop, server, cluster) and
hardware platforms (CPU, GPU, XPU). 2. Deep Graph Library (DGL) support for
rapid graph neural network prototyping and development. By publishing and
sharing this toolkit with the research community via open-source release, we
hope to: 1. Lower the entry barrier for new machine learning researchers and
practitioners that want to get started with the OpenCatalyst dataset, which
presently comprises the largest computational materials science dataset. 2.
Enable the scientific community to apply advanced machine learning tools to
high-impact scientific challenges, such as modeling of materials behavior for
clean energy applications. We demonstrate the capabilities of our framework by
enabling three new equivariant neural network models for multiple OpenCatalyst
tasks and arrive at promising results for compute scaling and model
performance.Comment: Paper accompanying Open-Source Software from
https://github.com/IntelLabs/matscim