ColabFit Exchange: open-access datasets for data-driven interatomic potentials
Data-driven (DD) interatomic potentials (IPs) trained on large collections of
first principles calculations are rapidly becoming essential tools in the
fields of computational materials science and chemistry for performing
atomic-scale simulations. Despite this, apart from a few notable exceptions,
there is a distinct lack of well-organized, public datasets in common formats
available for use with IP development. This deficiency prevents the research
community from carrying out widespread benchmarking, which is essential for
gaining insight into model performance and transferability, while also limiting
the development of more general, or even universal, IPs. To address this issue,
we introduce the ColabFit Exchange, the first database providing open access to
a large collection of systematically organized datasets from multiple domains
that is especially designed for IP development. The ColabFit Exchange is
publicly available at https://colabfit.org/, providing a web-based
interface for exploring, downloading, and contributing datasets. Composed of
data collected from the literature or provided by community researchers, the
ColabFit Exchange consists of 106 datasets spanning nearly 70,000 unique
chemistries and is intended to grow continuously. In addition to outlining the
software framework used for constructing and accessing the ColabFit Exchange,
we also provide analyses of the data, quantifying its diversity and proposing
metrics for assessing the relative quality and atomic environment coverage of
different datasets. Finally, we demonstrate an end-to-end IP development
pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the
KLIFF software package, and validation tests provided by the OpenKIM framework.