Crystal Structure Generation with Autoregressive Large Language Modeling

Abstract

The generation of plausible crystal structures is often an important step in the computational prediction of crystal structures from composition. Here, we introduce a methodology for crystal structure generation involving autoregressive large language modeling of the Crystallographic Information File (CIF) format. Our model, CrystaLLM, is trained on a comprehensive dataset of millions of CIF files, and is capable of reliably generating correct CIF syntax and plausible crystal structures for many classes of inorganic compounds. Moreover, we provide general and open access to the model by deploying it as a web application, available to anyone over the internet. Our results indicate that the model promises to be a reliable and efficient tool for both crystallography and materials informatics

    Similar works

    Full text

    thumbnail-image

    Available Versions