The generation of plausible crystal structures is often an important step in
the computational prediction of crystal structures from composition. Here, we
introduce a methodology for crystal structure generation based on
autoregressive large language modeling of the Crystallographic Information File
(CIF) format. Our model, CrystaLLM, is trained on a comprehensive dataset of
millions of CIF files, and is capable of reliably generating correct CIF syntax
and plausible crystal structures for many classes of inorganic compounds.
Moreover, we provide open access to the model by deploying it as a publicly
available web application. Our results indicate
that the model promises to be a reliable and efficient tool for both
crystallography and materials informatics.