The space of all plausible materials for a given application is so large that it cannot be explored using a brute-force approach.
This is particularly true of reticular chemistry, which provides materials designers with a practically infinite playground across different length scales.
One promising approach to guide the design and discovery of materials is machine learning, which typically involves learning a mapping of structures onto properties from data.
While there have been many examples of machine learning applied to reticular materials, progress in the field seems to have stagnated.
From our perspective, an important reason is that digital reticular chemistry is still more an art than a science, with many parts accessible only to experienced groups. The lack of standardization across the steps of the machine learning pipeline makes it practically impossible to compare machine learning models directly and to build on prior results.
To confront these challenges, we present mofdscribe, a software ecosystem that accompanies digital reticular chemists, seasoned and novice alike, through all steps from ideation to model publication.
Our package provides reference datasets (including a completely new one), more than 35 featurization strategies (both previously reported and completely novel), data splitters, and validation helpers, which can be used to benchmark new modeling strategies on standard benchmark tasks and to report the results on a public leaderboard.
We envision that this ecosystem will make digital reticular chemistry a more robust, comparable, and productive field.