Applying the representational power of machine learning to the prediction of
complex fluid dynamics has been an active subject of study for years. However,
the amount of available fluid simulation data does not match the notoriously
high requirements of machine learning methods. Researchers have typically
addressed this issue by generating their own datasets, preventing a consistent
evaluation of their proposed approaches. Our work introduces a generation
procedure for synthetic multi-modal fluid simulation datasets. By leveraging a
GPU implementation, our procedure is efficient enough that no data needs
to be exchanged between users, except for configuration files required to
reproduce the dataset. Furthermore, our procedure supports multiple modalities
(generating both geometry and photorealistic renderings) and is general enough
to be applied to various tasks in data-driven fluid simulation. We then
employ our framework to generate a set of thoughtfully designed benchmark
datasets, which attempt to span specific fluid simulation scenarios in a
meaningful way. We demonstrate the properties of our contributions by
evaluating recently published algorithms for the neural fluid simulation and
fluid inverse rendering tasks on our benchmark datasets. Our contribution
aims to fulfill the community's need for standardized benchmarks, fostering
research that is more reproducible and robust than previous endeavors.

Comment: 10 pages, 7 figures