We introduce LAESI, a dataset of 100,000 synthetic leaf images
on millimeter paper, each with semantic masks and surface area labels. This
dataset provides a resource for leaf morphology analysis primarily aimed at
beech and oak leaves. We evaluate the applicability of the dataset by training
machine learning models for leaf surface area prediction and semantic
segmentation, using real images for validation. Our validation shows that these
models can be trained to predict leaf surface area with a relative error not
greater than that of an average human annotator. LAESI also provides an efficient
framework based on 3D procedural models and generative AI for the large-scale,
controllable generation of data with potential further applications in
agriculture and biology. We evaluate the inclusion of generative AI in our
procedural data generation pipeline and show that filtering data by
annotation consistency yields datasets that allow training the
highest-performing vision models.

Comment: 10 pages, 12 figures, 1 table
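
The two quantities the abstract relies on, the relative error of a predicted surface area and a consistency check between annotations, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of intersection-over-union as the consistency measure, the dictionary keys `label_mask` and `predicted_mask`, and the threshold of 0.9 are all assumptions made for illustration.

```python
import numpy as np


def relative_error(predicted_area, true_area):
    """Relative error |pred - true| / true; both areas in the same unit."""
    return abs(predicted_area - true_area) / true_area


def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as fully consistent.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def filter_consistent(samples, iou_threshold=0.9):
    """Keep samples whose label mask agrees with a second mask.

    Assumption: each sample is a dict holding 'label_mask' (the generated
    annotation) and 'predicted_mask' (e.g. the output of a segmentation
    model on the generated image). The threshold is a hypothetical value.
    """
    return [
        s for s in samples
        if mask_iou(s["label_mask"], s["predicted_mask"]) >= iou_threshold
    ]
```

Under these assumptions, a generated image whose annotation disagrees with the second mask is dropped before training, and the relative error is then averaged over real validation images to compare a model against a human annotator.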