Recent image generation models such as Stable Diffusion have exhibited an
impressive ability to generate fairly realistic images starting from a simple
text prompt. Could such models render real images obsolete for training image
prediction models? In this paper, we answer part of this provocative question
by investigating the need for real images when training models for ImageNet
classification. Provided only with the class names that have been used to build
the dataset, we explore the ability of Stable Diffusion to generate synthetic
clones of ImageNet and measure how useful these are for training classification
models from scratch. We show that with minimal and class-agnostic prompt
engineering, ImageNet clones are able to close a large part of the gap between
models trained on synthetic images and models trained on real images, across
the several standard classification benchmarks that we consider in this study.
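The generation setup described above can be sketched as follows. This is a minimal, hypothetical illustration assuming the Hugging Face `diffusers` library and the `runwayml/stable-diffusion-v1-5` checkpoint; the `build_prompt` template is an illustrative class-agnostic example, not the paper's exact prompt.

```python
import os

def build_prompt(class_name: str) -> str:
    """Class-agnostic template: the same wording is reused for every class,
    filled in only with the class name (an assumed, illustrative template)."""
    return f"a photo of a {class_name}"

def generate_clone(class_names, images_per_class=2, out_dir="imagenet_clone"):
    """Generate a small synthetic 'clone' dataset: one folder of Stable
    Diffusion samples per class, usable for training a classifier from scratch.
    Heavy imports are kept inside the function so the prompt logic above
    can be inspected without a GPU or a model download."""
    import torch
    from diffusers import StableDiffusionPipeline  # assumed dependency

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for name in class_names:
        class_dir = os.path.join(out_dir, name.replace(" ", "_"))
        os.makedirs(class_dir, exist_ok=True)
        for i in range(images_per_class):
            image = pipe(build_prompt(name)).images[0]
            image.save(os.path.join(class_dir, f"{i:05d}.png"))

# The prompt template alone can be checked without running the pipeline:
print(build_prompt("goldfish"))  # -> "a photo of a goldfish"
```

Because only the class names are needed as input, the same script scales from a two-class toy set to all 1,000 ImageNet classes by changing the list passed to `generate_clone`.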
More importantly, we show that models trained on synthetic images exhibit
strong generalization properties and perform on par with models trained on real
data for transfer. Project page: https://europe.naverlabs.com/imagenet-sd/

Accepted to CVPR 2023.