The diverse and high-quality content generated by recent generative models
demonstrates the great potential of using synthetic data to train downstream
models. However, in vision, especially in objection detection, related areas
are not fully explored, the synthetic images are merely used to balance the
long tails of existing datasets, and the accuracy of the generated labels is
low, the full potential of generative models has not been exploited. In this
paper, we propose DODA, a data synthesizer that can generate high-quality
object detection data for new domains in agriculture. Specifically, we improve
the controllability of layout-to-image through encoding layout as an image,
thereby improving the quality of labels, and use a visual encoder to provide
visual clues for the diffusion model to decouple visual features from the
diffusion model, and empowering the model the ability to generate data in new
domains. On the Global Wheat Head Detection (GWHD) Dataset, which is the
largest dataset in agriculture and contains diverse domains, using the data
synthesized by DODA improves the performance of the object detector by
12.74-17.76 AP50​ in the domain that was significantly shifted from the
training data