Synthetic Data for Model Selection

Bhonker, Nadav; Fintz, Matan; Kviatkovsky, Igor; Medioni, Gerard; Shoshan, Alon

Publication date: 5 July 2023

Abstract

Recent breakthroughs in synthetic data generation have made it possible to produce highly photorealistic images that are hard to distinguish from real ones. Furthermore, synthetic generation pipelines have the potential to generate an unlimited number of images. The combination of high photorealism and scale turns synthetic data into a promising candidate for improving various machine learning (ML) pipelines. Thus far, a large body of research in this field has focused on using synthetic images for training, by augmenting and enlarging training data. In contrast, in this work we explore whether synthetic data can be beneficial for model selection. Considering the task of image classification, we demonstrate that when data is scarce, synthetic data can replace the held-out validation set, allowing training on a larger dataset. We also introduce a novel method to calibrate the synthetic error estimation to fit that of the real domain. We show that such calibration significantly improves the usefulness of synthetic data for model selection.
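The core idea of the abstract, ranking candidate classifiers by their error on a synthetic validation set instead of a held-out real one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `error_rate` and `select_model`, the toy one-feature "models", and the hand-written synthetic examples are hypothetical and do not come from the paper. The sketch also omits the calibration step, since the abstract does not describe how it works.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for this sketch: an example is (features, label),
# and a model is any callable mapping features to a predicted label.
Example = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]


def error_rate(model: Model, data: List[Example]) -> float:
    """Return the fraction of examples the model misclassifies."""
    if not data:
        raise ValueError("validation set is empty")
    wrong = sum(1 for features, label in data if model(features) != label)
    return wrong / len(data)


def select_model(
    candidates: Dict[str, Model], synthetic_val: List[Example]
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the lowest error on the synthetic set.

    All real labeled data is assumed to have been used for training,
    so only synthetic examples are used for validation here.
    """
    errors = {name: error_rate(m, synthetic_val) for name, m in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


# Toy stand-ins (assumptions): a tiny "synthetic" validation set and
# three trivial classifiers playing the role of trained candidates.
synthetic_val = [((0.2,), 0), ((0.4,), 0), ((0.6,), 1), ((0.9,), 1)]
candidates = {
    "threshold_0.5": lambda x: int(x[0] > 0.5),
    "threshold_0.3": lambda x: int(x[0] > 0.3),
    "always_one": lambda x: 1,
}

best_name, synthetic_errors = select_model(candidates, synthetic_val)
print(best_name, synthetic_errors)
```

In a realistic setting the synthetic error may be biased relative to the real domain, which is the gap the calibration method mentioned in the abstract is meant to address; this sketch uses the raw synthetic error only.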

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2105.00717

Last time updated on 08/07/2023