Recent advancements in deep learning have been primarily driven by the use of
large models trained on increasingly vast datasets. While neural scaling laws
have emerged to predict network performance given a specific level of
computational resources, the growing demand for expansive datasets raises
concerns. To address this, a recent line of research focuses on creating
synthetic data as a substitute. In this study, we investigate the shape bias
that neural networks exhibit when trained on synthetic datasets, treating it
as an indicator of synthetic data quality. Specifically, our findings
indicate three key points: (1) Shape bias varies across network
architectures and types of supervision, casting doubt on its reliability as a
predictor of generalization and on its ability to explain differences between
model and human recognition. (2) Relying solely on shape bias to
estimate generalization is unreliable, as it is entangled with diversity and
naturalism. (3) We propose a novel interpretation of shape bias as a tool for
estimating the diversity of samples within a dataset. Our research aims to
clarify the implications of using synthetic data and its associated shape bias
in deep learning, addressing concerns regarding generalization and dataset
quality.