Dataset Distillation (DD), a newly emerging field, aims to generate much
smaller yet high-quality synthetic datasets from large ones. Existing DD
methods based on gradient matching achieve leading performance; however, they
are extremely computationally intensive, as they require continually optimizing
the synthetic dataset across thousands of randomly initialized models. In this paper, we
hypothesize that training the synthetic data with diverse models leads to better
generalization performance. We thus propose two \textbf{model augmentation}
techniques, \ie, using \textbf{early-stage models} and \textbf{weight
perturbation}, to learn an informative synthetic set with significantly reduced
training cost. Extensive experiments demonstrate that our method achieves up to
a $20\times$ speedup while maintaining performance on par with state-of-the-art
baseline methods.
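
To make the two augmentations concrete, the following is a minimal PyTorch
sketch of one gradient-matching step; it is illustrative only, and the helper
names (\texttt{perturb\_weights}, \texttt{gradient\_match\_loss}) and the noise
scale \texttt{sigma} are assumptions, not the paper's exact implementation:

\begin{verbatim}
import copy
import torch

def perturb_weights(model, sigma=0.01):
    # Weight perturbation: clone an early-stage model (one trained for
    # only a few epochs) and add Gaussian noise to every parameter,
    # yielding a cheap, diverse model for gradient matching.
    # (sigma is an assumed, illustrative noise scale.)
    perturbed = copy.deepcopy(model)
    with torch.no_grad():
        for p in perturbed.parameters():
            p.add_(sigma * torch.randn_like(p))
    return perturbed

def gradient_match_loss(model, loss_fn, x_real, y_real, x_syn, y_syn):
    # Match the gradients that the real and synthetic batches induce on
    # the (perturbed) model; only the synthetic branch keeps the graph,
    # so the loss can be backpropagated into the synthetic images.
    g_real = torch.autograd.grad(
        loss_fn(model(x_real), y_real), model.parameters())
    g_syn = torch.autograd.grad(
        loss_fn(model(x_syn), y_syn), model.parameters(),
        create_graph=True)
    return sum(((gr.detach() - gs) ** 2).sum()
               for gr, gs in zip(g_real, g_syn))
\end{verbatim}

In this sketch, drawing a freshly perturbed copy of an early-stage model at
each step stands in for the thousands of randomly initialized models that
prior gradient-matching methods optimize over.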