Mislabeled, duplicated, or biased data in real-world scenarios can lead to
prolonged training and even hinder model convergence. Traditional solutions that
prioritize either easy or hard samples lack the flexibility to handle all of
these issues simultaneously. Recent work has proposed a more reasonable data selection
principle by examining the data's impact on the model's generalization loss.
However, its practical adoption relies on less principled approximations and
additional clean holdout data. This work solves these problems by leveraging a
lightweight Bayesian treatment and incorporating off-the-shelf zero-shot
predictors built on large-scale pre-trained models. The resulting algorithm is
efficient and easy to implement. We perform extensive empirical studies on
challenging benchmarks with considerable data noise and imbalance in the online
batch selection scenario, and observe superior training efficiency over
competitive baselines. Notably, on the WebVision benchmark, our method achieves
similar predictive performance with significantly fewer training iterations
than leading data selection methods.
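
As a rough illustration of the online batch selection setting mentioned above, the sketch below scores each candidate in a large batch and keeps only the top k for the gradient step. The scoring rule (training loss minus a reference loss, in the spirit of prior generalization-loss-based selection) is an assumption for illustration and is not the method proposed here; the names select_batch, train_losses, and reference_losses are hypothetical.

```python
from typing import List, Sequence


def select_batch(train_losses: Sequence[float],
                 reference_losses: Sequence[float],
                 k: int) -> List[int]:
    """Return the indices of the k candidates with the largest score.

    Assumed score: current training loss minus the loss of a reference
    predictor. Samples that the reference predictor also finds hard
    (e.g. mislabeled ones) get a low score and tend to be skipped.
    """
    if len(train_losses) != len(reference_losses):
        raise ValueError("loss sequences must have the same length")
    scores = [t - r for t, r in zip(train_losses, reference_losses)]
    # Rank candidate indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy candidate batch of four samples (made-up numbers):
#   index 0: high loss under both models (possibly mislabeled)
#   index 1: already learned
#   indices 2, 3: hard for the current model but not for the reference
train_losses = [2.3, 0.1, 1.9, 2.2]
reference_losses = [2.1, 0.1, 0.3, 0.4]
selected = select_batch(train_losses, reference_losses, k=2)
```

In this toy batch, indices 3 and 2 are selected, while the possibly mislabeled sample (index 0) and the already-learned sample (index 1) are skipped.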