In this paper, we propose a novel data-pruning approach called
moving-one-sample-out (MoSo), which aims to identify and remove the least
informative samples from the training set. The core insight behind MoSo is to
determine the importance of each sample by assessing its impact on the optimal
empirical risk. This is achieved by measuring the extent to which the empirical
risk changes when a particular sample is excluded from the training set.
Instead of the computationally expensive leave-one-out retraining
procedure, we propose an efficient first-order approximator that requires only
gradient information from different training stages. The key idea behind our
approximation is that samples whose gradients are consistently aligned with
the average gradient of the training set are more informative and should
receive higher scores. Intuitively, if the
gradient of a specific sample is consistent with the average gradient vector,
then optimizing the network on that sample will have a similar
effect on all remaining samples. Experimental results demonstrate that MoSo
effectively mitigates severe performance degradation at high pruning ratios and
achieves satisfactory performance across various settings.

Comment: Accepted by the Thirty-seventh Conference on Neural Information
Processing Systems (NeurIPS 2023)
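
The gradient-alignment intuition described in the abstract can be illustrated with a minimal sketch. It is not the exact MoSo estimator, which the abstract does not specify. It assumes a toy linear model with squared loss, full-batch gradient descent, and a score defined as the learning-rate-weighted inner product between a sample's gradient and the mean gradient of the remaining samples, averaged over training steps. The names `per_sample_grads`, `alignment_scores`, and `prune` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def per_sample_grads(w, X, y):
    # Gradient of 0.5 * (x.w - y)^2 with respect to w, one row per sample.
    residual = X @ w - y
    return residual[:, None] * X

def alignment_scores(X, y, steps=50, lr=0.1, seed=0):
    # Assumed scoring rule (illustrative only): average over training steps of
    # lr * <grad of sample i, mean grad of all other samples>.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)
    scores = np.zeros(n)
    for _ in range(steps):
        g = per_sample_grads(w, X, y)            # shape (n, d)
        mean_others = (g.sum(axis=0) - g) / (n - 1)
        scores += lr * np.einsum("ij,ij->i", g, mean_others)
        w -= lr * g.mean(axis=0)                 # plain full-batch gradient step
    return scores / steps

def prune(X, y, keep_ratio, **kwargs):
    # Keep the highest-scoring samples; return their indices and all scores.
    scores = alignment_scores(X, y, **kwargs)
    k = max(1, int(round(keep_ratio * len(y))))
    keep = np.sort(np.argsort(scores)[-k:])
    return keep, scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    keep, scores = prune(X, y, keep_ratio=0.5)
    print("kept indices:", keep)
```

In this sketch the scores are accumulated across training steps rather than computed from a single checkpoint, mirroring the abstract's point that gradient information from different training stages is used.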