Robust and On-the-fly Dataset Denoising for Image Classification
Memorization in over-parameterized neural networks could severely hurt
generalization in the presence of mislabeled examples. However, mislabeled
examples are hard to avoid in extremely large datasets collected with weak
supervision. We address this problem by reasoning counterfactually about the
loss distribution that examples with uniform random labels would exhibit had
they been trained alongside the real examples, and by using this information
to remove noisy examples from
the training set. First, we observe that examples with uniform random labels
have higher losses when trained with stochastic gradient descent under large
learning rates. Then, we propose to model the loss distribution of the
counterfactual examples using only the network parameters, and find that this
model captures their losses with remarkable accuracy. Finally, we propose to remove
examples whose loss exceeds a certain quantile of the modeled loss
distribution. This leads to On-the-fly Data Denoising (ODD), a simple yet
effective algorithm that is robust to mislabeled examples, while introducing
almost zero computational overhead compared to standard training. ODD is able
to achieve state-of-the-art results on a wide range of datasets including
real-world ones such as WebVision and Clothing1M.
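
The quantile-removal step lends itself to a short illustration. Below is a minimal PyTorch sketch, not the paper's algorithm: where ODD models the counterfactual loss distribution from the network parameters alone, this sketch approximates it empirically by scoring a batch's logits against uniform random labels and keeps only examples whose loss falls below a chosen quantile of that distribution. The function name `odd_style_filter`, the `quantile` parameter, and the empirical approximation are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def odd_style_filter(model, inputs, labels, num_classes, quantile=0.1):
    """Flag likely-clean examples via a counterfactual loss quantile.

    Hypothetical sketch: the counterfactual loss distribution is
    approximated empirically with uniform random labels, rather than
    modeled from the network parameters as in the paper.
    """
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
        # Per-example losses under the given (possibly noisy) labels.
        real_losses = F.cross_entropy(logits, labels, reduction="none")
        # Counterfactual losses: the same predictions scored against
        # uniform random labels.
        random_labels = torch.randint(
            num_classes, labels.shape, device=labels.device
        )
        random_losses = F.cross_entropy(logits, random_labels, reduction="none")
        # Remove examples whose loss exceeds the chosen quantile of the
        # counterfactual loss distribution; keep the rest for training.
        threshold = torch.quantile(random_losses, quantile)
    return real_losses <= threshold
```

The returned boolean mask could then select the surviving examples of each batch before the gradient step, which is what keeps the overhead near zero: filtering reuses the forward pass already needed for training.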
Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization
Real-world large-scale datasets are heteroskedastic and imbalanced -- labels
have varying levels of uncertainty and label distributions are long-tailed.
Heteroskedasticity and imbalance challenge deep learning algorithms due to the
difficulty of distinguishing among mislabeled, ambiguous, and rare examples.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic
datasets that regularizes different regions of the input space differently.
Inspired by the theoretical derivation of the optimal regularization strength
in a one-dimensional nonparametric classification setting, our approach
adaptively regularizes the data points in higher-uncertainty, lower-density
regions more heavily. We test our method on several benchmark tasks, including
a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments
corroborate our theory and demonstrate a significant improvement over other
methods in noise-robust deep learning.

Comment: to appear in ICLR 2021.
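
To make the "regularize higher-uncertainty, lower-density regions more heavily" idea concrete, here is a hedged PyTorch sketch. It is not the paper's derived formula: density is proxied by the mean k-nearest-neighbor distance in feature space and uncertainty by predictive entropy, and the names `adaptive_reg_weights`, `k`, and `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adaptive_reg_weights(features, logits, k=10, alpha=1.0):
    """Per-example regularization strengths: heavier weights in
    higher-uncertainty, lower-density regions.

    Hypothetical proxies, not the paper's optimal strength:
    density ~ mean k-NN distance, uncertainty ~ predictive entropy.
    """
    with torch.no_grad():
        # Large mean k-NN distance in feature space ~ low local density.
        dists = torch.cdist(features, features)           # (n, n)
        knn, _ = dists.topk(k + 1, dim=1, largest=False)  # column 0 is self
        density_proxy = knn[:, 1:].mean(dim=1)

        # Predictive entropy as the uncertainty proxy.
        probs = F.softmax(logits, dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

    # Large where density is low AND uncertainty is high.
    return alpha * density_proxy * entropy
```

Such weights would scale a per-example penalty, e.g. `loss = ce.mean() + (weights * per_example_penalty).mean()`, so that ambiguous points in sparse regions are regularized most strongly, in the spirit of the data-dependent regularizer the abstract describes.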