Over-parameterized deep networks, with increasingly more parameters than
training samples, have come to dominate the performance of modern machine
learning. However, when the training data is corrupted, over-parameterized
networks are well known to overfit and generalize poorly. In this work, we
propose a principled approach for robust training
of over-parameterized deep networks in classification tasks where a proportion
of training labels are corrupted. The main idea is simple: label noise
is sparse and incoherent with the network learned from clean data, so we model
the noise and learn to separate it from the data. Specifically, we model the
label noise via another sparse over-parameterization term, and exploit implicit
algorithmic regularization to recover and separate the underlying corruptions.
Remarkably, networks trained with this simple method achieve
state-of-the-art test accuracy under label noise on a variety of real
datasets. Furthermore, our experimental results are corroborated by theory on
simplified linear models, showing that exact separation between sparse noise
and low-rank data can be achieved under incoherence conditions. This work opens
many interesting directions for improving over-parameterized models by using
sparse over-parameterization and implicit regularization.

Comment: 25 pages, 4 figures and 6 tables. Code is available at
https://github.com/shengliu66/SO
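
The separation idea in the abstract can be illustrated on a toy linear model. The sketch below is not the authors' implementation. It assumes observations of the form y = X theta + s with a sparse corruption s, and it assumes one common way to over-parameterize a sparse vector, the Hadamard form s = u*u - v*v trained by plain gradient descent from a small initialization. The function names, the step size, the initialization scale, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def make_toy_problem(n=50, d=5, num_corrupt=5, magnitude=3.0, seed=0):
    """Synthetic data y = X @ theta_true + s_true with a sparse corruption s_true.

    All sizes and magnitudes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d)) / np.sqrt(n)
    theta_true = rng.standard_normal(d)
    s_true = np.zeros(n)
    idx = rng.choice(n, size=num_corrupt, replace=False)
    s_true[idx] = magnitude * rng.choice([-1.0, 1.0], size=num_corrupt)
    y = X @ theta_true + s_true
    return X, y, theta_true, s_true


def fit_with_sparse_term(X, y, lr=0.01, steps=5000, init_scale=1e-3):
    """Gradient descent on 0.5 * ||X @ theta + (u*u - v*v) - y||^2.

    The extra term s = u*u - v*v adds 2n parameters, so the model can fit y
    exactly. No explicit sparsity penalty is used: any preference for a
    sparse s comes only from the small initialization and the dynamics of
    gradient descent (assumed parameterization, see the lead-in).
    """
    n, d = X.shape
    theta = np.zeros(d)
    u = np.full(n, init_scale)
    v = np.full(n, init_scale)
    losses = []
    for _ in range(steps):
        s = u * u - v * v
        r = X @ theta + s - y          # residual of the current fit
        losses.append(0.5 * float(r @ r))
        grad_theta = X.T @ r
        grad_u = 2.0 * u * r           # d/du of 0.5 * r^2 with s = u^2 - v^2
        grad_v = -2.0 * v * r
        theta = theta - lr * grad_theta
        u = u - lr * grad_u
        v = v - lr * grad_v
    return theta, u * u - v * v, losses


if __name__ == "__main__":
    X, y, theta_true, s_true = make_toy_problem()
    theta_hat, s_hat, losses = fit_with_sparse_term(X, y)
    corrupted = s_true != 0
    print("initial loss:", losses[0], "final loss:", losses[-1])
    print("mean |s_hat| on corrupted entries:", np.abs(s_hat[corrupted]).mean())
    print("mean |s_hat| on clean entries:    ", np.abs(s_hat[~corrupted]).mean())
    print("parameter error:", np.linalg.norm(theta_hat - theta_true))
```

In this toy setting the comparison of interest is how much of the estimated term s_hat concentrates on the corrupted entries, and how close theta_hat comes to theta_true. A deep classification network would replace X @ theta with the network output and use a classification loss, which this sketch does not attempt.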