Robust Training under Label Noise by Over-parameterization
Recently, over-parameterized deep networks, with many more parameters than
training samples, have come to dominate modern machine learning. However, it is
well known that when the training data is corrupted, over-parameterized
networks tend to overfit and do not generalize. In this work, we propose a
principled approach for robust training
of over-parameterized deep networks in classification tasks where a proportion
of training labels are corrupted. The main idea is simple: label noise
is sparse and incoherent with the network learned from clean data, so we model
the noise and learn to separate it from the data. Specifically, we model the
label noise via another sparse over-parameterization term, and exploit implicit
algorithmic regularizations to recover and separate the underlying corruptions.
Remarkably, despite the simplicity of this method, we demonstrate
state-of-the-art test accuracy under label noise on a variety of real
datasets. Furthermore, our experimental results are corroborated by theory on
simplified linear models, showing that exact separation between sparse noise
and low-rank data can be achieved under incoherence conditions. This work opens
many interesting directions for improving over-parameterized models by using
sparse over-parameterization and implicit regularization.
Comment: 25 pages, 4 figures, and 6 tables. Code is available at
https://github.com/shengliu66/SO
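The key mechanism in the abstract is that sparsity of the noise term is induced implicitly by gradient descent, not by an explicit penalty. The sketch below is not the paper's training procedure; it only illustrates the implicit-regularization effect the authors exploit, on a toy linear problem: parameterizing a vector as w = u⊙u − v⊙v and running plain gradient descent from a small initialization recovers the sparse solution of an underdetermined system. All dimensions and step sizes here are illustrative choices.

```python
import numpy as np

# Toy illustration (not the paper's method): implicit sparse regularization
# from the Hadamard over-parameterization w = u*u - v*v.
# We observe y = X @ w_star with sparse w_star and fewer measurements than
# dimensions; plain GD from a small init recovers an (approximately) sparse w.
rng = np.random.default_rng(0)
n, d, k = 60, 100, 5                           # measurements, dimension, sparsity
X = rng.standard_normal((n, d)) / np.sqrt(n)   # roughly unit-norm columns
w_star = np.zeros(d)
w_star[rng.choice(d, size=k, replace=False)] = 1.0  # sparse ground truth
y = X @ w_star

alpha, lr = 1e-3, 0.05
u = alpha * np.ones(d)                         # small init => implicit L1-like bias
v = alpha * np.ones(d)
for _ in range(5000):
    w = u * u - v * v
    g = X.T @ (X @ w - y)                      # gradient of 0.5*||Xw - y||^2 w.r.t. w
    u -= lr * 2 * u * g                        # chain rule through  u*u
    v += lr * 2 * v * g                        # chain rule through -v*v

w_hat = u * u - v * v
print(np.max(np.abs(w_hat - w_star)))          # small: sparse vector recovered
```

Note that no L1 penalty appears anywhere in the loss; the sparsity of the recovered solution comes entirely from the parameterization together with the small initialization, which is the algorithmic regularization the abstract refers to.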
Are All Losses Created Equal: A Neural Collapse Perspective
While cross entropy (CE) is the most commonly used loss to train deep neural
networks for classification tasks, many alternative losses have been developed
to obtain better empirical performance. Among them, which one is the best to
use is still a mystery, because there seem to be multiple factors affecting the
answer, such as properties of the dataset, the choice of network architecture,
and so on. This paper studies the choice of loss function by examining the
last-layer features of deep networks, drawing inspiration from a recent line of
work showing that the global optimal solution of CE and mean-square-error (MSE)
losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large
networks trained until convergence, (i) all features of the same class collapse
to the corresponding class mean and (ii) the means associated with different
classes are in a configuration where their pairwise distances are all equal and
maximized. We extend such results and show through global solution and
landscape analyses that a broad family of loss functions including commonly
used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence,
all relevant losses (i.e., CE, LS, FL, and MSE) produce equivalent features on
training data. Based on the unconstrained feature model assumption, we provide
a global landscape analysis for the LS loss and a local landscape analysis for
the FL loss, showing that the only global minimizers are Neural Collapse
solutions, while all other critical points are strict saddles whose Hessians
exhibit negative-curvature directions (globally for the LS loss, and locally
near the optimal solution for the FL loss). The
experiments further show that Neural Collapse features obtained from all
relevant losses lead to largely identical performance on test data as well,
provided that the network is sufficiently large and trained until convergence.
Comment: 32 pages, 10 figures, NeurIPS 202
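The geometry described in (i) and (ii) above is concretely a simplex equiangular tight frame (ETF). As a small sketch (using a standard ETF construction, not taken from this paper), the snippet below builds the K class means and verifies both properties in (ii): all pairwise distances between class means are equal, and the pairwise cosine attains its minimum possible value of −1/(K−1).

```python
import numpy as np
from itertools import combinations

# Sketch of the Neural Collapse class-mean geometry: K class means forming a
# simplex equiangular tight frame (ETF). A standard construction embeds the
# ETF in R^K as the centered, rescaled identity; columns of M are the means.
K = 4
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

dists = [np.linalg.norm(M[:, i] - M[:, j])
         for i, j in combinations(range(K), 2)]
cosines = [M[:, i] @ M[:, j]
           / (np.linalg.norm(M[:, i]) * np.linalg.norm(M[:, j]))
           for i, j in combinations(range(K), 2)]

print(np.allclose(dists, dists[0]))          # all pairwise distances equal
print(np.allclose(cosines, -1.0 / (K - 1)))  # maximal separation: cos = -1/(K-1)
```

The value −1/(K−1) is the smallest pairwise cosine achievable by K unit vectors simultaneously, which is the precise sense in which the class means are "maximally" separated.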
Inquiry diagnosis of coronary heart disease in Chinese medicine based on symptom-syndrome interactions