On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
Overfitting negatively impacts the generalization ability of deep neural
networks (DNNs) in both natural and adversarial training. Existing methods
struggle to consistently address different types of overfitting, typically
designing strategies that focus separately on either natural or adversarial
patterns. In this work, we adopt a unified perspective by solely focusing on
natural patterns to explore different types of overfitting. Specifically, we
examine the memorization effect in DNNs and reveal a shared behaviour termed
over-memorization, which impairs their generalization capacity. This behaviour
manifests as DNNs suddenly becoming highly confident in predicting certain
training patterns and retaining a persistent memory of them. Furthermore, when
DNNs over-memorize an adversarial pattern, they tend to simultaneously exhibit
high-confidence prediction for the corresponding natural pattern. These
findings motivate us to holistically mitigate different types of overfitting by
hindering DNNs from over-memorizing natural patterns. To this end, we
propose a general framework, Distraction Over-Memorization (DOM), which
explicitly prevents over-memorization by either removing or augmenting the
high-confidence natural patterns. Extensive experiments demonstrate the
effectiveness of our proposed method in mitigating overfitting across various
training paradigms.
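A minimal sketch of what such a training step could look like in PyTorch; the confidence threshold, the augment callable, and the two "distraction" modes are my own illustrative assumptions, not the paper's settings:

import torch
import torch.nn.functional as F

def dom_step(model, x, y, optimizer, augment, conf_threshold=0.95, mode="augment"):
    """One step that distracts the model from high-confidence natural patterns."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        conf = probs.gather(1, y.unsqueeze(1)).squeeze(1)  # confidence on the true class
    over = conf > conf_threshold                           # over-memorization candidates

    model.train()
    if mode == "augment":                                  # re-augment the memorized patterns
        x = x.clone()
        if over.any():
            x[over] = augment(x[over])
        loss = F.cross_entropy(model(x), y)
    else:                                                  # "remove": drop them from the update
        keep = ~over
        if not keep.any():                                 # nothing left to train on this step
            return 0.0
        loss = F.cross_entropy(model(x[keep]), y[keep])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()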
FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant
unlabeled data with extremely scarce labeled data. However, most SSL methods
are commonly based on instance-wise consistency between different data
transformations. Therefore, the label guidance on labeled data is hard to
propagate to unlabeled data. Consequently, learning proceeds much faster on
labeled data than on unlabeled data, so the model is likely to fall into a local
minimum that does not favor unlabeled data, leading to sub-optimal
generalization performance. In this paper, we propose FlatMatch, which minimizes
a cross-sharpness measure to ensure consistent learning performance between the
two datasets. Specifically, we increase the empirical risk on labeled data to
obtain a worst-case model which is a failure case that needs to be enhanced.
Then, by leveraging the richness of unlabeled data, we penalize the prediction
difference (i.e., cross-sharpness) between the worst-case model and the
original model so that the learning direction is beneficial to generalization
on unlabeled data. Therefore, we can calibrate the learning process without
being limited to insufficient label information. As a result, the mismatched
learning performance can be mitigated, further enabling the effective
exploitation of unlabeled data and improving SSL performance. Through
comprehensive validation, we show FlatMatch achieves state-of-the-art results
in many SSL settings. Comment: NeurIPS 202
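A rough sketch of the cross-sharpness idea as I read it from the abstract (not the released FlatMatch code); the SAM-style ascent radius rho, the KL-based prediction gap, and the weight lam are assumed hyperparameters:

import copy
import torch
import torch.nn.functional as F

def cross_sharpness_loss(model, x_lab, y_lab, x_unlab, rho=0.05, lam=1.0):
    # 1) ascend on the labeled loss to reach a nearby worst-case model
    lab_loss = F.cross_entropy(model(x_lab), y_lab)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(lab_loss, params, retain_graph=True)
    worst = copy.deepcopy(model)
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        worst_params = [p for p in worst.parameters() if p.requires_grad]
        for p, g in zip(worst_params, grads):
            p.add_(rho * g / grad_norm)

    # 2) cross-sharpness: the prediction gap between the worst-case and current
    #    model, measured on unlabeled data
    with torch.no_grad():
        worst_probs = F.softmax(worst(x_unlab), dim=1)
    gap = F.kl_div(F.log_softmax(model(x_unlab), dim=1), worst_probs, reduction="batchmean")
    return lab_loss + lam * gap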
Regularly Truncated M-estimators for Learning with Noisy Labels
The sample selection approach is very popular in learning with noisy labels.
As deep networks learn patterns first, prior methods built on sample selection
share a similar training procedure: the small-loss examples can be regarded as
clean examples and used for helping generalization, while the large-loss
examples are treated as mislabeled ones and excluded from network parameter
updates. However, such a procedure is arguably debatable on two counts: (a) it
does not consider the bad influence of noisy labels in selected small-loss
examples; (b) it does not make good use of the discarded large-loss examples,
which may be clean or have meaningful information for generalization. In this
paper, we propose regularly truncated M-estimators (RTME) to address the above
two issues simultaneously. Specifically, RTME can alternately switch modes
between truncated M-estimators and original M-estimators. The former can
adaptively select small-loss examples without knowing the noise rate and
reduce the side effects of noisy labels in them. The latter brings possibly
clean but large-loss examples back into training to help generalization.
Theoretically, we demonstrate that our strategies are label-noise-tolerant.
Empirically, comprehensive experimental results show that our method can
outperform multiple baselines and is robust to a broad range of noise types and levels. Comment: 16 pages, 11 tables, 9 figure
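A hedged sketch of the alternating idea; implementing truncation as a per-batch loss quantile and switching modes by epoch are my own simplifications rather than the paper's exact scheme:

import torch
import torch.nn.functional as F

def rtme_loss(logits, targets, epoch, truncate_every=2, keep_quantile=0.7):
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    if epoch % truncate_every == 0:
        # truncated M-estimator mode: examples whose loss exceeds the batch quantile
        # contribute nothing, suppressing likely-mislabeled, large-loss examples
        threshold = torch.quantile(per_sample.detach(), keep_quantile)
        mask = (per_sample.detach() <= threshold).float()
        return (per_sample * mask).sum() / mask.sum().clamp(min=1.0)
    # original M-estimator mode: every example, including large-loss but possibly
    # clean ones, takes part in the update
    return per_sample.mean()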
Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization
Out-of-Distribution (OOD) Generalization aims to learn robust models that
generalize well to various environments without fitting to
distribution-specific features. Recent studies based on the Lottery Ticket
Hypothesis (LTH) address this problem by minimizing the learning target to find
some of the parameters that are critical to the task. However, in OOD problems,
such solutions are suboptimal as the learning task contains severe distribution
noise, which can mislead the optimization process. Therefore, apart from
finding the task-related parameters (i.e., invariant parameters), we propose
Exploring Variant parameters for Invariant Learning (EVIL) which also leverages
the distribution knowledge to find the parameters that are sensitive to
distribution shift (i.e., variant parameters). Once the variant parameters are
left out of invariant learning, a robust subnetwork that is resistant to
distribution shift can be found. Additionally, the parameters that are
relatively stable across distributions can be considered invariant ones to
improve invariant learning. By fully exploring both variant and invariant
parameters, our EVIL can effectively identify a robust subnetwork to improve
OOD generalization. In extensive experiments on the integrated testbed DomainBed,
EVIL can effectively and efficiently enhance many popular methods, such as ERM,
IRM, and SAM. Comment: 27 pages, 9 figure
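One plausible way to score variant parameters, sketched under my own assumption that sensitivity to distribution shift can be proxied by gradient variance across environments; this is not the authors' implementation:

import torch
import torch.nn.functional as F

def variant_masks(model, env_batches, variant_ratio=0.1):
    """env_batches: a list of (x, y) batches, one per training environment."""
    per_env_grads = []
    for x, y in env_batches:
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        per_env_grads.append([p.grad.detach().clone() if p.grad is not None
                              else torch.zeros_like(p) for p in model.parameters()])

    masks = []
    for grads in zip(*per_env_grads):           # parameter-wise across environments
        var = torch.stack(grads).var(dim=0)     # gradient variance across environments
        k = max(1, int(variant_ratio * var.numel()))
        threshold = var.flatten().topk(k).values.min()
        masks.append(var < threshold)           # True = kept (invariant), False = variant
    return masks

# During invariant learning, variant entries could then be zeroed after each update:
#   with torch.no_grad():
#       for p, m in zip(model.parameters(), masks):
#           p.mul_(m)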
Strength-Adaptive Adversarial Training
Adversarial training (AT) has been shown to reliably improve a network's robustness
against adversarial data. However, current AT with a pre-specified perturbation
budget has limitations in learning a robust network. Firstly, applying a
pre-specified perturbation budget on networks of various model capacities will
yield divergent degrees of robustness disparity between natural and robust
accuracies, which deviates from the desideratum of a robust network. Secondly, the
attack strength of adversarial training data constrained by the pre-specified
perturbation budget fails to increase as network robustness grows, which
leads to robust overfitting and further degrades the adversarial robustness. To
overcome these limitations, we propose \emph{Strength-Adaptive Adversarial
Training} (SAAT). Specifically, the adversary employs an adversarial loss
constraint to generate adversarial training data. Under this constraint, the
perturbation budget will be adaptively adjusted according to the training state
of adversarial data, which can effectively avoid robust overfitting. Moreover,
SAAT explicitly constrains the attack strength of training data through the
adversarial loss, which manipulates model capacity scheduling during training,
and thereby can flexibly control the degree of robustness disparity and adjust
the tradeoff between natural accuracy and robustness. Extensive experiments
show that our proposal boosts the robustness of adversarial training.
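A hedged sketch of an adversary whose budget adapts to a loss constraint rather than a fixed epsilon; the step size, step count, and loss target below are illustrative guesses, not the paper's values:

import torch
import torch.nn.functional as F

def loss_constrained_attack(model, x, y, step_size=2 / 255, max_steps=20, loss_target=1.5):
    x_adv = x.clone().detach()
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        if loss.item() >= loss_target:      # attack is already strong enough: stop early
            break
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():               # the effective budget grows only while the constraint allows
            x_adv = (x_adv + step_size * grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()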
Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation
Out-of-distribution (OOD) detection is important for deploying reliable
machine learning models in real-world applications. Recent advances in outlier
exposure have shown promising results on OOD detection via fine-tuning the model
with informatively sampled auxiliary outliers. However, previous methods assume
that the set of collected outliers is sufficiently large and representative to
cover the boundary between ID and OOD data, which might be impractical and
challenging. In this work, we propose a novel framework, namely, Diversified
Outlier Exposure (DivOE), for effective OOD detection via informative
extrapolation based on the given auxiliary outliers. Specifically, DivOE
introduces a new learning objective, which diversifies the auxiliary
distribution by explicitly synthesizing more informative outliers for
extrapolation during training. It leverages a multi-step optimization method to
generate novel outliers beyond the original ones, which is compatible with many
variants of outlier exposure. Extensive experiments and analyses have been
conducted to characterize and demonstrate the effectiveness of the proposed
DivOE. The code is publicly available at https://github.com/tmlr-group/DivOE. Comment: accepted by NeurIPS 202
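A sketch of what informative extrapolation might look like, under the assumption (mine, not necessarily the paper's exact objective) that outliers are pushed toward the ID/OOD boundary by raising the model's maximum softmax probability for a few gradient steps:

import torch
import torch.nn.functional as F

def extrapolate_outliers(model, x_out, steps=3, step_size=1 / 255):
    x_new = x_out.clone().detach()
    for _ in range(steps):
        x_new.requires_grad_(True)
        # maximum softmax probability as a crude "ID-ness" proxy to extrapolate towards
        msp = F.softmax(model(x_new), dim=1).max(dim=1).values.mean()
        grad, = torch.autograd.grad(msp, x_new)
        with torch.no_grad():
            x_new = (x_new + step_size * grad.sign()).clamp(0.0, 1.0)
    return x_new.detach()

# The synthesized outliers can then be mixed with the original auxiliary ones inside
# whatever outlier-exposure loss is being used (e.g., pushing predictions to uniform).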
Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels
In recent years, research on learning with noisy labels has focused on
devising novel algorithms that can achieve robustness to noisy training labels
while generalizing to clean data. These algorithms often incorporate
sophisticated techniques, such as noise modeling, label correction, and
co-training. In this study, we demonstrate that a simple baseline using
cross-entropy loss, combined with widely used regularization strategies like
learning rate decay, model weight averaging, and data augmentation, can
outperform state-of-the-art methods. Our findings suggest that employing a
combination of regularization strategies can be more effective than intricate
algorithms in tackling the challenges of learning with noisy labels. While some
of these regularization strategies have been utilized in previous noisy label
learning research, their full potential has not been thoroughly explored. Our
results encourage a reevaluation of benchmarks for learning with noisy labels
and prompt reconsideration of the role of specialized learning algorithms
designed for training with noisy labels.
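A minimal sketch of the kind of baseline described: plain cross-entropy with standard regularizers (cosine learning rate decay, an exponential moving average of weights as a simple form of weight averaging, and augmentation assumed to live in the dataloader). Hyperparameters are illustrative, not taken from the paper:

import copy
import torch
import torch.nn.functional as F

def train_simple_baseline(model, train_loader, epochs=100, lr=0.1, ema_decay=0.999):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)  # learning rate decay
    ema = copy.deepcopy(model)                                             # averaged weights
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:          # data augmentation is assumed to be in the loader
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
            with torch.no_grad():          # exponential moving average of the weights
                for p_ema, p in zip(ema.parameters(), model.parameters()):
                    p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
        sched.step()
    return ema                             # evaluate the averaged model, not the raw one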