ProMix: Combating Label Noise via Maximizing Clean Sample Utility
The ability to train deep neural networks under label noise is appealing, as
imperfectly annotated data are relatively cheaper to obtain. State-of-the-art
approaches are based on semi-supervised learning (SSL), which selects small-loss
examples as clean and then applies SSL techniques for boosted performance.
However, the selection step typically yields only a medium-sized, decent-enough
clean subset, overlooking a rich set of additional clean samples. In this work, we
propose a novel noisy label learning framework ProMix that attempts to maximize
the utility of clean samples for boosted performance. The key to our method is
a matched high-confidence selection technique that selects examples whose
high-confidence predictions match their given labels.
Combined with small-loss selection, our method achieves a precision of 99.27%
and a recall of 98.22% in detecting clean samples on the
CIFAR-10N dataset. Based on such a large set of clean data, ProMix improves the
best baseline method by +2.67% on CIFAR-10N and +1.61% on CIFAR-100N datasets.
The code and data are available at https://github.com/Justherozen/ProMix
Comment: Winner of the 1st Learning and Mining with Noisy Labels Challenge in
IJCAI-ECAI 2022 (an informal technical report)
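The two-criterion clean-sample selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the confidence threshold, the small-loss fraction, and the choice to keep a sample when *either* criterion holds are all illustrative assumptions.

```python
import numpy as np

def select_clean(probs, given_labels, losses, conf_threshold=0.9, small_loss_frac=0.34):
    """Sketch of clean-sample selection combining two criteria:
    (1) matched high-confidence: the model's top prediction agrees with the
        given label and its confidence exceeds a threshold;
    (2) small-loss: the sample's loss is among the smallest fraction.
    A sample is treated as clean if it satisfies either criterion."""
    preds = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    matched_high_conf = (preds == given_labels) & (confidence >= conf_threshold)

    # Mark the small_loss_frac fraction of samples with the lowest loss.
    k = int(len(losses) * small_loss_frac)
    small_loss = np.zeros(len(losses), dtype=bool)
    small_loss[np.argsort(losses)[:k]] = True

    return matched_high_conf | small_loss
```

In this sketch, a confidently correct prediction is kept even when its loss is not among the smallest, which is how the matched high-confidence criterion enlarges the clean set beyond plain small-loss selection.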
Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective
We investigate the problem of learning with noisy labels in real-world
annotation scenarios, where noise can be categorized into two types: factual
noise and ambiguity noise. To better distinguish these noise types and utilize
their semantics, we propose a novel sample selection-based approach for noisy
label learning, called Proto-semi. Proto-semi initially divides all samples
into confident and unconfident datasets via a warm-up stage. By leveraging the
confident dataset, prototype vectors are constructed to capture class
characteristics. Subsequently, the distances between the unconfident samples
and the prototype vectors are calculated to facilitate noise classification.
Based on these distances, the labels are either corrected or retained,
resulting in the refinement of the confident and unconfident datasets. Finally,
we introduce a semi-supervised learning method to enhance training. Empirical
evaluations on a real-world annotated dataset substantiate the robustness of
Proto-semi in handling the problem of learning from noisy labels. Meanwhile,
the prototype-based repartitioning strategy is shown to be effective in
mitigating the adverse impact of label noise. Our code and data are available
at https://github.com/fuxiAIlab/ProtoSemi
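The prototype construction and distance-based relabeling described above can be sketched as follows. This is an illustrative sketch, not the Proto-semi implementation: the Euclidean metric, the `margin` parameter, and the "relabel to the nearest prototype" rule are assumptions for concreteness.

```python
import numpy as np

def build_prototypes(feats, labels, num_classes):
    # Class prototype = mean feature vector of the confident samples in that class.
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def relabel_by_prototype(feats, given_labels, prototypes, margin=0.0):
    """For each unconfident sample, compute distances to all class prototypes.
    If the given label's prototype is (within `margin`) the nearest one, the
    label is retained; otherwise it is corrected to the nearest prototype."""
    # Pairwise distances: shape (num_samples, num_classes).
    dists = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    idx = np.arange(len(feats))
    keep = dists[idx, given_labels] <= dists[idx, nearest] + margin
    return np.where(keep, given_labels, nearest)
```

The corrected and retained labels would then be used to refine the confident/unconfident split before the semi-supervised training stage.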
Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey
Image classification systems recently made a giant leap with the advancement
of deep neural networks. However, these systems require an excessive amount of
labeled data to be adequately trained. Gathering a correctly annotated dataset
is not always feasible due to several factors, such as the expense of the
labeling process or the difficulty of correctly classifying the data, even for
experts. Because of these practical challenges, label noise is a common problem
in real-world datasets, and numerous methods to train deep neural networks with
label noise are proposed in the literature. Although deep neural networks are
known to be relatively robust to label noise, their tendency to overfit data
makes them vulnerable to memorizing even random noise. Therefore, it is crucial
to account for label noise and develop algorithms that counteract its adverse
effects so that deep neural networks can be trained effectively. Even though
an extensive survey of machine learning techniques under label noise exists,
the literature lacks a comprehensive survey of methodologies centered
explicitly around deep learning in the presence of noisy labels. This paper
aims to present these algorithms while categorizing them into one of two
subgroups: noise-model-based and noise-model-free methods. Algorithms in the
first group aim to estimate the noise structure and use this information to
avoid the adverse effects of noisy labels. In contrast, methods in the second
group aim for inherently noise-robust algorithms, using approaches such as
robust losses, regularizers, or other learning paradigms.
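As one well-known example from the noise-model-free group, the generalized cross-entropy loss of Zhang and Sabuncu interpolates between standard cross-entropy and the mean absolute error, which is robust to symmetric label noise. The sketch below is a plain numpy version; the function name and default `q` are illustrative.

```python
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    """Generalized cross-entropy L_q = (1 - p_y^q) / q, where p_y is the
    predicted probability of the given label. As q -> 0 this recovers
    standard cross-entropy; at q = 1 it equals the mean absolute error,
    which is provably robust to symmetric label noise."""
    p_y = probs[np.arange(len(labels)), labels]
    return ((1.0 - p_y ** q) / q).mean()
```

Intermediate values of `q` trade the fast convergence of cross-entropy against the noise robustness of MAE.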
Meta Soft Label Generation for Noisy Labels
The existence of noisy labels in the dataset causes significant performance
degradation for deep neural networks (DNNs). To address this problem, we
propose a Meta Soft Label Generation algorithm called MSLG, which can jointly
generate soft labels using meta-learning techniques and learn DNN parameters in
an end-to-end fashion. Our approach adapts the meta-learning paradigm to
estimate optimal label distribution by checking gradient directions on both
noisy training data and noise-free meta-data. To update the soft labels
iteratively, a meta-gradient descent step is performed on the estimated labels,
minimizing the loss on the noise-free meta samples. In each iteration, the base
classifier is trained on estimated meta labels. MSLG is model-agnostic and can
be added on top of any existing model at hand with ease. We performed extensive
experiments on CIFAR10, Clothing1M and Food101N datasets. Results show that our
approach outperforms other state-of-the-art methods by a large margin.
Comment: Accepted by ICPR 202
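The bilevel update behind meta soft-label generation can be illustrated on a toy 1-D linear model f(x) = w·x with squared loss, where every gradient is analytic. This is a didactic sketch of the general meta-learning pattern (virtual inner step, meta-gradient on the soft labels, then a base step), not MSLG itself; the learning rates and the regression setting are assumptions.

```python
import numpy as np

def meta_step(w, soft, x_noisy, x_meta, y_meta, lr_w=0.1, lr_s=0.5):
    """One bilevel iteration: `soft` plays the role of the learnable
    soft labels on the noisy set; (x_meta, y_meta) is clean meta-data."""
    n, m = len(x_noisy), len(x_meta)
    # Inner step: virtual update of w on the noisy data with current soft labels.
    grad_w = 2.0 / n * np.sum(x_noisy * (w * x_noisy - soft))
    w_virtual = w - lr_w * grad_w
    # Meta loss gradient at the virtual parameters.
    grad_meta_w = 2.0 / m * np.sum(x_meta * (w_virtual * x_meta - y_meta))
    # Chain rule through the inner step: d w_virtual / d soft_i = lr_w * 2 x_i / n.
    grad_soft = grad_meta_w * (lr_w * 2.0 / n * x_noisy)
    soft = soft - lr_s * grad_soft  # move soft labels to lower the meta loss
    # Base step: train w on the (updated) soft labels.
    grad_w = 2.0 / n * np.sum(x_noisy * (w * x_noisy - soft))
    w = w - lr_w * grad_w
    return w, soft

def meta_loss(w, x_meta, y_meta):
    return np.mean((w * x_meta - y_meta) ** 2)
```

Iterating `meta_step` drives the soft labels toward values under which training on the noisy set also fits the clean meta-data.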
Learning with Noisy Labels via Self-supervised Adversarial Noisy Masking
Collecting large-scale datasets is crucial for training deep models;
annotating the data, however, inevitably yields noisy labels, which pose
challenges for deep learning algorithms. Previous efforts tend to mitigate this
problem via identifying and removing noisy samples or correcting their labels
according to the statistical properties (e.g., loss values) among training
samples. In this paper, we aim to tackle this problem from a new perspective:
delving into the deep feature maps, we empirically find that models trained on
clean and mislabeled samples exhibit distinguishably different activation
distributions. From this observation, a novel robust training approach termed
adversarial noisy masking is proposed. The idea is to regularize deep features
with a label-quality-guided masking scheme that adaptively modulates the input
data and labels simultaneously, preventing the model from overfitting noisy
samples. Further, an auxiliary task is designed to reconstruct the input data;
it naturally provides noise-free self-supervised signals that reinforce the
generalization ability of deep models. The proposed method is simple and
flexible; tested on both synthetic and real-world noisy datasets, it achieves
significant improvements over previous state-of-the-art methods.
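One way a label-quality-guided masking scheme could look is sketched below: samples whose labels appear less reliable get a larger fraction of their input masked, discouraging memorization; the masked input would then also feed the reconstruction auxiliary task. The per-pixel masking, the linear quality-to-ratio mapping, and the function name are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quality_guided_mask(images, label_quality, rng, max_mask_ratio=0.5):
    """Sketch of label-quality-guided input masking.

    images:        array of shape (batch, H, W)
    label_quality: per-sample reliability scores in [0, 1]
                   (e.g., derived from loss values or prediction agreement)
    Samples with lower quality get a larger fraction of pixels zeroed out."""
    batch, h, w = images.shape
    masked = images.copy()
    for i in range(batch):
        ratio = max_mask_ratio * (1.0 - label_quality[i])  # noisier -> more masking
        mask = rng.random((h, w)) < ratio
        masked[i][mask] = 0.0
    return masked
```

A reconstruction head trained to recover `images` from the masked version would supply the noise-free self-supervised signal the abstract mentions, since reconstruction needs no labels at all.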