Systematic analysis of the impact of label noise correction on ML Fairness
Arbitrary, inconsistent, or faulty decision-making raises serious concerns,
and preventing unfair models is an increasingly important challenge in Machine
Learning. Data often reflect past discriminatory behavior, and models trained
on such data may reflect bias on sensitive attributes, such as gender, race, or
age. One approach to developing fair models is to preprocess the training data
to remove the underlying biases while preserving the relevant information, for
example, by correcting biased labels. While multiple label noise correction
methods are available, little is known about how they behave when identifying
discrimination. In this work, we develop an empirical
methodology to systematically evaluate the effectiveness of label noise
correction techniques in ensuring the fairness of models trained on biased
datasets. Our methodology involves manipulating the amount of label noise and
can be used not only with fairness benchmarks but also with standard ML datasets. We
apply the methodology to analyze six label noise correction methods according
to several fairness metrics on standard OpenML datasets. Our results suggest
that the Hybrid Label Noise Correction method achieves the best trade-off
between predictive performance and fairness. Clustering-Based Correction
reduces discrimination the most, but at the cost of lower predictive
performance.
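The paper's evaluation methodology can be illustrated with a minimal sketch: inject label noise that disproportionately affects one group of a standard dataset, then measure a fairness metric before and after. The noise model (flipping positive labels for the disadvantaged group) and the demographic-parity metric below are common choices assumed for illustration, not necessarily the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_biased_noise(y, group, rate, rng):
    """Flip positive labels to negative for the disadvantaged group
    (group == 1) with the given probability, simulating biased labels."""
    y_noisy = y.copy()
    flip = (group == 1) & (y == 1) & (rng.random(len(y)) < rate)
    y_noisy[flip] = 0
    return y_noisy

def demographic_parity_diff(y_pred, group):
    """Absolute difference in positive rates between the two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Toy data: labels independent of group, so the clean labels are fair.
n = 10_000
group = rng.integers(0, 2, n)
y_clean = rng.integers(0, 2, n)
y_noisy = inject_biased_noise(y_clean, group, rate=0.4, rng=rng)

print(demographic_parity_diff(y_clean, group))  # small
print(demographic_parity_diff(y_noisy, group))  # clearly larger
```

A label noise correction method would then be judged by how far it moves the noisy metric back toward the clean one while preserving accuracy.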
Multi-Label Noise Robust Collaborative Learning Model for Remote Sensing Image Classification
The development of accurate methods for multi-label classification (MLC) of
remote sensing (RS) images is one of the most important research topics in RS.
Methods based on Deep Convolutional Neural Networks (CNNs) have shown strong
performance gains in RS MLC problems. However, CNN-based methods usually
require a high number of reliable training images annotated by multiple
land-cover class labels. Collecting such data is time-consuming and costly. To
address this problem, the publicly available thematic products, which can
include noisy labels, can be used to annotate RS images with zero-labeling
cost. However, multi-label noise (which can be associated with wrong and
missing label annotations) can distort the learning process of the MLC
algorithm. The detection and correction of label noise are challenging tasks,
especially in a multi-label scenario, where each image can be associated with
more than one label. To address this problem, we propose a novel noise robust
collaborative multi-label learning (RCML) method to alleviate the adverse
effects of multi-label noise during the training phase of the CNN model. RCML
identifies, ranks and excludes noisy multi-labels in RS images based on three
main modules: 1) discrepancy module; 2) group lasso module; and 3) swap module.
The discrepancy module ensures that the two networks learn diverse features,
while producing the same predictions. The task of the group lasso module is to
detect the potentially noisy labels assigned to the multi-labeled training
images, while the swap module task is devoted to exchanging the ranking
information between two networks. Unlike existing methods that make assumptions
about the noise distribution, our proposed RCML does not make any prior
assumption about the type of noise in the training set.
Comment: Our code is publicly available online: http://www.noisy-labels-in-rs.org
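The swap module's idea of exchanging ranking information between two networks can be sketched as follows: each network keeps, per image, only the assigned labels that its *peer* ranks as lowest-loss. The `keep_frac` hyperparameter and the `np.inf` encoding of unassigned labels are assumptions for illustration; the full RCML method additionally uses the discrepancy and group lasso modules.

```python
import numpy as np

def peer_select_labels(loss_a, loss_b, keep_frac=0.7):
    """Swap-module-style selection for multi-label data. loss_a, loss_b
    are (n_images, n_labels) per-label losses from networks A and B,
    restricted to assigned labels (np.inf = label not assigned).
    Returns boolean masks of labels each network should train on."""
    def keep_mask(peer_loss):
        mask = np.zeros_like(peer_loss, dtype=bool)
        for i, row in enumerate(peer_loss):
            assigned = np.flatnonzero(np.isfinite(row))
            k = max(1, int(keep_frac * len(assigned)))
            # Keep the k assigned labels the peer considers cleanest.
            keep = assigned[np.argsort(row[assigned])[:k]]
            mask[i, keep] = True
        return mask
    # Network A trains on labels ranked clean by B, and vice versa.
    return keep_mask(loss_b), keep_mask(loss_a)
```

Cross-selection of this kind prevents each network from reinforcing its own memorization of noisy labels.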
CrossSplit: Mitigating Label Noise Memorization through Data Splitting
We approach the problem of improving robustness of deep learning algorithms
in the presence of label noise. Building upon existing label correction and
co-teaching methods, we propose a novel training procedure to mitigate the
memorization of noisy labels, called CrossSplit, which uses a pair of neural
networks trained on two disjoint parts of the labelled dataset. CrossSplit
combines two main ingredients: (i) Cross-split label correction. The idea is
that, since the model trained on one part of the data cannot memorize
example-label pairs from the other part, the training labels presented to each
network can be smoothly adjusted by using the predictions of its peer network;
(ii) Cross-split semi-supervised training. A network trained on one part of the
data also uses the unlabeled inputs of the other part. Extensive experiments on
CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that
our method can outperform the current state-of-the-art in a wide range of noise
ratios.
Comment: Accepted to ICML 202
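The cross-split label correction step admits a simple sketch: labels for one split are smoothly blended with the peer network's predicted distribution, which cannot reflect memorization of those examples. The fixed mixing weight `alpha` below is a hypothetical simplification of whatever schedule the paper uses.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_split_correct(y_onehot, peer_logits, alpha=0.5):
    """Blend the (possibly noisy) one-hot training labels of one data
    split with the peer network's predictions. Because the peer was
    trained on the other, disjoint split, it cannot have memorized
    these examples' noisy labels."""
    return (1 - alpha) * y_onehot + alpha * softmax(peer_logits)
```

The result remains a valid probability distribution per example, so it can be used directly as a soft training target.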
Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets
Learning against label noise is vital to guaranteeing reliable performance
of deep neural networks. Recent approaches typically perform dynamic noise
modeling using model output probabilities and loss values, and then separate
clean from noisy samples. These methods have achieved notable success.
However, beyond carefully curated data, existing approaches often cannot perform
well when facing imbalanced datasets, a common scenario in the real world. We
thoroughly investigate this phenomenon and point out two major issues that
hinder the performance, i.e., \emph{inter-class loss distribution discrepancy}
and \emph{misleading predictions due to uncertainty}. The first issue is that
existing methods often perform class-agnostic noise modeling. However, loss
distributions show a significant discrepancy among classes under class
imbalance, and class-agnostic noise modeling can easily confuse noisy
samples with samples from minority classes. The second issue is that models
may output misleading predictions due to epistemic and aleatoric
uncertainty, so existing methods that rely solely on output probabilities
may fail to distinguish confident samples. Inspired by these observations, we
propose an Uncertainty-aware Label Correction framework~(ULC) to handle label
noise on imbalanced datasets. First, we perform epistemic uncertainty-aware
class-specific noise modeling to identify trustworthy clean samples and
refine/discard highly confident true/corrupted labels. Then, we introduce
aleatoric uncertainty in the subsequent learning process to prevent noise
accumulation in the label noise modeling process. We conduct experiments on
several synthetic and real-world datasets. The results demonstrate the
effectiveness of the proposed method, especially on imbalanced datasets.
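The core of class-specific noise modeling is fitting the clean/noisy split per class rather than globally, so minority classes (whose losses run higher overall) are not wholesale flagged as noisy. The sketch below stands in for the paper's mixture modeling with a tiny 1-D 2-means split per class; the thresholding rule is an assumption for illustration.

```python
import numpy as np

def two_means_1d(x, iters=50):
    """Tiny 1-D 2-means: returns a threshold between low/high clusters."""
    c = np.array([x.min(), x.max()], dtype=float)
    for _ in range(iters):
        assign = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                c[k] = x[assign == k].mean()
    return c.mean()

def class_specific_clean_mask(losses, labels, n_classes):
    """Flag likely-clean samples per class. A global threshold would
    mark whole minority classes as noisy; splitting within each class
    avoids that failure mode."""
    mask = np.zeros(len(losses), dtype=bool)
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        if len(idx) == 0:
            continue
        thr = two_means_1d(losses[idx])
        mask[idx] = losses[idx] < thr
    return mask
```

Note that the minority class here can have uniformly higher losses than the majority class and still have its clean samples correctly identified.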
Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection
Recent studies on learning with noisy labels have shown remarkable
performance by exploiting a small clean dataset. In particular, model agnostic
meta-learning-based label correction methods further improve performance by
correcting noisy labels on the fly. However, there is no safeguard against
label miscorrection, resulting in unavoidable performance degradation. Moreover,
every training step requires at least three back-propagations, significantly
slowing down the training speed. To mitigate these issues, we propose a robust
and efficient method that learns a label transition matrix on the fly.
Employing the transition matrix makes the classifier skeptical about all the
corrected samples, which alleviates the miscorrection issue. We also introduce
a two-head architecture to efficiently estimate the label transition matrix
every iteration within a single back-propagation, so that the estimated matrix
closely follows the shifting noise distribution induced by label correction.
Extensive experiments demonstrate that our approach shows the best performance
in training efficiency while having comparable or better accuracy than existing
methods.
Comment: ECCV202
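Why a transition matrix keeps the classifier "skeptical" can be seen in one line of algebra: training targets the noisy-label distribution obtained by pushing the clean posterior through the matrix. A minimal forward-correction sketch (the standard formulation, assumed here; the paper's on-the-fly estimation is not shown):

```python
import numpy as np

def noisy_posterior(clean_probs, T):
    """Forward correction: map the classifier's clean-label posterior
    p(y|x) through the transition matrix T, where T[i, j] is the
    probability that true class i is observed as class j.
    p(noisy = j | x) = sum_i p(y = i | x) * T[i, j]."""
    return clean_probs @ T

# Hypothetical 2-class transition matrix: class 0 flips 10% of the
# time, class 1 flips 20% of the time.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
clean = np.array([[1.0, 0.0]])
print(noisy_posterior(clean, T))  # -> [[0.9 0.1]]
```

Even a classifier fully confident in class 0 only predicts the observed label with probability 0.9, so a (possibly mis)corrected label is never trusted outright.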
BoundaryFace: A mining framework with noise label self-correction for Face Recognition
Face recognition has made tremendous progress in recent years due to the
advances in loss functions and the explosive growth in training set sizes. A
properly designed loss is seen as key to extract discriminative features for
classification. Several margin-based losses have been proposed as alternatives
to the softmax loss in face recognition. However, two issues remain: 1)
They overlook the importance of hard sample mining for discriminative learning.
2) Label noise ubiquitously exists in large-scale datasets, which can seriously
damage the model's performance. In this paper, starting from the perspective of
decision boundary, we propose a novel mining framework that focuses on the
relationship between a sample's ground truth class center and its nearest
negative class center. Specifically, we put forward a closed-set noise label
self-correction module, enabling the framework to perform well on datasets
containing substantial label noise. The proposed method consistently outperforms SOTA methods
in various face recognition benchmarks. Training code has been released at
https://github.com/SWJTU-3DVision/BoundaryFace.
Comment: ECCV 2022.
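The closed-set self-correction idea — relabel a sample when its feature sits decisively closer to another class center than to its ground-truth center — can be sketched with cosine similarities. The `margin` threshold and the relabel-to-argmax rule are hypothetical simplifications of the paper's decision-boundary formulation.

```python
import numpy as np

def self_correct(features, labels, centers, margin=0.3):
    """Relabel a sample to its nearest negative class when that class
    center's cosine similarity exceeds the ground-truth center's
    similarity by more than `margin` (a hypothetical threshold)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    sims = f @ c.T                               # (n, n_classes)
    gt = sims[np.arange(len(labels)), labels]    # similarity to own center
    best = sims.argmax(axis=1)
    corrected = labels.copy()
    relabel = sims.max(axis=1) - gt > margin
    corrected[relabel] = best[relabel]
    return corrected
```

Samples that merely straddle the boundary (within the margin) keep their labels, so only confident closed-set noise is corrected.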