Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets
Learning against label noise is vital for guaranteeing reliable performance of
deep neural networks. Recent research typically performs dynamic noise modeling
based on model output probabilities and loss values, and then separates clean
from noisy samples. These methods have achieved notable success. However, while
they work well on curated data, existing approaches often perform poorly on
imbalanced datasets, a common scenario in the real world. We
thoroughly investigate this phenomenon and identify two major issues that
hinder performance: inter-class loss distribution discrepancy and misleading
predictions due to uncertainty. The first issue is that
existing methods often perform class-agnostic noise modeling. However, loss
distributions show a significant discrepancy among classes under class
imbalance, so class-agnostic noise modeling can easily confuse noisy samples
with samples from minority classes. The second issue is that models may output
misleading predictions due to epistemic and aleatoric uncertainty, so existing
methods that rely solely on output probabilities may fail to identify confident
samples. Motivated by these observations, we
propose an Uncertainty-aware Label Correction framework (ULC) to handle label
noise on imbalanced datasets. First, we perform epistemic-uncertainty-aware,
class-specific noise modeling to identify trustworthy clean samples and to
refine or discard labels that are confidently identified as true or corrupted.
Then, we incorporate aleatoric uncertainty into the subsequent learning process
to prevent noise accumulation during label noise modeling. We conduct
experiments on several synthetic and real-world datasets, and the results
demonstrate the effectiveness of the proposed method, especially on imbalanced
datasets.
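As a rough illustration of class-specific, loss-based noise modeling (a sketch
under our own assumptions, not the authors' exact ULC implementation; the
helper name and the 0.5 threshold are hypothetical), one can fit a
two-component Gaussian mixture to the per-sample losses within each class and
treat the low-loss component as clean, so minority classes are modeled
separately from majority ones:

    # Sketch: per-class two-component GMM over sample losses.
    # Not the authors' implementation; names/thresholds are illustrative.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def clean_probabilities(losses, labels, num_classes):
        """Estimate P(clean) per sample from a GMM fitted on the losses
        of the sample's own class, so minority classes are not swamped
        by the loss distribution of majority classes."""
        p_clean = np.zeros_like(losses, dtype=np.float64)
        for c in range(num_classes):
            idx = np.where(labels == c)[0]
            if len(idx) < 2:            # too few samples to fit a mixture
                p_clean[idx] = 1.0
                continue
            x = losses[idx].reshape(-1, 1)
            gmm = GaussianMixture(n_components=2, reg_covar=1e-6).fit(x)
            low = int(np.argmin(gmm.means_.ravel()))  # low-loss = clean mode
            p_clean[idx] = gmm.predict_proba(x)[:, low]
        return p_clean

Samples with p_clean above a threshold (say 0.5) would be kept as trustworthy;
confidently mismatched labels could then be corrected or dropped, which is
where the paper's uncertainty-aware refinements come in.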
Segatron: Segment-Aware Transformer for Language Modeling and Understanding
Transformers are powerful for sequence modeling. Nearly all state-of-the-art
language models and pre-trained language models are based on the Transformer
architecture. However, the Transformer distinguishes tokens in a sequence only
by their position index. We hypothesize that better contextual representations can be
generated from the Transformer with richer positional information. To verify
this, we propose a segment-aware Transformer (Segatron), by replacing the
original token position encoding with a combined position encoding of
paragraph, sentence, and token. We first introduce the segment-aware mechanism
to Transformer-XL, which is a popular Transformer-based language model with
memory extension and relative position encoding. We find that our method can
further improve the Transformer-XL base model and large model, achieving 17.1
perplexity on the WikiText-103 dataset. We further investigate the pre-training
masked language modeling task with Segatron. Experimental results show that
BERT pre-trained with Segatron (SegaBERT) can outperform BERT with vanilla
Transformer on various NLP tasks, and outperforms RoBERTa on zero-shot sentence
representation learning.

Comment: Accepted by AAAI 2021
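To make the combined position encoding concrete, here is a minimal
PyTorch-style sketch of the general idea (our reading, not the paper's
implementation; the class name, the maximum index sizes, and whether token
positions reset per sentence are all assumptions): each token embedding is
summed with paragraph-, sentence-, and token-level position embeddings, and
the rest of the Transformer is unchanged.

    # Sketch of segment-aware input embeddings: sum paragraph, sentence,
    # and token position embeddings instead of a single token index.
    # Illustrative only; hyperparameters and indexing are assumptions.
    import torch
    import torch.nn as nn

    class SegmentAwareEmbedding(nn.Module):
        def __init__(self, vocab_size, dim, max_tokens=512,
                     max_sentences=64, max_paragraphs=16):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)
            self.pos_token = nn.Embedding(max_tokens, dim)     # token index
            self.pos_sent = nn.Embedding(max_sentences, dim)   # sentence index
            self.pos_para = nn.Embedding(max_paragraphs, dim)  # paragraph index

        def forward(self, token_ids, token_pos, sent_pos, para_pos):
            # All index tensors share token_ids' shape: for each token they
            # give its position, its sentence number, and its paragraph number.
            return (self.tok(token_ids)
                    + self.pos_token(token_pos)
                    + self.pos_sent(sent_pos)
                    + self.pos_para(para_pos))

For example, for the two-sentence paragraph "A B. C", one plausible indexing is
token_pos = [0, 1, 2], sent_pos = [0, 0, 1], para_pos = [0, 0, 0]; downstream
attention layers need no modification.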