Dissecting Supervised Contrastive Learning
Minimizing cross-entropy over the softmax scores of a linear map composed
with a high-capacity encoder is arguably the most popular choice for training
neural networks on supervised learning tasks. However, recent works show that
one can directly optimize the encoder instead, to obtain equally (or even more)
discriminative representations via a supervised variant of a contrastive
objective. In this work, we address the question of whether there are fundamental
differences in the sought-for representation geometry in the output space of
the encoder at minimal loss. Specifically, we prove, under mild assumptions,
that both losses attain their minimum once the representations of each class
collapse to the vertices of a regular simplex, inscribed in a hypersphere. We
provide empirical evidence that this configuration is attained in practice and
that reaching a close-to-optimal state typically indicates good generalization
performance. Yet, the two losses show remarkably different optimization
behavior. The number of iterations required to perfectly fit the data scales
superlinearly with the amount of randomly flipped labels for the supervised
contrastive loss. This is in contrast to the approximately linear scaling
previously reported for networks trained with cross-entropy.
Comment: ICML 2021 camera ready version
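To make the geometric claim concrete, the following is a small NumPy sketch (illustrative only, not code from the paper): it constructs K unit vectors forming a regular simplex inscribed in the hypersphere and checks that all pairwise inner products equal -1/(K-1), the class-collapse configuration at which both losses are shown to attain their minimum.

    # Illustrative sketch (not from the paper): vertices of a regular simplex
    # inscribed in the unit hypersphere, the configuration both losses are
    # argued to reach at minimal loss.
    import numpy as np

    def simplex_vertices(num_classes: int) -> np.ndarray:
        """Return num_classes unit vectors whose pairwise inner products
        are all -1/(num_classes - 1), i.e. a regular simplex on the sphere."""
        k = num_classes
        # Take the standard basis in R^k, center it, and rescale to unit norm.
        return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

    V = simplex_vertices(5)
    gram = V @ V.T
    print(np.allclose(np.diag(gram), 1.0))                  # all unit norm
    off_diag = gram[~np.eye(5, dtype=bool)]
    print(np.allclose(off_diag, -1.0 / 4.0))                # equal pairwise angles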
Unbiased Supervised Contrastive Learning
Many datasets are biased: they contain easy-to-learn features that are highly
correlated with the target class in the dataset but not in the true underlying
distribution of the data. For this reason, learning unbiased models from biased
data has become a highly relevant research topic in recent years.
In this work, we tackle the problem of learning representations that are robust
to biases. We first present a margin-based theoretical framework that allows us
to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when
dealing with biased data. Based on that, we derive a novel formulation of the
supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate
control of the minimal distance between positive and negative samples.
Furthermore, thanks to our theoretical framework, we also propose FairKL, a new
debiasing regularization loss, that works well even with extremely biased data.
We validate the proposed losses on standard vision datasets including CIFAR10,
CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with
epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased
datasets, including real instances of biases in the wild.
Comment: Accepted at ICLR 2023
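The abstract describes epsilon-SupInfoNCE as giving explicit control of the minimal distance between positive and negative samples. The sketch below is a hypothetical NumPy illustration of one way such a margin could enter a SupCon-style objective (adding a margin eps to the negative similarities inside the log-sum-exp); it is an assumption for illustration, not the authors' exact formulation.

    # Hypothetical sketch of a margin-based supervised contrastive loss.
    # Adding a margin `eps` to negative similarities asks positives to be at
    # least eps more similar to the anchor than negatives, mimicking the
    # "minimal distance" control described above (not the exact loss).
    import numpy as np

    def margin_supcon_loss(z, labels, eps=0.1, temperature=0.1):
        """z: (N, d) embeddings; labels: (N,) integer array of class labels."""
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        sim = z @ z.T / temperature                        # pairwise similarities
        n = len(labels)
        eye = np.eye(n, dtype=bool)
        pos = (labels[:, None] == labels[None, :]) & ~eye  # positive pairs
        neg = ~pos & ~eye                                  # negative pairs
        total = 0.0
        for i in range(n):
            neg_term = np.exp(sim[i, neg[i]] + eps / temperature).sum()
            for j in np.flatnonzero(pos[i]):
                total += -np.log(np.exp(sim[i, j]) /
                                 (np.exp(sim[i, j]) + neg_term))
        return total / max(pos.sum(), 1)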
Bayesian Self-Supervised Contrastive Learning
Recent years have witnessed many successful applications of contrastive
learning in diverse domains, yet its self-supervised variant still poses many
open challenges. Because negative samples are drawn from unlabeled datasets,
a randomly selected sample may actually be a false negative for an anchor,
leading to incorrect encoder training. This paper proposes a new
self-supervised contrastive loss called the BCL loss that still uses random
samples from the unlabeled data while correcting the resulting bias with
importance weights. The key idea is to design the desired sampling distribution
for drawing hard true negatives under a Bayesian framework. A prominent
advantage is that this sampling distribution has a parametric form, with a
location parameter for debiasing false negatives and a concentration parameter
for mining hard negatives. Experiments validate the effectiveness and
superiority of the BCL loss.
Comment: 18 pages
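As context for the importance-weighting idea described above, the sketch below is a hypothetical NumPy illustration, not the paper's BCL loss: negatives are reweighted by an exponentially tilted distribution whose concentration parameter beta emphasizes hard negatives, and the normalized weights play the role of importance weights in an InfoNCE-style objective.

    # Hypothetical sketch of importance-weighted negatives in a contrastive loss.
    # Negatives are reweighted by a tilted distribution controlled by a
    # concentration parameter `beta` (harder negatives get larger weights).
    # This only illustrates the reweighting idea, not the exact BCL objective.
    import numpy as np

    def weighted_infonce(anchor, positive, negatives, beta=1.0, temperature=0.1):
        """anchor, positive: (d,); negatives: (M, d); all L2-normalized."""
        pos_sim = anchor @ positive / temperature
        neg_sim = negatives @ anchor / temperature        # (M,)
        # Importance weights: put more mass on hard (high-similarity) negatives.
        w = np.exp(beta * neg_sim)
        w = w / w.sum() * len(neg_sim)                    # mean-one weights
        neg_term = (w * np.exp(neg_sim)).sum()
        return -np.log(np.exp(pos_sim) / (np.exp(pos_sim) + neg_term))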
Dictionary-Assisted Supervised Contrastive Learning
Text analysis in the social sciences often involves using specialized
dictionaries to reason with abstract concepts, such as perceptions about the
economy or abuse on social media. These dictionaries allow researchers to
impart domain knowledge and capture subtle usages of words relating to the
concepts of interest. We introduce the dictionary-assisted supervised
contrastive learning (DASCL) objective, allowing researchers to leverage
specialized dictionaries when fine-tuning pretrained language models. The text
is first keyword simplified: a common, fixed token replaces any word in the
corpus that appears in the dictionary(ies) relevant to the concept of interest.
During fine-tuning, a supervised contrastive objective draws closer the
embeddings of the original and keyword-simplified texts of the same class while
pushing further apart the embeddings of different classes. The
keyword-simplified texts of the same class are more textually similar than
their original text counterparts, which additionally draws the embeddings of
the same class closer together. Combining DASCL and cross-entropy improves
classification performance metrics in few-shot learning settings and social
science applications compared to using cross-entropy alone and alternative
contrastive and data augmentation methods.
Comment: 6 pages, 5 figures, EMNLP 2022
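The keyword-simplification step described above is simple to illustrate. The sketch below is a minimal Python example using a toy dictionary and a placeholder token (both assumed for illustration, not taken from the paper): every word that appears in the relevant concept dictionary is replaced with a common, fixed token, producing the second view that the supervised contrastive objective pulls toward the original text.

    # Minimal sketch of keyword simplification as described in the abstract:
    # every word found in the concept dictionary is replaced with one fixed
    # token, so texts about the same concept become more textually similar.
    import re

    def keyword_simplify(text, dictionary, token="<kw>"):
        """Replace any word in `dictionary` (a set of lowercase words) with `token`."""
        pieces = re.findall(r"\w+|\W+", text)
        return "".join(token if p.lower() in dictionary else p for p in pieces)

    economy_dict = {"inflation", "unemployment", "recession", "gdp"}  # toy dictionary
    original = "Fears of a recession grew as inflation and unemployment rose."
    simplified = keyword_simplify(original, economy_dict)
    # -> "Fears of a <kw> grew as <kw> and <kw> rose."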
Universum-inspired Supervised Contrastive Learning
As an effective data augmentation method, Mixup synthesizes additional samples
through linear interpolation. Despite its theoretical dependence on data
properties, Mixup reportedly performs well as a regularizer and calibrator,
contributing reliable robustness and generalization to deep model training. In
this paper, inspired by Universum Learning which uses out-of-class samples to
assist the target tasks, we investigate Mixup from a largely under-explored
perspective - the potential to generate in-domain samples that belong to none
of the target classes, that is, universum. We find that in the framework of
supervised contrastive learning, Mixup-induced universum can serve as
surprisingly high-quality hard negatives, greatly relieving the need for large
batch sizes in contrastive learning. With these findings, we propose
Universum-inspired supervised Contrastive learning (UniCon), which incorporates a
Mixup strategy to generate Mixup-induced universum as universum negatives and
pushes them apart from anchor samples of the target classes. We extend our
method to the unsupervised setting, proposing Unsupervised Universum-inspired
contrastive model (Un-Uni). Our approach not only improves Mixup with hard
labels, but also introduces a novel way to generate universum data. With a
linear classifier on the learned representations, UniCon shows state-of-the-art
performance on various datasets. In particular, UniCon achieves 81.7% top-1
accuracy on CIFAR-100, surpassing the state of the art by a significant margin
of 5.2% with a much smaller batch size (256 in UniCon vs. 1024 in SupCon using
ResNet-50). Un-Uni also outperforms SOTA methods on CIFAR-100. The code of this
paper is released at https://github.com/hannaiiyanggit/UniCon.
Comment: Accepted by IEEE Transactions on Image Processing
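To make the Mixup-induced universum idea above concrete, the sketch below is a hypothetical NumPy illustration (not the released UniCon code): it mixes pairs of samples drawn from different classes, so the interpolated points are in-domain yet belong to none of the target classes, and such mixtures can then serve as extra hard negatives for the anchors in a batch.

    # Hypothetical sketch of Mixup-induced universum negatives: interpolating
    # samples from *different* classes yields in-domain points that belong to
    # none of the target classes, usable as extra negatives for all anchors.
    # Not the authors' released implementation.
    import numpy as np

    def mixup_universum(x, y, lam=0.5, rng=None):
        """x: (N, ...) float inputs; y: (N,) labels. Returns universum samples."""
        rng = np.random.default_rng() if rng is None else rng
        perm = rng.permutation(len(y))
        cross_class = y != y[perm]                     # keep only cross-class pairs
        mixed = lam * x[cross_class] + (1 - lam) * x[perm][cross_class]
        return mixed                                   # belongs to no target class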
- …