An Investigation into Whitening Loss for Self-supervised Learning
A desirable objective in self-supervised learning (SSL) is to avoid feature
collapse. Whitening loss guarantees collapse avoidance by minimizing the
distance between embeddings of positive pairs under the condition that the
embeddings from different views are whitened. In this paper, we propose a
framework with an informative indicator to analyze whitening loss, which
provides a clue to demystify several interesting phenomena as well as a
pivoting point connecting to other SSL methods. We reveal that batch whitening
(BW) based methods do not actually impose whitening constraints on the embedding;
they only require the embedding to be full-rank. This full-rank constraint is
also sufficient to avoid dimensional collapse. Based on our analysis, we
propose channel whitening with random group partition (CW-RGP), which exploits
the advantages of BW-based methods in preventing collapse and avoids their
disadvantage of requiring a large batch size. Experimental results on ImageNet
classification and COCO object detection reveal that the proposed CW-RGP
possesses a promising potential for learning good representations. The code is
available at https://github.com/winci-ai/CW-RGP.
Comment: Accepted at NeurIPS 2022.
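The core recipe above, whiten the embeddings of each view and then minimize the distance between positive pairs, can be sketched in a few lines. The sketch below is not the authors' CW-RGP implementation (the channel-whitening and random group partition details are in the linked repository); it is a minimal PyTorch illustration assuming two embedding batches z1 and z2 of shape (batch, dim):

import torch

def zca_whiten(z, eps=1e-5):
    # Whiten a (batch, dim) embedding matrix over the batch dimension.
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (z.shape[0] - 1)                  # (dim, dim) covariance
    vals, vecs = torch.linalg.eigh(cov)                 # symmetric eigendecomposition
    inv_sqrt = vecs @ torch.diag(vals.clamp(min=eps).rsqrt()) @ vecs.T
    return z @ inv_sqrt                                 # whitened embedding

def whitening_loss(z1, z2):
    # Mean squared distance between whitened embeddings of positive pairs.
    w1, w2 = zca_whiten(z1), zca_whiten(z2)
    return (w1 - w2).pow(2).sum(dim=1).mean()

In the paper's analysis, a loss of this form effectively requires the embedding to be full-rank rather than strictly whitened, which motivates the channel-whitening and random-group-partition variant that relaxes the large-batch requirement.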
Whitening-based Contrastive Learning of Sentence Embeddings
This paper presents a whitening-based contrastive learning method for
sentence embedding learning (WhitenedCSE), which combines contrastive learning
with a novel shuffled group whitening. Generally, contrastive learning pulls
distortions of a single sample (i.e., positive samples) close and pushes negative
samples far away, thereby promoting alignment and uniformity in
the feature space. A popular alternative to the "pushing" operation is
whitening the feature space, which scatters all the samples for uniformity.
Since whitening and contrastive learning are largely redundant with respect to
the uniformity objective, they are usually used separately and do not easily work
together. For the first time, this paper integrates whitening into the
contrastive learning scheme and brings two benefits. 1) Better uniformity.
We find that these two approaches are not totally redundant but actually have
some complementarity due to different uniformity mechanisms. 2) Better
alignment. We randomly divide the feature into multiple groups along the
channel axis and perform whitening independently within each group. By
shuffling the group division, we derive multiple distortions of a single sample
and thus increase the positive sample diversity. Consequently, using multiple
positive samples with enhanced diversity further improves contrastive learning
due to better alignment. Extensive experiments on seven semantic textual
similarity tasks show our method achieves consistent improvement over the
contrastive learning baseline and sets new state-of-the-art results, e.g., 78.78%
(+2.53% based on BERT-base) Spearman correlation on the STS tasks.
Comment: ACL 2023 Main Conference (Oral).
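As a rough illustration of the shuffled group whitening described above (not the WhitenedCSE code; the group count and the per-group whitening transform are assumptions), one can randomly permute the channels, split them into groups, whiten each group over the batch, and repeat the call with a new permutation to obtain another distortion of the same samples:

import torch

def shuffled_group_whiten(x, num_groups=4, eps=1e-5):
    # x: (batch, dim) features; channels are randomly partitioned into groups
    # and each group is whitened independently over the batch.
    batch, dim = x.shape
    perm = torch.randperm(dim)                          # random channel grouping
    out = torch.empty_like(x)
    for g in perm.chunk(num_groups):                    # channel indices of one group
        xg = x[:, g] - x[:, g].mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / (batch - 1)
        vals, vecs = torch.linalg.eigh(cov)
        w = vecs @ torch.diag(vals.clamp(min=eps).rsqrt()) @ vecs.T
        out[:, g] = xg @ w
    return out

# Two calls give two differently whitened views of the same embeddings,
# which can serve as additional positives in the contrastive loss:
# pos_a = shuffled_group_whiten(features)
# pos_b = shuffled_group_whiten(features)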
Compute Less to Get More: Using ORC to Improve Sparse Filtering
Sparse Filtering is a popular feature learning algorithm for image
classification pipelines. In this paper, we connect the performance of Sparse
Filtering with spectral properties of the corresponding feature matrices. This
connection provides new insights into Sparse Filtering; in particular, it
suggests early stopping of Sparse Filtering. We therefore introduce the Optimal
Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We
show that this stopping criterion is related to pre-processing procedures
such as Statistical Whitening and demonstrate that it can make image
classification with Sparse Filtering considerably faster and more accurate.
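The abstract does not spell out the ORC formula, so the snippet below is only a hypothetical illustration of the idea: compute the row- and column-normalized Sparse Filtering features and monitor a spectral "roundness" of the feature matrix, here taken, purely as a stand-in, to be the ratio of its smallest to largest singular value; training would stop once this measure stops improving.

import numpy as np

def sparse_filtering_features(weights, data, eps=1e-8):
    # weights: (n_features, n_inputs); data: (n_inputs, n_examples).
    f = np.abs(weights @ data)                                   # nonnegative features
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)     # normalize each feature (row)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)     # normalize each example (column)
    return f

def sparsity_objective(features):
    # L1 sparsity objective minimized by Sparse Filtering.
    return features.sum()

def roundness(features):
    # Hypothetical spectral "roundness": ratio of smallest to largest singular value.
    s = np.linalg.svd(features, compute_uv=False)
    return s[-1] / s[0]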