Contrastive learning (CL) is one of the most successful paradigms for
self-supervised learning (SSL). In a principled way, it treats two augmented
"views" of the same image as a positive pair to be pulled closer, and all other
images as negatives to be pushed further apart. However, behind the impressive
success of CL-based techniques, their formulations often rely on
computation-heavy settings, including large batch sizes and long training schedules. We
are thus motivated to tackle these issues and establish a simple, efficient,
yet competitive baseline for contrastive learning. Specifically, we identify,
from theoretical and empirical studies, a noticeable negative-positive-coupling
(NPC) effect in the widely used InfoNCE loss, which makes the learning
efficiency unduly dependent on the batch size. By removing the NPC effect, we propose
the decoupled contrastive learning (DCL) loss, which removes the positive term from
the denominator and significantly improves the learning efficiency. DCL
achieves competitive performance and is less sensitive to suboptimal
hyperparameters, requiring neither the large batches of SimCLR, the momentum
encoding of MoCo, nor many training epochs. We demonstrate these properties
on various benchmarks.
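To make the decoupling concrete, here is a minimal sketch of the two losses, assuming a standard InfoNCE setup with embeddings $z$, a similarity function $\mathrm{sim}$, and temperature $\tau$; this notation is introduced here for illustration and is not fixed by the abstract. Writing $s_{i,k} = \mathrm{sim}(z_i, z_k)/\tau$ for a positive pair $(i, j)$ in a batch:
$$\mathcal{L}_{\mathrm{InfoNCE}}^{(i,j)} = -\log \frac{\exp(s_{i,j})}{\exp(s_{i,j}) + \sum_{k \neq i,j} \exp(s_{i,k})}, \qquad \mathcal{L}_{\mathrm{DCL}}^{(i,j)} = -\log \frac{\exp(s_{i,j})}{\sum_{k \neq i,j} \exp(s_{i,k})}.$$
Dropping the positive term $\exp(s_{i,j})$ from the denominator is what decouples the positive and negative contributions to the gradient.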
Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch
size 256 within 200 epochs of pre-training, outperforming its SimCLR baseline by
6.4%. Further, DCL can be combined with the SOTA contrastive learning method,
NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with a batch size of 512 in 400
epochs, which represents a new SOTA in contrastive learning. We believe DCL
provides a valuable baseline for future contrastive SSL studies.