Self-supervised learning enables networks to learn discriminative features
from large amounts of unlabeled data. Most state-of-the-art methods are based
on contrastive learning and maximize the similarity between two augmentations
of the same image. By exploiting the consistency between the two
augmentations, these methods remove the burden of manual annotation.
Contrastive learning exploits instance-level information to learn robust
features. However, the learned information may be confined to different views
of the same instance. In this paper, we
attempt to leverage the similarity between two distinct images to improve
representation learning in a self-supervised manner. In contrast to
instance-level information, the similarity between two distinct images may
provide a more useful learning signal. In addition, we analyze the relation
between similarity loss and
feature-level cross-entropy loss. Both losses are essential to many deep
learning methods, yet the relation between them remains unclear. Similarity
loss helps obtain instance-level representations, while feature-level
cross-entropy loss helps mine the similarity between two distinct images. We
provide theoretical analyses and experiments showing that a suitable
combination of these two losses achieves state-of-the-art results. Code is
available at https://github.com/guijiejie/ICCL.

Comment: This paper is accepted by IEEE Transactions on Image Processing.
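
To make the combination concrete, below is a minimal PyTorch sketch of one
way a similarity loss and a feature-level cross-entropy loss could be
combined. The specific loss forms, the temperature, and the weight lam are
illustrative assumptions rather than the paper's exact formulation; the
official implementation is available in the ICCL repository linked above.

# Minimal sketch (PyTorch), assuming (batch, dim) projections z1, z2 of two
# augmented views. The loss forms, temperature, and weight `lam` are
# illustrative assumptions; see the ICCL repository for the official code.
import torch
import torch.nn.functional as F

def similarity_loss(z1, z2):
    # Instance-level loss: maximize cosine similarity between the
    # representations of two augmentations of the same image.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    return -(z1 * z2).sum(dim=1).mean()

def feature_level_cross_entropy(z1, z2, temperature=0.1):
    # Feature-level loss: softmax over the batch turns each feature
    # dimension into a distribution, so matching the two views can relate
    # distinct images that activate the same features.
    p1 = F.softmax(z1 / temperature, dim=0)
    p2 = F.softmax(z2 / temperature, dim=0)
    return -(p2.detach() * torch.log(p1 + 1e-8)).sum(dim=0).mean()

def combined_loss(z1, z2, lam=1.0):
    # A weighted combination of the two losses, as described in the abstract.
    return similarity_loss(z1, z2) + lam * feature_level_cross_entropy(z1, z2)

# Usage: random projections stand in for encoder outputs.
z1 = torch.randn(256, 128, requires_grad=True)
z2 = torch.randn(256, 128, requires_grad=True)
loss = combined_loss(z1, z2)
loss.backward()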