Unsupervised visualization of image datasets using contrastive learning

Berens, Philipp; Böhm, Jan Niklas; Kobak, Dmitry

Authors: Philipp Berens
Jan Niklas Böhm
Dmitry Kobak
Publication date: 13 December 2022

Abstract

Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet, these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case: distances in pixel space often do not capture our sense of similarity, so nearest neighbors are not semantically close. This problem can be circumvented by self-supervised approaches based on contrastive learning, such as SimCLR, which rely on data augmentation to generate implicit neighbors, but these methods do not produce two-dimensional embeddings suitable for visualization. Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. t-SimCNE combines ideas from contrastive learning and neighbor embeddings, and trains a parametric mapping from the high-dimensional pixel space into two dimensions. We show that the resulting 2D embeddings achieve classification accuracy comparable to the state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers.
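The abstract does not spell out the training objective. As a rough illustration of how contrastive learning and neighbor embeddings might be combined, the sketch below assumes a SimCLR-style InfoNCE loss in which the similarity between two 2D embeddings is the Cauchy kernel used by t-SNE, 1 / (1 + squared distance). The function name `cauchy_infonce_loss` and the NumPy formulation are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def cauchy_infonce_loss(z_a, z_b):
    """InfoNCE-style loss with a Cauchy (t-SNE-like) kernel on 2D embeddings.

    Assumption (not stated in the abstract): row i of z_a and row i of z_b
    are the 2D embeddings of two augmentations of the same image, i.e. a
    positive pair; every other point in the batch acts as a negative.
    """
    z = np.concatenate([z_a, z_b], axis=0)  # shape (2n, 2)
    n = z_a.shape[0]

    # Pairwise squared Euclidean distances between all 2n embeddings.
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)

    # Cauchy kernel: close points get similarity near 1, far points near 0.
    sim = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(sim, 0.0)  # a point is never its own neighbor

    # Index of the positive partner of each row: i <-> i + n.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    pos = sim[np.arange(2 * n), pos_idx]

    # Normalize by the similarity to all other points in the batch.
    denom = sim.sum(axis=1)
    return float(np.mean(-np.log(pos / denom)))


# Two images, each embedded twice: matched pairs lie close together.
z_a = np.array([[0.0, 0.0], [10.0, 10.0]])
z_b = np.array([[0.1, 0.0], [10.0, 10.1]])
loss_matched = cauchy_infonce_loss(z_a, z_b)
loss_mismatched = cauchy_infonce_loss(z_a, z_b[::-1])
```

Under these assumptions the loss is small when the two augmentations of each image land near each other in the plane and far from other images, and large when the pairing is scrambled, which is the behavior one would want from a 2D embedding meant for visualization.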

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2210.09879

Last time updated on 02/12/2022