Visualization methods based on the nearest neighbor graph, such as t-SNE or
UMAP, are widely used for visualizing high-dimensional data. Yet, these
approaches only produce meaningful results if the nearest neighbors themselves
are meaningful. For images represented in pixel space this is not the case, as
pixel-space distances often fail to capture our sense of similarity, so nearest
neighbors are not semantically close. This problem can be
circumvented by self-supervised approaches based on contrastive learning, such
as SimCLR, which rely on data augmentation to generate implicit neighbors, but
these methods do not produce two-dimensional embeddings suitable for
visualization. Here, we present a new method, called t-SimCNE, for unsupervised
visualization of image data. t-SimCNE combines ideas from contrastive learning
and neighbor embeddings, and trains a parametric mapping from the
high-dimensional pixel space into two dimensions. We show that the resulting 2D
embeddings achieve classification accuracy comparable to that of state-of-the-art
high-dimensional SimCLR representations, thus faithfully capturing semantic
relationships. Using t-SimCNE, we obtain informative visualizations of the
CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and
highlighting artifacts and outliers.
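
To make the combination of contrastive learning and neighbor embeddings more concrete, the sketch below shows one way such an objective could look on 2D outputs. It is an illustration under stated assumptions, not the method's actual implementation: the text above does not specify the loss, so we assume a SimCLR-style InfoNCE loss in which the similarity is a heavy-tailed Cauchy kernel of the kind t-SNE uses in the embedding space. The names `cauchy_similarity` and `contrastive_loss` are hypothetical, and the parametric network that would produce the 2D points is omitted.

```python
import math


def cauchy_similarity(a, b):
    """Heavy-tailed similarity 1 / (1 + ||a - b||^2) between two points.

    Assumption: a t-SNE-like kernel stands in for SimCLR's cosine similarity.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + sq_dist)


def contrastive_loss(views_a, views_b):
    """InfoNCE-style loss over a batch of paired 2D embeddings.

    views_a[i] and views_b[i] are the embeddings of two augmentations of
    image i (a positive pair, i.e. implicit neighbors). Every other point
    in the batch acts as a negative.
    """
    points = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of point i
        positive = cauchy_similarity(points[i], points[j])
        # Denominator sums over all other points, including the positive.
        denominator = sum(
            cauchy_similarity(points[i], points[k])
            for k in range(2 * n)
            if k != i
        )
        total += -math.log(positive / denominator)
    return total / (2 * n)


if __name__ == "__main__":
    # Two images, two augmented views each (toy 2D coordinates).
    aligned = contrastive_loss([(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (5.0, 5.1)])
    swapped = contrastive_loss([(0.0, 0.0), (5.0, 5.0)], [(5.0, 5.1), (0.1, 0.0)])
    print(aligned < swapped)
```

In this toy setup the loss is lower when the two views of each image lie close together in 2D than when the pairing is swapped, which is the behavior a trained parametric mapping would be optimized toward.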