Neural network representations contain structure beyond what was present in
the training labels. For instance, representations of images that are visually
or semantically similar tend to lie closer to each other than to dissimilar
images, regardless of their labels. Clustering these representations can thus
provide insights into dataset properties as well as the network's internals. In
this work, we study how the many design choices involved in neural network
training affect the clusters formed in the hidden representations. To do so, we
establish an evaluation setup based on the BREEDS hierarchy, for the task of
subclass clustering after training models with only superclass information. We
isolate the training dataset and architecture as important factors affecting
clusterability. Datasets whose labeled classes consist of unrelated
subclasses yield much higher clusterability than datasets whose classes follow a natural
hierarchy. When using pretrained models to cluster representations on
downstream datasets, models pretrained on subclass labels provide better
clusterability than models pretrained on superclass labels, but only when there
is a high degree of domain overlap between the pretraining and downstream data.
Architecturally, we find that normalization strategies affect which layers
yield the best clustering performance and that, surprisingly, Vision Transformers
attain lower subclass clusterability than ResNets.
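
To make the evaluation concrete, the sketch below clusters hidden representations from a model trained only on superclass labels and scores the result against held-out subclass labels. The abstract does not fix a specific clustering algorithm or metric, so the use of k-means and adjusted mutual information (AMI) here is an illustrative assumption rather than the paper's exact procedure.

```python
# Minimal sketch of subclass-clusterability evaluation, assuming hidden
# representations have already been extracted from a chosen layer of a model
# trained with superclass labels only. K-means and AMI are illustrative
# choices, not necessarily the metric used in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score


def subclass_clusterability(hidden_reps: np.ndarray,
                            subclass_labels: np.ndarray,
                            n_subclasses: int,
                            seed: int = 0) -> float:
    """Cluster hidden representations and measure agreement with subclass
    labels that were never shown to the model during training."""
    clusters = KMeans(n_clusters=n_subclasses, random_state=seed,
                      n_init=10).fit_predict(hidden_reps)
    return adjusted_mutual_info_score(subclass_labels, clusters)


if __name__ == "__main__":
    # Synthetic stand-ins for penultimate-layer features and hidden subclasses.
    rng = np.random.default_rng(0)
    reps = rng.normal(size=(1000, 512))          # (num_images, feature_dim)
    subclasses = rng.integers(0, 20, size=1000)  # held-out subclass labels
    print(f"AMI: {subclass_clusterability(reps, subclasses, 20):.3f}")
```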