We tackle the problem of discovering novel classes in an image collection
given labelled examples of other classes. This setting is similar to
semi-supervised learning, but significantly harder because there are no
labelled examples for the new classes. The challenge, then, is to leverage the
information contained in the labelled images in order to learn a
general-purpose clustering model and use the latter to identify the new classes
in the unlabelled data. In this work we address this problem by combining three
ideas: (1) we suggest that the common approach of bootstrapping an image
representation using only the labelled data introduces an unwanted bias, and
that this can be avoided by using self-supervised learning to train the
representation from scratch on the union of labelled and unlabelled data; (2)
we use rank statistics to transfer the model's knowledge of the labelled
classes to the problem of clustering the unlabelled images; and, (3) we train
the data representation by optimizing a joint objective function on the
labelled and unlabelled subsets of the data, improving both the supervised
classification of the labelled data, and the clustering of the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform
current methods for novel category discovery by a significant margin.

Comment: ICLR 2020, code: http://www.robots.ox.ac.uk/~vgg/research/auto_nove
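
To illustrate idea (2), the following is a minimal sketch of one way rank statistics can yield pairwise pseudo-labels for unlabelled images: two images are treated as belonging to the same class when the sets of their top-k strongest feature dimensions coincide. The function names, the choice of k, and the use of raw feature values (rather than, say, magnitudes of post-activation features) are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np


def topk_index_sets(features: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of its k largest entries.

    The indices are sorted so that two rows with the same top-k set
    (in any rank order) produce identical index vectors.
    """
    # argsort on the negated values ranks dimensions from largest to smallest
    ranked = np.argsort(-features, axis=1)[:, :k]
    return np.sort(ranked, axis=1)


def pairwise_pseudo_labels(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Build an (n, n) 0/1 matrix; entry (i, j) is 1 when rows i and j
    share the same top-k set of feature dimensions (hypothetical helper)."""
    top = topk_index_sets(features, k)
    # Compare every pair of sorted index vectors element-wise
    same = (top[:, None, :] == top[None, :, :]).all(axis=2)
    return same.astype(np.int64)


if __name__ == "__main__":
    # Toy features: rows 0 and 1 peak on dimensions {0, 1}, row 2 on {2, 3}
    feats = np.array([
        [0.90, 0.80, 0.10, 0.00],
        [0.70, 0.95, 0.20, 0.10],
        [0.00, 0.10, 0.90, 0.80],
    ])
    print(pairwise_pseudo_labels(feats, k=2))
```

In such a scheme the resulting 0/1 matrix could serve as a target for a pairwise loss on the unlabelled subset; comparing ranks rather than raw distances makes the test insensitive to the overall scale of the features.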