9,547 research outputs found
Self-labelling via simultaneous clustering and representation learning
Combining clustering and representation learning is one of the most promising
approaches for unsupervised learning of deep neural networks. However, doing so
naively leads to ill-posed learning problems with degenerate solutions. In this
paper, we propose a novel and principled learning formulation that addresses
these issues. The method is obtained by maximizing the information between
labels and input data indices. We show that this criterion extends standard
cross-entropy minimization to an optimal transport problem, which we solve
efficiently for millions of input images and thousands of labels using a fast
variant of the Sinkhorn-Knopp algorithm. The resulting method is able to
self-label visual data so as to train highly competitive image representations
without manual labels. Our method achieves state-of-the-art representation
learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and
ImageNet and yields the first self-supervised AlexNet that outperforms the
supervised Pascal VOC detection baseline. Code and models are available.
Comment: Accepted paper at the International Conference on Learning Representations (ICLR) 2020
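The optimal-transport step can be made concrete with a short sketch: given the network's class probabilities, a few Sinkhorn-Knopp iterations rescale them so that every sample receives a label while all labels are used roughly equally often. The function name, epsilon and iteration count below are illustrative assumptions, not the authors' released configuration.

```python
import torch

def sinkhorn_self_label(log_probs, n_iters=50, epsilon=0.05):
    # log_probs: (N, K) log class probabilities for N samples and K classes.
    Q = torch.exp(log_probs / epsilon).t()    # (K, N) transport kernel
    Q /= Q.sum()                              # normalise to a joint distribution
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)       # each class row sums to 1 ...
        Q /= K                                # ... i.e. a uniform marginal over classes
        Q /= Q.sum(dim=0, keepdim=True)       # each sample column sums to 1 ...
        Q /= N                                # ... i.e. a uniform marginal over samples
    Q *= N                                    # columns become per-sample label distributions
    return Q.t()                              # (N, K) soft pseudo-labels

# Usage: pseudo-label a small batch of logits (random stand-ins here).
logits = torch.randn(8, 4)
soft_labels = sinkhorn_self_label(torch.log_softmax(logits, dim=1))
hard_labels = soft_labels.argmax(dim=1)
```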
Labelling unlabelled videos from scratch with multi-modal self-supervision
A large part of the current success of deep learning lies in the
effectiveness of data -- more precisely: labelled data. Yet, labelling a
dataset with human annotation continues to carry high costs, especially for
videos. While recent methods in the image domain have made it possible to
generate meaningful (pseudo-)labels for unlabelled datasets without
supervision, this development is missing in the video domain, where the
current focus is on learning feature representations. In this work, we a) show that
unsupervised labelling of a video dataset does not come for free from strong
feature encoders and b) propose a novel clustering method that allows
pseudo-labelling of a video dataset without any human annotations, by
leveraging the natural correspondence between the audio and visual modalities.
An extensive analysis shows that the resulting clusters have a high semantic
overlap with ground-truth human labels. We further introduce the first
benchmarking results on unsupervised labelling of common video datasets:
Kinetics, Kinetics-Sound, VGG-Sound and AVE.
Comment: Accepted to NeurIPS 2020. Project page:
https://www.robots.ox.ac.uk/~vgg/research/selavi, code:
https://github.com/facebookresearch/selav
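For intuition only, the following sketch clusters a joint audio-visual embedding with plain k-means to produce video pseudo-labels. This is a deliberately simplified stand-in for the clustering method proposed in the paper; the helper name, feature dimensions and cluster count are assumptions.

```python
import torch
from sklearn.cluster import KMeans

def audio_visual_pseudo_labels(video_feats, audio_feats, n_clusters=10):
    # L2-normalise each modality so neither dominates the joint embedding.
    v = torch.nn.functional.normalize(video_feats, dim=1)
    a = torch.nn.functional.normalize(audio_feats, dim=1)
    joint = torch.cat([v, a], dim=1).numpy()   # (N, Dv + Da) joint descriptor per clip
    # Plain k-means stands in for the paper's dedicated clustering method.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(joint)

# Usage with random stand-in features for 1,000 clips.
labels = audio_visual_pseudo_labels(torch.randn(1000, 512), torch.randn(1000, 128))
```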
Information Maximization Clustering via Multi-View Self-Labelling
Image clustering is a particularly challenging computer vision task, which
aims to generate annotations without human supervision. Recent advances focus
on the use of self-supervised learning strategies in image clustering, by first
learning valuable semantics and then clustering the image representations.
These multi-phase algorithms, however, increase the computational time, and
their final performance depends on the first stage. By extending the
self-supervised approach, we propose a novel single-phase clustering method
that simultaneously learns meaningful representations and assigns the
corresponding annotations. This is achieved by integrating a discrete
representation into the self-supervised paradigm through a classifier net.
Specifically, the proposed clustering objective employs mutual information, and
maximizes the dependency between the integrated discrete representation and a
discrete probability distribution. The discrete probability distribution is
derived through the self-supervised process by comparing the learnt latent
representation with a set of trainable prototypes. To enhance the learning
performance of the classifier, we apply the mutual information objective
jointly across multi-crop views. Our empirical results show that the proposed
framework outperforms state-of-the-art techniques, with average accuracies of
89.1% and 49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively.
Finally, the proposed method also demonstrates attractive robustness to
parameter settings, making it readily applicable to other datasets.
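A minimal sketch of a mutual-information clustering objective of this kind is given below: it estimates the joint distribution of the class assignments of two crops of the same images and maximizes their mutual information. The estimator and names are assumptions; the paper's prototype head and multi-crop weighting are not reproduced here.

```python
import torch

def mutual_information_loss(p1, p2, eps=1e-8):
    # p1, p2: (N, K) soft class assignments for two crops of the same N images,
    # e.g. classifier probabilities for one crop and prototype-comparison
    # probabilities for the other.
    joint = p1.t() @ p2 / p1.size(0)           # (K, K) empirical joint distribution
    joint = (joint + joint.t()) / 2            # symmetrise
    marg1 = joint.sum(dim=1, keepdim=True)     # (K, 1) marginal of the first crop
    marg2 = joint.sum(dim=0, keepdim=True)     # (1, K) marginal of the second crop
    mi = (joint * (torch.log(joint + eps)
                   - torch.log(marg1 + eps)
                   - torch.log(marg2 + eps))).sum()
    return -mi                                 # minimise the negative mutual information

# Usage: soft assignments for two crops of one batch (random stand-ins here).
p_a = torch.softmax(torch.randn(32, 10), dim=1)
p_b = torch.softmax(torch.randn(32, 10), dim=1)
loss = mutual_information_loss(p_a, p_b)
```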
Self-Supervised Classification Network
We present Self-Classifier -- a novel self-supervised end-to-end
classification learning approach. Self-Classifier learns labels and
representations simultaneously in a single-stage end-to-end manner by
optimizing for same-class prediction of two augmented views of the same sample.
To guarantee non-degenerate solutions (i.e., to rule out solutions in which
all samples are assigned to the same class), we propose a mathematically
motivated variant of the cross-entropy loss that asserts a uniform prior on
the predicted labels. In our theoretical analysis, we prove that degenerate solutions are not
in the set of optimal solutions of our approach. Self-Classifier is simple to
implement and scalable. Unlike other popular unsupervised classification and
contrastive representation learning approaches, it does not require any form of
pre-training, expectation maximization, pseudo-labelling, external clustering,
a second network, stop-gradient operation or negative pairs. Despite its
simplicity, our approach sets a new state of the art for unsupervised
classification on ImageNet and even achieves results comparable to the state
of the art for unsupervised representation learning.
Code: https://github.com/elad-amrani/self-classifier
Comment: Update method and add experiment
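The loss below is a hedged sketch of such a uniform-prior cross-entropy for two augmented views: the target for one view is the other view's class posterior re-normalised over the batch, which discourages assigning every sample to the same class. The function name and exact normalisation are illustrative, not the released Self-Classifier loss.

```python
import torch

def uniform_prior_cross_entropy(logits_a, logits_b, eps=1e-8):
    # Targets from view b: class posteriors re-normalised over the batch,
    # pushing overall label usage towards a uniform distribution.
    targets = torch.softmax(logits_b, dim=1)
    targets = targets / (targets.sum(dim=0, keepdim=True) + eps)   # uniform prior over labels
    targets = targets / (targets.sum(dim=1, keepdim=True) + eps)   # back to per-sample distributions
    log_preds = torch.log_softmax(logits_a, dim=1)
    return -(targets.detach() * log_preds).sum(dim=1).mean()

# Usage: symmetrised loss over two augmented views of the same batch.
za, zb = torch.randn(64, 100), torch.randn(64, 100)
loss = 0.5 * (uniform_prior_cross_entropy(za, zb) + uniform_prior_cross_entropy(zb, za))
```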