A critical analysis of self-supervision, or what we can learn from a single image
We look critically at popular self-supervision techniques for learning deep
convolutional neural networks without manual labels. We show that three
different and representative methods, BiGAN, RotNet and DeepCluster, can learn
the first few layers of a convolutional network from a single image as well as
using millions of images and manual labels, provided that strong data
augmentation is used. However, for deeper layers the gap with manual
supervision cannot be closed even if millions of unlabelled images are used for
training. We conclude that: (1) the weights of the early layers of deep
networks contain limited information about the statistics of natural images,
that (2) such low-level statistics can be learned through self-supervision just
as well as through strong supervision, and that (3) the low-level statistics
can be captured via synthetic transformations instead of using a large image
dataset.
Comment: Accepted paper at the International Conference on Learning Representations (ICLR) 202
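The single-image result above hinges on strong data augmentation: one image, heavily cropped, flipped, and rotated, can stand in for a large dataset. A minimal sketch of such a RotNet-style pipeline, turning a single image into a self-supervised training set of (patch, rotation-label) pairs — sizes and parameters here are illustrative, not the paper's:

```python
import numpy as np

def augment_single_image(img, n_samples, crop=16, rng=None):
    """Generate RotNet-style (crop, rotation-label) pairs from ONE image.

    Strong augmentation (random crops + horizontal flips) produces many
    distinct training samples; the self-supervised label is the rotation
    index (0 = 0 deg, 1 = 90, 2 = 180, 3 = 270). Crop size is a
    hypothetical choice for illustration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = img.shape[:2]
    crops, labels = [], []
    for _ in range(n_samples):
        y = rng.integers(0, h - crop + 1)
        x = rng.integers(0, w - crop + 1)
        patch = img[y:y + crop, x:x + crop]
        if rng.random() < 0.5:          # random horizontal flip
            patch = patch[:, ::-1]
        k = int(rng.integers(0, 4))     # rotation label in {0, 1, 2, 3}
        crops.append(np.rot90(patch, k))
        labels.append(k)
    return np.stack(crops), np.array(labels)

# One 64x64 "image" yields 256 labelled training samples.
image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
x, y = augment_single_image(image, n_samples=256)
```

A network trained to predict `y` from `x` would then be learning low-level statistics purely from these synthetic transformations.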
CASSL: Curriculum Accelerated Self-Supervised Learning
Recent self-supervised learning approaches focus on using a few thousand data
points to learn policies for high-level, low-dimensional action spaces.
However, scaling this framework to high-dimensional control requires either
scaling up the data-collection effort or using a clever sampling strategy for
training. We present a novel approach, Curriculum Accelerated Self-Supervised
Learning (CASSL), to train policies that map visual information to high-level,
higher-dimensional action spaces. CASSL orders the sampling of training data
based on control dimensions: learning and sampling focus on a few
control parameters before the others. The right curriculum for learning
is suggested by variance-based global sensitivity analysis of the control
space. We apply our CASSL framework to learning how to grasp using an adaptive,
underactuated multi-fingered gripper, a challenging system to control. Our
experimental results indicate that CASSL provides significant improvement and
generalization compared to baseline methods such as staged curriculum learning
(8% increase) and complete end-to-end learning with random exploration (14%
improvement) tested on a set of novel objects.
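The curriculum ordering described above can be sketched with a simplified, variance-based sensitivity estimate: bin each control dimension and rank dimensions by how much the binned mean outcome varies. This is a stand-in for the paper's global sensitivity analysis, not its implementation; the function name, bin count, and synthetic data are my own:

```python
import numpy as np

def curriculum_order(X, f, bins=10):
    """Order control dimensions by a first-order, variance-based
    sensitivity estimate (a simplified stand-in for Sobol indices).

    For each dimension i, bin its values and compute the variance of the
    per-bin mean outcome, i.e. an estimate of Var[E[f | x_i]]; dimensions
    whose variation explains more outcome variance come first.
    """
    n, d = X.shape
    scores = []
    for i in range(d):
        edges = np.quantile(X[:, i], np.linspace(0, 1, bins + 1))
        idx = np.clip(np.searchsorted(edges, X[:, i], side="right") - 1,
                      0, bins - 1)
        bin_means = [f[idx == b].mean() for b in range(bins)
                     if np.any(idx == b)]
        scores.append(np.var(bin_means))
    return np.argsort(scores)[::-1]      # most sensitive dimension first

# Synthetic control space: dimension 2 dominates the outcome.
rng = np.random.default_rng(0)
X = rng.random((2000, 3))
f = 5.0 * X[:, 2] + 0.5 * X[:, 0] + 0.1 * rng.standard_normal(2000)
order = curriculum_order(X, f)
```

Sampling would then concentrate on `order[0]` first, holding the remaining control parameters fixed, before moving down the ranking.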
Benchmarking Omni-Vision Representation through the Lens of Visual Realms
Though impressive performance has been achieved in specific visual realms
(e.g. faces, dogs, and places), an omni-vision representation generalizing to
many natural visual domains is highly desirable. However, existing benchmarks
are biased and inefficient for evaluating the omni-vision representation --
these benchmarks either include only a few specific realms, or cover most
realms at the expense of subsuming numerous datasets with extensive realm
overlap. In this paper, we propose the Omni-Realm Benchmark (OmniBenchmark). It
includes 21 realm-wise datasets with 7,372 concepts and 1,074,346 images.
Without semantic overlap, these datasets cover most visual realms
both comprehensively and efficiently. In addition, we propose a new
supervised contrastive learning framework, namely Relational Contrastive
learning (ReCo), for a better omni-vision representation. Beyond pulling two
instances from the same concept closer -- the typical supervised contrastive
learning framework -- ReCo also pulls two instances from the same semantic
realm closer, encoding the semantic relation between concepts, and facilitating
omni-vision representation learning. We benchmark ReCo and other advances in
omni-vision representation studies that are different in architectures (from
CNNs to transformers) and in learning paradigms (from supervised learning to
self-supervised learning) on OmniBenchmark. We illustrate the superiority of
ReCo over other supervised contrastive learning methods and reveal multiple
practical observations to facilitate future research.
Comment: In ECCV 2022; the project page is at
https://zhangyuanhan-ai.github.io/OmniBenchmar
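The ReCo idea — pulling same-concept instances together as in supervised contrastive learning, plus an extra term pulling same-realm instances together — can be sketched as a toy NumPy loss. This is not the paper's implementation; `tau` and `alpha` are hypothetical hyper-parameters, and the realm term here simply reuses a SupCon-style term at the realm level:

```python
import numpy as np

def reco_style_loss(z, concept, realm, tau=0.1, alpha=0.5):
    """Toy relational-contrastive objective in the spirit of ReCo.

    On top of a SupCon-style term over same-concept pairs, a second term
    (weighted by the hypothetical `alpha`) treats same-realm instances as
    positives, encoding the semantic relation between concepts.
    `z`: one embedding per row, L2-normalised below.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)           # exclude self-similarity
    denom = sim.sum(axis=1)

    def supcon(labels):
        total, count = 0.0, 0
        for i in range(len(z)):
            pos = (labels == labels[i]) & (np.arange(len(z)) != i)
            if pos.any():
                total += -np.log(sim[i, pos] / denom[i]).mean()
                count += 1
        return total / max(count, 1)

    return supcon(concept) + alpha * supcon(realm)

# Clustered embeddings (same concept -> same direction) should score a
# lower loss than scattered ones under this objective.
concept = np.array([0, 0, 1, 1])
realm = np.array([0, 0, 1, 1])
emb_clustered = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
emb_scattered = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
loss_good = reco_style_loss(emb_clustered, concept, realm)
loss_bad = reco_style_loss(emb_scattered, concept, realm)
```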
How Well Do Self-Supervised Models Transfer?
Self-supervised visual representation learning has seen huge progress
recently, but no large scale evaluation has compared the many models now
available. We evaluate the transfer performance of 13 top self-supervised
models on 40 downstream tasks, including many-shot and few-shot recognition,
object detection, and dense prediction. We compare their performance to a
supervised baseline and show that on most tasks the best self-supervised models
outperform supervision, confirming the recently observed trend in the
literature. We find ImageNet Top-1 accuracy to be highly correlated with
transfer to many-shot recognition, but increasingly less so for few-shot,
object detection and dense prediction. No single self-supervised method
dominates overall, suggesting that universal pre-training is still unsolved.
Our analysis of features suggests that top self-supervised learners fail to
preserve colour information as well as supervised alternatives, but tend to
induce better classifier calibration, and less attentive overfitting than
supervised learners.
Comment: CVPR 2021. Code available at
https://github.com/linusericsson/ssl-transfe
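The correlation finding above amounts to correlating ImageNet Top-1 accuracy with downstream transfer accuracy across models. A minimal sketch with made-up numbers (illustrative only, not the paper's measurements):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient, as one might use to test how well
    ImageNet Top-1 accuracy predicts transfer accuracy across models."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Hypothetical ImageNet Top-1 vs. many-shot transfer accuracy for five
# models (fabricated for illustration).
imagenet = [71.3, 73.2, 74.6, 75.3, 76.5]
many_shot = [82.1, 83.0, 84.2, 84.1, 85.0]
r = pearson(imagenet, many_shot)
```

A near-1 value of `r` would reflect the paper's observation for many-shot recognition; repeating the computation for few-shot, detection, and dense-prediction scores would show the correlation weakening.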