Revisiting adapters with adversarial training
While adversarial training is generally used as a defense mechanism, recent
works show that it can also act as a regularizer. By co-training a neural
network on clean and adversarial inputs, it is possible to improve
classification accuracy on the clean, non-adversarial inputs. We demonstrate
that, contrary to previous findings, it is not necessary to separate batch
statistics when co-training on clean and adversarial inputs, and that it is
sufficient to use adapters with few domain-specific parameters for each type of
input. We establish that using the classification token of a Vision Transformer
(ViT) as an adapter is enough to match the classification performance of dual
normalization layers, while using significantly fewer additional parameters.
First, we improve upon the top-1 accuracy of a non-adversarially trained
ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second,
and more importantly, we show that training with adapters enables model soups
through linear combinations of the clean and adversarial tokens. These model
soups, which we call adversarial model soups, let us trade off between
clean and robust accuracy without sacrificing efficiency. Finally, we show that
we can easily adapt the resulting models in the face of distribution shifts.
Our ViT-B16 obtains top-1 accuracies on ImageNet variants that are on average
+4.00% better than those obtained with Masked Autoencoders.
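As a rough illustration of the soup idea, an adversarial model soup is a linear combination of the clean and adversarial adapter tokens. The sketch below is a toy version in plain Python, assuming the token is a flat vector; the function name and representation are illustrative, not the paper's implementation:

```python
def adversarial_soup(clean_token, adv_token, alpha):
    """Linear combination of the clean and adversarial adapter tokens.

    alpha = 0.0 recovers the clean token, alpha = 1.0 the adversarial one;
    intermediate values trade clean accuracy against robustness.
    """
    return [(1.0 - alpha) * c + alpha * a
            for c, a in zip(clean_token, adv_token)]

# Sweeping alpha traces out the clean/robust trade-off curve of soups.
clean = [0.2, -1.0, 0.5]
adv = [0.6, -0.2, 0.1]
soups = [adversarial_soup(clean, adv, a) for a in (0.0, 0.5, 1.0)]
```

Because only the small adapter token is interpolated, every point on the curve reuses the same shared backbone, which is why efficiency is not sacrificed.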
Semi-Supervised Learning with Scarce Annotations
While semi-supervised learning (SSL) algorithms provide an efficient way to
make use of both labelled and unlabelled data, they generally struggle when the
number of annotated samples is very small. In this work, we consider the
problem of SSL multi-class classification with very few labelled instances. We
introduce two key ideas. The first is a simple but effective one: we leverage
the power of transfer learning among different tasks and self-supervision to
initialize a good representation of the data without making use of any label.
The second idea is a new algorithm for SSL that can effectively exploit such a
pre-trained representation.
The algorithm works by alternating two phases, one fitting the labelled
points and one fitting the unlabelled ones, with carefully-controlled
information flow between them. The benefits are a substantial reduction in
overfitting of the labelled data and the avoidance of issues with balancing
labelled and unlabelled
losses during training. We show empirically that this method can successfully
train competitive models with as few as 10 labelled data points per class. More
generally, we show that the idea of bootstrapping features using
self-supervised learning always improves SSL on standard benchmarks. We show
that our algorithm works increasingly well compared to other methods when
refining from other tasks or datasets.
Comment: Workshop on Deep Vision, CVPR 202
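The alternating two-phase scheme described above can be caricatured with a 1-D nearest-class-mean toy model: one phase fits the labelled points, the other pseudo-labels the unlabelled ones before refitting on the union. All names and the 1-D setting are assumptions for illustration, not the paper's actual algorithm:

```python
def class_means(points, labels):
    """Mean of the (1-D) points assigned to each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest_class(means, x):
    """Class whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))

def alternating_ssl(lab_x, lab_y, unlab_x, rounds=3):
    means = class_means(lab_x, lab_y)                    # phase 1: fit labelled points
    for _ in range(rounds):
        pseudo = [nearest_class(means, x) for x in unlab_x]   # phase 2: fit unlabelled
        means = class_means(lab_x + unlab_x, lab_y + pseudo)  # refit on the union
    return means
```

Keeping the two phases separate sidesteps the need to hand-tune a weighting between a labelled loss and an unlabelled loss in a single objective.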
iCaRL: Incremental Classifier and Representation Learning
A major open problem on the road to artificial intelligence is the
development of incrementally learning systems that learn about more and more
concepts over time from a stream of data. In this work, we introduce a new
training strategy, iCaRL, that allows learning in such a class-incremental way:
only the training data for a small number of classes has to be present at the
same time and new classes can be added progressively. iCaRL learns strong
classifiers and a data representation simultaneously. This distinguishes it
from earlier works that were fundamentally limited to fixed data
representations and therefore incompatible with deep learning architectures. We
show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can
learn many classes incrementally over a long period of time where other
strategies quickly fail.
Comment: Accepted paper at CVPR 201
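A central ingredient of iCaRL is nearest-mean-of-exemplars classification: each class keeps a small exemplar set, and a sample is assigned to the class whose exemplar mean is closest in feature space. The sketch below shows only this classification rule on given feature vectors (the full method also involves herding-based exemplar selection and distillation, which are omitted); function names are hypothetical:

```python
def mean_vec(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_mean_of_exemplars(exemplars_by_class, feature):
    """Predict the class whose exemplar mean is closest to the feature."""
    means = {c: mean_vec(ex) for c, ex in exemplars_by_class.items()}
    return min(means, key=lambda c: sq_dist(feature, means[c]))

# Adding a new class incrementally only requires storing a few exemplars for it.
exemplars = {"cat": [[0.0, 0.0], [0.0, 2.0]],
             "dog": [[4.0, 4.0], [4.0, 6.0]]}
```

Because the class means are recomputed from the stored exemplars with the current representation, the classifier stays consistent as the representation itself keeps learning.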
There and Back Again: Revisiting Backpropagation Saliency Methods
Saliency methods seek to explain the predictions of a model by producing an
importance map for each input sample. A popular class of such methods is
based on backpropagating a signal and analyzing the resulting gradient. Despite
much research in this area, relatively little work has been done to clarify
the differences between these methods or the desiderata they should satisfy.
Thus, there is a need to rigorously understand the relationships between
different methods as well as their failure modes. In this
work, we conduct a thorough analysis of backpropagation-based saliency methods
and propose a single framework under which several such methods can be unified.
As a result of our study, we make three additional contributions. First, we use
our framework to propose NormGrad, a novel saliency method based on the spatial
contribution of gradients of convolutional weights. Second, we combine saliency
maps at different layers to test the ability of saliency methods to extract
complementary information at different network levels (e.g., trading off spatial
resolution and distinctiveness) and we explain why some methods fail at
specific layers (e.g., Grad-CAM anywhere besides the last convolutional layer).
Third, we introduce a class-sensitivity metric and a meta-learning inspired
paradigm applicable to any saliency method for improving sensitivity to the
output class being explained.
Comment: CVPR 202
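To make the backpropagation-based family concrete, here is the simplest member, input-times-gradient, on a linear model; this is a generic illustration of the class of methods the paper analyzes, not NormGrad itself, and the function name is an assumption:

```python
def input_times_gradient(weights, x):
    """Importance map for a linear score s = sum(w_i * x_i).

    Backpropagation gives ds/dx_i = w_i, so the input-times-gradient
    saliency of input dimension i is |w_i * x_i|.
    """
    return [abs(w * xi) for w, xi in zip(weights, x)]
```

Even in this degenerate case one can see why class sensitivity matters: the map depends on which output's weights are backpropagated, and a method that produces the same map for every class is not explaining the prediction at all.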
Learning multiple visual domains with residual adapters
There is a growing interest in learning data representations that work well
for many different types of problems and data. In this paper, we look in
particular at the task of learning a single visual representation that can be
successfully utilized in the analysis of very different types of images, from
dog breeds to stop signs and digits. Inspired by recent work on learning
networks that predict the parameters of another, we develop a tunable deep
network architecture that, by means of adapter residual modules, can be steered
on the fly to diverse visual domains. Our method achieves a high degree of
parameter sharing while maintaining or even improving the accuracy of
domain-specific representations. We also introduce the Visual Decathlon
Challenge, a benchmark that evaluates the ability of representations to capture
ten very different visual domains simultaneously and measures how uniformly
well they recognize all of them.
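The core pattern of a residual adapter is y = x + adapter(x): a tiny domain-specific module added on top of a shared backbone computation. The sketch below uses a minimal per-channel (diagonal) adapter as a stand-in for the small residual modules in the paper; names and the diagonal form are assumptions for illustration:

```python
def residual_adapter(x, scale):
    """y = x + adapter(x), with a minimal per-channel (diagonal) adapter.

    With `scale` all zeros the module is the identity, so the shared
    backbone output is recovered exactly; each domain learns only `scale`.
    """
    return [xi + s * xi for xi, s in zip(x, scale)]
```

The identity-at-zero property is what makes steering "on the fly" cheap: switching domains means swapping a handful of adapter parameters while the large shared weights stay fixed.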
Efficient parametrization of multi-domain deep neural networks
A practical limitation of deep neural networks is their high degree of
specialization to a single task and visual domain. Recently, inspired by the
successes of transfer learning, several authors have proposed to learn instead
universal, fixed feature extractors that, used as the first stage of any deep
network, work well for several tasks and domains simultaneously. Nevertheless,
such universal features are still somewhat inferior to specialized networks.
To overcome this limitation, in this paper we propose to consider instead
universal parametric families of neural networks, which still contain
specialized problem-specific models, but whose members differ only by a small number of
parameters. We study different designs for such parametrizations, including
series and parallel residual adapters, joint adapter compression, and parameter
allocations, and empirically identify the ones that yield the highest
compression. We show that, in order to maximize performance, it is necessary to
adapt both shallow and deep layers of a deep network, but the required changes
are very small. We also show that these universal parametrizations are very
effective for transfer learning, where they outperform traditional fine-tuning
techniques.
Comment: CVPR 201
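The series and parallel adapter designs studied above differ only in where the small module attaches relative to the shared layer. A toy sketch, with a stand-in base layer and a diagonal adapter (all names and the exact wiring are illustrative assumptions):

```python
def base_layer(x):
    """Stands in for a shared, frozen backbone layer."""
    return [2.0 * xi for xi in x]

def adapter(x, scale):
    """Tiny domain-specific module (diagonal for simplicity)."""
    return [s * xi for s, xi in zip(scale, x)]

def series_adapter(x, scale):
    # adapter applied after the base layer, added residually
    h = base_layer(x)
    return [hi + ai for hi, ai in zip(h, adapter(h, scale))]

def parallel_adapter(x, scale):
    # adapter branch runs alongside the base layer, outputs are summed
    h = base_layer(x)
    return [hi + ai for hi, ai in zip(h, adapter(x, scale))]
```

In both designs, setting the adapter parameters to zero recovers the shared layer exactly, so per-domain storage is just the adapter parameters, which is what enables the compression studied in the paper.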
Automatically Discovering and Learning New Visual Categories with Ranking Statistics
We tackle the problem of discovering novel classes in an image collection
given labelled examples of other classes. This setting is similar to
semi-supervised learning, but significantly harder because there are no
labelled examples for the new classes. The challenge, then, is to leverage the
information contained in the labelled images in order to learn a
general-purpose clustering model and use the latter to identify the new classes
in the unlabelled data. In this work we address this problem by combining three
ideas: (1) we suggest that the common approach of bootstrapping an image
representation using the labelled data only introduces an unwanted bias, and
that this can be avoided by using self-supervised learning to train the
representation from scratch on the union of labelled and unlabelled data; (2)
we use rank statistics to transfer the model's knowledge of the labelled
classes to the problem of clustering the unlabelled images; and, (3) we train
the data representation by optimizing a joint objective function on the
labelled and unlabelled subsets of the data, improving both the supervised
classification of the labelled data, and the clustering of the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform
current methods for novel category discovery by a significant margin.
Comment: ICLR 2020, code: http://www.robots.ox.ac.uk/~vgg/research/auto_nove
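The rank-statistics idea in point (2) can be illustrated as a pairwise pseudo-labelling rule: two images are declared "same class" when the top-k ranked dimensions of their feature vectors coincide, and these binary labels then supervise the clustering. The sketch below is a minimal version of that rule; function names and the exact comparison are assumptions:

```python
def topk_set(features, k):
    """Indices of the k largest feature dimensions."""
    order = sorted(range(len(features)),
                   key=lambda i: features[i], reverse=True)
    return set(order[:k])

def same_class_pair(f1, f2, k=2):
    """Pseudo-label a pair as 'same class' when the top-k ranked
    dimensions of the two feature vectors coincide."""
    return topk_set(f1, k) == topk_set(f2, k)
```

Comparing ranks rather than raw activations makes the rule invariant to per-sample scaling of the features, which is one reason rank statistics transfer well from labelled to unlabelled classes.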