74 research outputs found
Efficient parametrization of multi-domain deep neural networks
A practical limitation of deep neural networks is their high degree of
specialization to a single task and visual domain. Recently, inspired by the
successes of transfer learning, several authors have proposed to learn instead
universal, fixed feature extractors that, used as the first stage of any deep
network, work well for several tasks and domains simultaneously. Nevertheless,
such universal features are still somewhat inferior to specialized networks.
To overcome this limitation, in this paper we propose to consider instead
universal parametric families of neural networks, which still contain
specialized problem-specific models, but differing only by a small number of
parameters. We study different designs for such parametrizations, including
series and parallel residual adapters, joint adapter compression, and parameter
allocations, and empirically identify the ones that yield the highest
compression. We show that, in order to maximize performance, it is necessary to
adapt both shallow and deep layers of a deep network, but the required changes
are very small. We also show that these universal parametrizations are very
effective for transfer learning, where they outperform traditional fine-tuning
techniques. Comment: CVPR 201
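The difference between the series and parallel adapters named in the abstract above can be illustrated with a toy sketch. It assumes plain linear layers written as lists of rows, whereas the abstract concerns deep convolutional networks; all names and weights below (`shared_w`, `domain_adapter`, and so on) are hypothetical. It shows only the wiring, not the parameter savings: in this toy the adapter is as large as the layer it corrects.

```python
# Toy sketch of series vs. parallel residual adapters.
# Assumption: layers are plain linear maps; names and values are illustrative.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def series_adapter(shared_w, adapter_w, x):
    """Apply the frozen layer, then add a residual correction of its output."""
    h = matvec(shared_w, x)
    return add(h, matvec(adapter_w, h))

def parallel_adapter(shared_w, adapter_w, x):
    """Frozen layer and adapter read the same input; their outputs are summed."""
    return add(matvec(shared_w, x), matvec(adapter_w, x))

shared_w = [[1.0, 2.0], [0.0, 1.0]]         # shared across domains, kept frozen
zero_adapter = [[0.0, 0.0], [0.0, 0.0]]     # adapter that changes nothing
domain_adapter = [[0.1, 0.0], [0.0, -0.1]]  # hypothetical domain-specific weights
x = [1.0, 1.0]

base = matvec(shared_w, x)
series_out = series_adapter(shared_w, domain_adapter, x)
parallel_out = parallel_adapter(shared_w, domain_adapter, x)
print(base, series_out, parallel_out)
```

With a zero adapter both variants reduce to the frozen layer, which is why a small domain-specific correction can be trained without disturbing the shared weights.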
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Following the major success of neural language models (LMs) such as BERT or
GPT-2 on a variety of language understanding tasks, recent work focused on
injecting (structured) knowledge from external resources into these models.
While on the one hand, joint pretraining (i.e., training from scratch, adding
objectives based on external knowledge to the primary LM objective) may be
prohibitively computationally expensive, post-hoc fine-tuning on external
knowledge, on the other hand, may lead to the catastrophic forgetting of
distributional knowledge. In this work, we investigate models for complementing
the distributional knowledge of BERT with conceptual knowledge from ConceptNet
and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using
adapter training. While overall results on the GLUE benchmark paint an
inconclusive picture, a deeper analysis reveals that our adapter-based models
substantially outperform BERT (up to 15-20 performance points) on inference
tasks that require the type of conceptual knowledge explicitly present in
ConceptNet and OMCS.
Class-Agnostic Counting
Nearly all existing counting methods are designed for a specific object
class. Our work, however, aims to create a counting model able to count any
class of object. To achieve this goal, we formulate counting as a matching
problem, enabling us to exploit the image self-similarity property that
naturally exists in object counting problems. We make the following three
contributions: first, a Generic Matching Network (GMN) architecture that can
potentially count any object in a class-agnostic manner; second, by
reformulating the counting problem as one of matching objects, we can take
advantage of the abundance of video data labeled for tracking, which contains
natural repetitions suitable for training a counting model. Such data enables
us to train the GMN. Third, to customize the GMN to different user
requirements, an adapter module is used to specialize the model with minimal
effort, i.e. using a few labeled examples, and adapting only a small fraction
of the trained parameters. This is a form of few-shot learning, which is
practical for domains where labels are limited due to requiring expert
knowledge (e.g. microbiology). We demonstrate the flexibility of our method on
a diverse set of existing counting benchmarks: specifically cells, cars, and
human crowds. The model achieves competitive performance on cell and crowd
counting datasets, and surpasses the state-of-the-art on the car dataset using
only three training images. When training on the entire dataset, the proposed
method outperforms all previous methods by a large margin. Comment: Asian Conference on Computer Vision (ACCV), 201
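The idea of counting by matching can be illustrated with a minimal sketch. It is not the GMN: the abstract describes a learned network, while this stand-in uses cosine similarity against a single exemplar and counts peaks in the resulting similarity map. The one-dimensional "image", the feature vectors, and the threshold are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_map(features, exemplar):
    """Compare every location's feature vector with the exemplar."""
    return [cosine(f, exemplar) for f in features]

def count_matches(features, exemplar, threshold=0.9):
    """Count local maxima of the similarity map that reach the threshold."""
    sim = similarity_map(features, exemplar)
    count = 0
    for i, s in enumerate(sim):
        left = sim[i - 1] if i > 0 else -1.0
        right = sim[i + 1] if i < len(sim) - 1 else -1.0
        # Strict on the left, non-strict on the right: a plateau counts once.
        if s >= threshold and s > left and s >= right:
            count += 1
    return count

# Hypothetical features along a 1-D strip: three locations resemble the exemplar.
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05], [0.0, 1.0]]
exemplar = [1.0, 0.0]
n_objects = count_matches(features, exemplar)
print(n_objects)
```

Because the exemplar is an input rather than a fixed class, the same procedure applies to any object type, which is the sense in which matching makes counting class-agnostic.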
Incremental multi-domain learning with network latent tensor factorization
The prominence of deep learning, large amounts of annotated data, and
increasingly powerful hardware have made it possible to reach remarkable performance
on supervised classification tasks, in many cases saturating the training
sets. However, the resulting models are specialized to a single very specific
task and domain. Adapting the learned classification to new domains is a hard
problem for at least three reasons: (1) the new domains and tasks might
be drastically different; (2) there might be a very limited amount of annotated
data on the new domain; and (3) full training of a new model for each new task
is prohibitive in terms of computation and memory, due to the sheer number of
parameters of deep CNNs. In this paper, we present a method to learn
new domains and tasks incrementally, building on prior knowledge from already
learned tasks and without catastrophic forgetting. We do so by jointly
parametrizing weights across layers using a low-rank Tucker structure. The core
is task-agnostic, while a set of task-specific factors is learnt on each new
domain. We show that leveraging tensor structure enables better performance
than simply using matrix operations. Joint tensor modelling also naturally
leverages correlations across different layers. Compared with previous methods
which have focused on adapting each layer separately, our approach results in
more compact representations for each new task/domain. We apply the proposed
method to the 10 datasets of the Visual Decathlon Challenge and show that our
method offers, on average, about a 7.5x reduction in the number of parameters and
competitive performance in terms of both classification accuracy and Decathlon
score. Comment: AAAI2
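The split between a shared core and per-task factors can be made concrete with a small sketch of a three-way Tucker structure. It is an illustration under stated assumptions, not the paper's model: the abstract describes a joint parametrization across layers, while this toy handles a single three-way tensor, and the dimensions and ranks passed to `param_counts` are hypothetical.

```python
def tucker_reconstruct(core, u1, u2, u3):
    """W[i][j][k] = sum over a,b,c of core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]."""
    r1, r2, r3 = len(core), len(core[0]), len(core[0][0])
    return [[[sum(core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]
                  for a in range(r1) for b in range(r2) for c in range(r3))
              for k in range(len(u3))]
             for j in range(len(u2))]
            for i in range(len(u1))]

def param_counts(dims, ranks):
    """Return (full tensor size, shared core size, per-task factor size)."""
    full, core = 1, 1
    for d in dims:
        full *= d
    for r in ranks:
        core *= r
    factors = sum(d * r for d, r in zip(dims, ranks))
    return full, core, factors

# Rank-(1,1,1) toy: one shared core entry, one factor column per mode.
core = [[[2.0]]]
u1 = [[1.0], [2.0]]   # task-specific factor, mode 1
u2 = [[1.0], [0.5]]   # task-specific factor, mode 2
u3 = [[1.0], [3.0]]   # task-specific factor, mode 3
w = tucker_reconstruct(core, u1, u2, u3)

# Hypothetical layer-like shape and ranks, for the parameter arithmetic only.
full, shared, per_task = param_counts((64, 64, 9), (16, 16, 4))
print(w[1][1][1], full, shared, per_task)
```

In this setting a new task stores only its factor matrices, so the per-task cost grows with the sum of `dimension * rank` terms rather than with the product of the dimensions.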
Budget-Aware Adapters for Multi-Domain Learning
Multi-Domain Learning (MDL) refers to the problem of learning a set of models
derived from a common deep architecture, each one specialized to perform a task
in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL
with a particular interest in obtaining domain-specific models with an
adjustable budget in terms of the number of network parameters and
computational complexity. Our intuition is that, as in real applications the
number of domains and tasks can be very large, an effective MDL approach should
not only focus on accuracy but also on having as few parameters as possible. To
implement this idea we derive specialized deep models for each domain by
adapting a pre-trained architecture but, unlike other methods, we
propose a novel strategy to automatically adjust the computational complexity
of the network. To this end, we introduce Budget-Aware Adapters that select the
most relevant feature channels to better handle data from a novel domain. Some
constraints on the number of active switches are imposed in order to obtain a
network respecting the desired complexity budget. Experimentally, we show that
our approach leads to recognition accuracy competitive with state-of-the-art
approaches but with much lighter networks both in terms of storage and
computation. Comment: ICCV 201
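Channel selection under a budget can be sketched as follows. This is a simplified stand-in, not the paper's procedure: the abstract describes switches obtained under constraints during training, whereas the sketch takes fixed, hypothetical relevance scores and keeps the top-scoring channels that fit the budget. The function names and the scores are assumptions.

```python
def select_channels(scores, budget):
    """Return binary switches keeping at most a `budget` fraction of channels."""
    max_active = int(budget * len(scores))
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    keep = set(ranked[:max_active])
    return [1 if c in keep else 0 for c in range(len(scores))]

def apply_switches(channels, switches):
    """Drop channels whose switch is off, so they cost no storage or compute."""
    return [ch for ch, s in zip(channels, switches) if s]

scores = [0.2, 0.9, 0.1, 0.7]          # hypothetical per-channel relevance
channels = ["c0", "c1", "c2", "c3"]    # placeholders for feature channels
switches = select_channels(scores, budget=0.5)
active = apply_switches(channels, switches)
print(switches, active)
```

Lowering `budget` shrinks the set of active channels, which is how a single pre-trained architecture can yield lighter domain-specific models.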