20 research outputs found
Learning from limited labeled data - Zero-Shot and Few-Shot Learning
Human beings have the remarkable ability to recognize novel visual concepts after observing only few or zero examples of them. Deep learning, however, often requires a large amount of labeled data to achieve a good performance. Labeled instances are expensive, difficult and even infeasible to obtain because the distribution of training instances among labels naturally exhibits a long tail. Therefore, it is of great interest to investigate how to learn efficiently from limited labeled data.
This thesis concerns an important subfield of learning from limited labeled data, namely, low-shot learning. The setting assumes the availability of many labeled examples from known classes and the goal is to learn novel classes from only a few~(few-shot learning) or zero~(zero-shot learning) training examples of them. To this end, we have developed a series of multi-modal learning approaches to facilitate the knowledge transfer from known classes to novel classes for a wide range of visual recognition tasks including image classification, semantic image segmentation and video action recognition. More specifically, this thesis mainly makes the following contributions. First, as there is no agreed upon zero-shot image classification benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets. Second, in order to tackle the labeled data scarcity, we propose feature generation frameworks that synthesize data in the visual feature space for novel classes. Third, we extend zero-shot learning and few-shot learning to the semantic segmentation task and propose a challenging benchmark for it. We show that incorporating semantic information into a semantic segmentation network is effective in segmenting novel classes. Finally, we develop better video representation for the few-shot video classification task and leverage weakly-labeled videos by an efficient retrieval method.Menschen haben die bemerkenswerte FĂ€higkeit, neuartige visuelle Konzepte zu erkennen, nachdem sie nur wenige oder gar keine Beispiele davon beobachtet haben. Tiefes Lernen erfordert jedoch oft eine groĂe Menge an beschrifteten Daten, um eine gute Leistung zu erzielen. Etikettierte Instanzen sind teuer, schwierig und sogar undurchfĂŒhrbar, weil die Verteilung der Trainingsinstanzen auf die Etiketten naturgemÀà einen langen Schwanz aufweist. Daher ist es von groĂem Interesse zu untersuchen, wie man effizient aus begrenzten gelabelten Daten lernen kann. Diese These betrifft einen wichtigen Teilbereich des Lernens aus begrenzt gelabelten Daten, nĂ€mlich das Low-Shot-Lernen. Das Setting setzt die VerfĂŒgbarkeit vieler gelabelter Beispiele aus bekannten Klassen voraus, und das Ziel ist es, neuartige Klassen aus nur wenigen (few-shot learning) oder null (zero-shot learning) Trainingsbeispielen davon zu lernen. Zu diesem Zweck haben wir eine Reihe von multimodalen LernansĂ€tzen entwickelt, um den Wissenstransfer von bekannten Klassen zu neuartigen Klassen fĂŒr ein breites Spektrum von visuellen Erkennungsaufgaben zu erleichtern, darunter Bildklassifizierung, semantische Bildsegmentierung und Videoaktionserkennung. Genauer gesagt, leistet diese Arbeit hauptsĂ€chlich die folgenden BeitrĂ€ge. Da es keinen vereinbarten Benchmark fĂŒr die Zero-Shot- Bildklassifikation gibt, definieren wir zunĂ€chst einen neuen Benchmark, indem wir sowohl die Evaluierungsprotokolle als auch die Datensplits öffentlich zugĂ€nglicher DatensĂ€tze vereinheitlichen. Zweitens schlagen wir zur BewĂ€ltigung der etikettierten Datenknappheit einen Rahmen fĂŒr die Generierung von Merkmalen vor, der Daten im visuellen Merkmalsraum fĂŒr neuartige Klassen synthetisiert. Drittens dehnen wir das Zero-Shot-Lernen und das few-Shot-Lernen auf die semantische Segmentierungsaufgabe aus und schlagen dafĂŒr einen anspruchsvollen Benchmark vor. Wir zeigen, dass die Einbindung semantischer Informationen in ein semantisches Segmentierungsnetz bei der Segmentierung neuartiger Klassen effektiv ist. SchlieĂlich entwickeln wir eine bessere Videodarstellung fĂŒr die Klassifizierungsaufgabe âfew-shot videoâ und nutzen schwach markierte Videos durch eine effiziente Abrufmethode.Max Planck Institute Informatic
f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
When labeled training data is scarce, a promising data augmentation approach
is to generate visual features of unknown classes using their attributes. To
learn the class conditional distribution of CNN features, these models rely on
pairs of image features and class attributes. Hence, they can not make use of
the abundance of unlabeled data samples. In this paper, we tackle any-shot
learning problems i.e. zero-shot and few-shot, in a unified feature generating
framework that operates in both inductive and transductive learning settings.
We develop a conditional generative model that combines the strength of VAE and
GANs and in addition, via an unconditional discriminator, learns the marginal
feature distribution of unlabeled images. We empirically show that our model
learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA
and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e.
inductive and transductive (generalized) zero- and few-shot learning settings.
We also demonstrate that our learned features are interpretable: we visualize
them by inverting them back to the pixel space and we explain them by
generating textual arguments of why they are associated with a certain label.Comment: Accepted at CVPR 201
Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly
Due to the importance of zero-shot learning, i.e. classifying images where
there is a lack of labeled training data, the number of proposed approaches has
recently increased steadily. We argue that it is time to take a step back and
to analyze the status quo of the area. The purpose of this paper is three-fold.
First, given the fact that there is no agreed upon zero-shot learning
benchmark, we first define a new benchmark by unifying both the evaluation
protocols and data splits of publicly available datasets used for this task.
This is an important contribution as published results are often not comparable
and sometimes even flawed due to, e.g. pre-training on zero-shot test classes.
Moreover, we propose a new zero-shot learning dataset, the Animals with
Attributes 2 (AWA2) dataset which we make publicly available both in terms of
image features and the images themselves. Second, we compare and analyze a
significant number of the state-of-the-art methods in depth, both in the
classic zero-shot setting but also in the more realistic generalized zero-shot
setting. Finally, we discuss in detail the limitations of the current status of
the area which can be taken as a basis for advancing it.Comment: Accepted by TPAMI in July, 2018. We introduce Proposed Split Version
2.0 (Please download it from our project webpage). arXiv admin note:
substantial text overlap with arXiv:1703.0439
Learning Prototype Classifiers for Long-Tailed Recognition
The problem of long-tailed recognition (LTR) has received attention in recent
years due to the fundamental power-law distribution of objects in the
real-world. Most recent works in LTR use softmax classifiers that have a
tendency to correlate classifier norm with the amount of training data for a
given class. On the other hand, Prototype classifiers do not suffer from this
shortcoming and can deliver promising results simply using Nearest-Class-Mean
(NCM), a special case where prototypes are empirical centroids. However, the
potential of Prototype classifiers as an alternative to softmax in LTR is
relatively underexplored. In this work, we propose Prototype classifiers, which
jointly learn prototypes that minimize average cross-entropy loss based on
probability scores from distances to prototypes. We theoretically analyze the
properties of Euclidean distance based prototype classifiers that leads to
stable gradient-based optimization which is robust to outliers. We further
enhance Prototype classifiers by learning channel-dependent temperature
parameters to enable independent distance scales along each channel. Our
analysis shows that prototypes learned by Prototype classifiers are better
separated than empirical centroids. Results on four long-tailed recognition
benchmarks show that Prototype classifier outperforms or is comparable to the
state-of-the-art methods.Comment: Accepted at IJCAI-2
Learning Graph Embeddings for Compositional Zero-shot Learning
In compositional zero-shot learning, the goal is to recognize unseen
compositions (e.g. old dog) of observed visual primitives states (e.g. old,
cute) and objects (e.g. car, dog) in the training set. This is challenging
because the same state can for example alter the visual appearance of a dog
drastically differently from a car. As a solution, we propose a novel graph
formulation called Compositional Graph Embedding (CGE) that learns image
features, compositional classifiers, and latent representations of visual
primitives in an end-to-end manner. The key to our approach is exploiting the
dependency between states, objects, and their compositions within a graph
structure to enforce the relevant knowledge transfer from seen to unseen
compositions. By learning a joint compatibility that encodes semantics between
concepts, our model allows for generalization to unseen compositions without
relying on an external knowledge base like WordNet. We show that in the
challenging generalized compositional zero-shot setting our CGE significantly
outperforms the state of the art on MIT-States and UT-Zappos. We also propose a
new benchmark for this task based on the recent GQA dataset.Comment: Accepted in IEEE CVPR 202
Prototype-based Incremental Few-Shot Segmentation
Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set. Toward addressing both problems, we introduce a new task, Incremental Few-Shot Segmentation (iFSS). The goal of iFSS is to extend a pretrained segmentation model with new classes from few annotated images and without access to old training data. To overcome the limitations of existing models iniFSS, we propose Prototype-based Incremental Few-Shot Segmentation (PIFS) that couples prototype learning and knowledge distillation. PIFS exploits prototypes to initialize the classifiers of new classes, fine-tuning the network to refine its features representation. We design a prototype-based distillation loss on the scores of both old and new class prototypes to avoid overfitting and forgetting, and batch-renormalization to cope with non-i.i.d.few-shot data. We create an extensive benchmark for iFSS showing that PIFS outperforms several few-shot and incremental learning methods in all scenarios
Attribute Prototype Network for Zero-Shot Learning
From the beginning of zero-shot learning research, visual attributes have
been shown to play an important role. In order to better transfer
attribute-based knowledge from known to unknown classes, we argue that an image
representation with integrated attribute localization ability would be
beneficial for zero-shot learning. To this end, we propose a novel zero-shot
representation learning framework that jointly learns discriminative global and
local features using only class-level attributes. While a visual-semantic
embedding layer learns global features, local features are learned through an
attribute prototype network that simultaneously regresses and decorrelates
attributes from intermediate features. We show that our locality augmented
image representations achieve a new state-of-the-art on three zero-shot
learning benchmarks. As an additional benefit, our model points to the visual
evidence of the attributes in an image, e.g. for the CUB dataset, confirming
the improved attribute localization ability of our image representation.Comment: NeurIPS 2020. The code is publicly available at
https://wenjiaxu.github.io/APN-ZSL