f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
When labeled training data is scarce, a promising data augmentation approach
is to generate visual features of unknown classes using their attributes. To
learn the class conditional distribution of CNN features, these models rely on
pairs of image features and class attributes. Hence, they cannot make use of
the abundance of unlabeled data samples. In this paper, we tackle any-shot
learning problems, i.e. zero-shot and few-shot, in a unified feature generating
framework that operates in both inductive and transductive learning settings.
We develop a conditional generative model that combines the strengths of VAEs
and GANs and, in addition, via an unconditional discriminator, learns the marginal
feature distribution of unlabeled images. We empirically show that our model
learns highly discriminative CNN features on the CUB, SUN, AWA and ImageNet
datasets, and establish a new state of the art in any-shot learning, i.e.
inductive and transductive (generalized) zero- and few-shot learning settings.
We also demonstrate that our learned features are interpretable: we visualize
them by inverting them back to the pixel space and we explain them by
generating textual explanations of why they are associated with a certain label.
Comment: Accepted at CVPR 201
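The two-discriminator design described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all dimensions, the single-hidden-layer networks, and the random parameters are assumptions standing in for trained models. The key structural point is that the conditional discriminator needs (feature, attribute) pairs, while the unconditional one scores features alone and can therefore be trained on unlabeled images.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, W2):
    # single hidden layer with ReLU (illustrative stand-in for a real network)
    return np.maximum(x @ W1, 0) @ W2

feat_dim, attr_dim, z_dim, hid = 2048, 312, 64, 128  # hypothetical sizes
# random parameters standing in for trained generator/discriminator weights
Wg1 = rng.normal(size=(z_dim + attr_dim, hid)); Wg2 = rng.normal(size=(hid, feat_dim))
Wd1 = rng.normal(size=(feat_dim + attr_dim, hid)); Wd1o = rng.normal(size=(hid, 1))
Wd2 = rng.normal(size=(feat_dim, hid)); Wd2o = rng.normal(size=(hid, 1))

def generate(z, a):
    # conditional generator: synthesize a CNN feature from noise + class attribute
    return mlp(np.concatenate([z, a], axis=1), Wg1, Wg2)

def d_conditional(x, a):
    # D1 scores (feature, attribute) pairs -- requires labeled data
    return mlp(np.concatenate([x, a], axis=1), Wd1, Wd1o)

def d_unconditional(x):
    # D2 scores features alone -- trainable on unlabeled images, so it can
    # capture the marginal feature distribution
    return mlp(x, Wd2, Wd2o)

batch = 4
z = rng.normal(size=(batch, z_dim))
a = rng.normal(size=(batch, attr_dim))  # class attributes (e.g. bird attributes)
x_fake = generate(z, a)
s1 = d_conditional(x_fake, a)
s2 = d_unconditional(x_fake)
```

In training, the generator would be updated against both scores, which is how the unlabeled data influences the learned feature distribution.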
Learning Cross-domain Semantic-Visual Relation for Transductive Zero-Shot Learning
Zero-Shot Learning (ZSL) aims to learn recognition models for recognizing new
classes without labeled data. In this work, we propose a novel approach dubbed
Transferrable Semantic-Visual Relation (TSVR) to facilitate the cross-category
transfer in transductive ZSL. Our approach draws on an intriguing insight
connecting two challenging problems, i.e. domain adaptation and zero-shot
learning. Domain adaptation aims to transfer knowledge across two different
domains (i.e., source domain and target domain) that share the identical
task/label space. For ZSL, the source and target domains have different
tasks/label spaces. Hence, ZSL is usually considered a more difficult
transfer setting compared with domain adaptation. Although the existing ZSL
approaches use semantic attributes of categories to bridge the source and
target domains, their performances are far from satisfactory due to the large
domain gap between different categories. In contrast, our method directly
transforms ZSL into a domain adaptation task by recasting it as
predicting similarity/dissimilarity labels for pairs of semantic
attributes and visual features. For this recast domain adaptation problem, we
propose to use a domain-specific batch normalization component to reduce the
domain discrepancy of semantic-visual pairs. Experimental results over diverse
ZSL benchmarks clearly demonstrate the superiority of our method.
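The recasting described above can be sketched as follows, under loudly stated assumptions: the pair embeddings, dimensions, and the shared linear similarity head are all illustrative, and "domain-specific batch normalization" is reduced to normalizing each domain's pairs with its own batch statistics before the shared classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(h, mean, var, eps=1e-5):
    return (h - mean) / np.sqrt(var + eps)

# hypothetical joint embeddings of (semantic attribute, visual feature) pairs;
# the target (unseen-class) pairs are shifted to mimic a domain gap
pairs_src = rng.normal(loc=0.0, size=(8, 16))  # seen-class pairs
pairs_tgt = rng.normal(loc=2.0, size=(8, 16))  # unseen-class pairs

def dsbn(h):
    # domain-specific BN: each domain is normalized with its own statistics,
    # shrinking the discrepancy before the shared similarity head
    return batch_norm(h, h.mean(axis=0), h.var(axis=0))

W = rng.normal(size=(16, 1))  # shared binary classifier weights (illustrative)

def similarity(h):
    # shared head: does this attribute vector describe this visual feature?
    logits = dsbn(h) @ W
    return 1.0 / (1.0 + np.exp(-logits))

p_src = similarity(pairs_src)
p_tgt = similarity(pairs_tgt)
```

Because both domains are normalized to comparable statistics, the same similarity head can be applied to seen- and unseen-class pairs, which is the sense in which ZSL becomes a domain adaptation problem over pairs.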
Semantic Embedding Space for Zero-Shot Action Recognition
The number of categories for action recognition is growing rapidly. It is
thus becoming increasingly hard to collect sufficient training data to learn
conventional models for each category. This issue may be ameliorated by the
increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a
mapping is constructed between visual features and a human interpretable
semantic description of each category, allowing categories to be recognised in
the absence of any training data. Existing ZSL studies focus primarily on image
data, and attribute-based semantic representations. In this paper, we address
zero-shot recognition in contemporary video action recognition tasks, using
semantic word vector space as the common space to embed videos and category
labels. This is more challenging because the mapping between the semantic space
and the space-time features of videos containing complex actions is harder to
learn. We demonstrate that a simple self-training and data
augmentation strategy can significantly improve the efficacy of this mapping.
Experiments on human action datasets including HMDB51 and UCF101 demonstrate
that our approach achieves the state-of-the-art zero-shot action recognition
performance.
Comment: 5 page
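The embedding-plus-self-training idea above can be sketched as follows. Everything here is an assumption for illustration: the category word vectors are random stand-ins, the videos are assumed to have already been regressed into the word-vector space by a learned mapping, and self-training is reduced to moving each category prototype toward the mean of the test samples assigned to it.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # hypothetical word-vector dimensionality

# stand-in word vectors for unseen action categories
protos = {"fencing": rng.normal(size=dim), "surfing": rng.normal(size=dim)}
names = list(protos)
P = np.stack([protos[n] for n in names])

# video features assumed already mapped into the semantic word-vector space;
# here they cluster near the "fencing" vector with small noise
videos = np.stack([protos["fencing"] + 0.1 * rng.normal(size=dim)
                   for _ in range(5)])

def nearest(v, P):
    # zero-shot classification: nearest category word vector
    return int(np.argmin(np.linalg.norm(P - v, axis=1)))

# self-training: re-estimate each prototype from the test samples assigned to it
labels = [nearest(v, P) for v in videos]
for k in range(len(names)):
    assigned = videos[[i for i, l in enumerate(labels) if l == k]]
    if len(assigned):
        P[k] = 0.5 * P[k] + 0.5 * assigned.mean(axis=0)

pred = names[nearest(videos[0], P)]
```

Pulling prototypes toward the test-data distribution is what compensates for the imperfect video-to-semantic mapping that the abstract highlights.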