Attribute Prototype Network for Zero-Shot Learning
From the beginning of zero-shot learning research, visual attributes have
been shown to play an important role. In order to better transfer
attribute-based knowledge from known to unknown classes, we argue that an image
representation with integrated attribute localization ability would be
beneficial for zero-shot learning. To this end, we propose a novel zero-shot
representation learning framework that jointly learns discriminative global and
local features using only class-level attributes. While a visual-semantic
embedding layer learns global features, local features are learned through an
attribute prototype network that simultaneously regresses and decorrelates
attributes from intermediate features. We show that our locality augmented
image representations achieve a new state-of-the-art on three zero-shot
learning benchmarks. As an additional benefit, our model points to the visual
evidence of the attributes in an image, e.g. for the CUB dataset, confirming
the improved attribute localization ability of our image representation.

Comment: NeurIPS 2020. The code is publicly available at
https://wenjiaxu.github.io/APN-ZSL
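The abstract describes regressing attributes from intermediate local features so that each attribute can be traced back to a spatial location. A minimal NumPy sketch of prototype-based attribute scoring conveys the idea; all names, shapes, and the dot-product similarity are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def attribute_scores(local_feats, prototypes):
    """Illustrative prototype-based attribute scoring (not the paper's exact model).

    local_feats: (H, W, C) feature map from a CNN backbone.
    prototypes:  (K, C) one learnable prototype per attribute.

    Each attribute's score is the maximum similarity between its prototype
    and any spatial location; the arg-max location then points to the
    visual evidence of that attribute in the image.
    """
    H, W, C = local_feats.shape
    flat = local_feats.reshape(H * W, C)   # flatten spatial grid: (H*W, C)
    sims = flat @ prototypes.T             # similarity of every patch to every prototype: (H*W, K)
    scores = sims.max(axis=0)              # best-matching patch score per attribute: (K,)
    locations = sims.argmax(axis=0)        # index of that patch per attribute: (K,)
    return scores, locations
```

During training, the predicted `scores` would be regressed against the class-level attribute vector, which is what encourages the local features to localize attributes.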
Resolving Semantic Confusions for Improved Zero-Shot Detection
Zero-shot detection (ZSD) is a challenging task where we aim to recognize and
localize objects simultaneously, even when our model has not been trained with
visual samples of a few target ("unseen") classes. Recently, methods employing
generative models like GANs have shown some of the best results, where
unseen-class samples are generated based on their semantics by a GAN trained on
seen-class data, enabling vanilla object detectors to recognize unseen objects.
However, the problem of semantic confusion still remains, where the model is
sometimes unable to distinguish between semantically-similar classes. In this
work, we propose to train a generative model with a triplet loss that
accounts for the degree of dissimilarity between classes and reflects it in
the generated samples. Moreover, a cyclic-consistency loss is also enforced to
ensure that generated visual samples of a class highly correspond to their own
semantics. Extensive experiments on two benchmark ZSD datasets - MSCOCO and
PASCAL-VOC - demonstrate significant gains over the current ZSD methods,
reducing semantic confusion and improving detection for the unseen classes.Comment: Accepted to BMVC 2022 (Oral). 15 pages, 5 figures. Project page:
https://github.com/sandipan211/ZSD-SC-Resolve
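The two losses mentioned in the abstract can be sketched briefly: a margin-based triplet loss that separates features of semantically similar classes, and a cycle-consistency term that ties generated features back to their class semantics. The squared-error forms, margin value, and function names below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss (illustrative). Pulls generated features of a
    class (anchor) toward same-class features (positive) and pushes them away
    from features of a semantically similar but different class (negative)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def cycle_consistency_loss(semantics, reconstructed_semantics):
    """Illustrative cycle-consistency term: penalizes generated visual features
    whose decoded semantics drift from the class semantics that produced them."""
    return np.mean((semantics - reconstructed_semantics) ** 2)
```

In a GAN-based ZSD pipeline, terms like these would be added to the generator objective so that synthesized unseen-class features stay both class-discriminative and semantically faithful.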