6,914 research outputs found
Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks
We propose a novel framework called Semantics-Preserving Adversarial
Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test
images and their classes are both unseen during training. SP-AEN aims to tackle
the inherent problem --- semantic loss --- in the prevailing family of
embedding-based ZSL, where some semantics would be discarded during training if
they are non-discriminative for training classes, but could become critical for
recognizing test classes. Specifically, SP-AEN prevents the semantic loss by
introducing an independent visual-to-semantic space embedder which disentangles
the semantic space into two subspaces for the two arguably conflicting
objectives: classification and reconstruction. Through adversarial learning of
the two subspaces, SP-AEN can transfer the semantics from the reconstructive
subspace to the discriminative one, accomplishing the improved zero-shot
recognition of unseen classes. Comparing with prior works, SP-AEN can not only
improve classification but also generate photo-realistic images, demonstrating
the effectiveness of semantic preservation. On four popular benchmarks: CUB,
AWA, SUN and aPY, SP-AEN considerably outperforms other state-of-the-art
methods by an absolute performance difference of 12.2\%, 9.3\%, 4.0\%, and
3.6\% in terms of harmonic mean value
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild
Zero-shot learning (ZSL) methods have been studied in the unrealistic setting
where test data are assumed to come from unseen classes only. In this paper, we
advocate studying the problem of generalized zero-shot learning (GZSL) where
the test data's class memberships are unconstrained. We show empirically that
naively using the classifiers constructed by ZSL approaches does not perform
well in the generalized setting. Motivated by this, we propose a simple but
effective calibration method that can be used to balance two conflicting
forces: recognizing data from seen classes versus those from unseen ones. We
develop a performance metric to characterize such a trade-off and examine the
utility of this metric in evaluating various ZSL approaches. Our analysis
further shows that there is a large gap between the performance of existing
approaches and an upper bound established via idealized semantic embeddings,
suggesting that improving class semantic embeddings is vital to GZSL.Comment: ECCV2016 camera-read
- …