A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts
Most existing zero-shot learning methods treat the problem as one of
visual-semantic embedding. Given the demonstrated capability of Generative
Adversarial Networks (GANs) to generate images, we instead leverage GANs to
imagine unseen categories from text descriptions and hence recognize novel
classes without any seen examples. Specifically, we propose a simple yet
effective generative model that takes as input noisy text descriptions of an
unseen class (e.g., Wikipedia articles) and generates synthesized visual
features for that class. With the added pseudo data, zero-shot learning is
naturally converted to a traditional classification problem. Additionally, to
preserve the inter-class discrimination of the generated features, a visual
pivot regularization is proposed as explicit supervision. Unlike previous
methods that rely on complex engineered regularizers, our approach suppresses
the noise well without additional regularization. Empirically, we show that our
method consistently outperforms the state of the art on the largest available
benchmarks for text-based zero-shot learning.
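The pipeline described in this abstract lends itself to a compact sketch: a generator maps a text embedding plus noise to a synthetic visual feature, a discriminator judges features as real or synthetic, and a visual-pivot loss pulls the mean of a class's generated features toward that class's visual pivot (e.g., the mean of its real features). The sketch below is a minimal PyTorch interpretation; the layer sizes, names, and the exact form of the pivot loss are illustrative assumptions, not the paper's reported architecture.

```python
# Minimal sketch, assuming feature-level GAN training with a visual-pivot
# regularizer. Dimensions are placeholders, not the paper's values.
import torch
import torch.nn as nn

TEXT_DIM, NOISE_DIM, FEAT_DIM = 7551, 100, 2048  # assumed sizes

class Generator(nn.Module):
    """Maps a noisy text embedding plus Gaussian noise to a visual feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + NOISE_DIM, 4096),
            nn.LeakyReLU(0.2),
            nn.Linear(4096, FEAT_DIM),
            nn.ReLU(),  # CNN features are typically non-negative
        )

    def forward(self, text_emb, z):
        return self.net(torch.cat([text_emb, z], dim=1))

class Discriminator(nn.Module):
    """Scores a visual feature as real or synthetic."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
        )

    def forward(self, feat):
        return self.net(feat)

def visual_pivot_loss(fake_feats, pivot):
    """Pull the mean generated feature of a class toward its visual pivot,
    e.g. the mean of that class's real features (an assumed formulation)."""
    return ((fake_feats.mean(dim=0) - pivot) ** 2).mean()
```

Once trained, features generated for unseen classes serve as pseudo training data, so an ordinary softmax classifier can be fit over all classes.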
Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature
Recently, many zero-shot learning (ZSL) methods have focused on learning
discriminative object features in an embedding feature space; however, the
distributions of the unseen-class features learned by these methods tend to
partly overlap, resulting in inaccurate object recognition. To address
this problem, we propose a novel adversarial network to synthesize compact
semantic visual features for ZSL, consisting of a residual generator, a
prototype predictor, and a discriminator. The residual generator produces a
visual feature residual, which is combined with a visual prototype predicted by
the prototype predictor to synthesize the visual feature. The
discriminator distinguishes the synthetic visual features from the real
ones extracted from an existing categorization CNN. Since the generated
residuals are generally numerically much smaller than the distances among all
the prototypes, the distributions of the unseen-class features synthesized by
the proposed network overlap less. In addition, considering that the
visual features from categorization CNNs are generally inconsistent with their
semantic features, a simple feature selection strategy is introduced for
extracting more compact semantic visual features. Extensive experimental
results on six benchmark datasets demonstrate that our method achieves
significantly better performance than existing state-of-the-art methods, by
1.2-13.2% in most cases.
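A minimal sketch may make the residual construction concrete: a prototype predictor maps a class semantic vector to a visual prototype, a residual generator emits a small per-sample residual, and their sum is the synthetic feature that a discriminator (analogous to the one in the previous sketch) would score against real CNN features. Layer sizes, names, and the use of a bounded activation to keep residuals small are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch, assuming prototype + bounded residual synthesis.
# Dimensions and layer choices are placeholders, not the paper's values.
import torch
import torch.nn as nn

SEM_DIM, NOISE_DIM, FEAT_DIM = 312, 100, 2048  # assumed sizes

class PrototypePredictor(nn.Module):
    """Maps a class semantic vector to a visual prototype."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM), nn.ReLU(),
        )

    def forward(self, sem):
        return self.net(sem)

class ResidualGenerator(nn.Module):
    """Generates a small per-sample residual around the class prototype."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM + NOISE_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM), nn.Tanh(),  # bounded, hence small residuals
        )

    def forward(self, sem, z):
        return self.net(torch.cat([sem, z], dim=1))

def synthesize(sem, z, predictor, generator):
    # Synthetic feature = predicted prototype + generated residual.
    return predictor(sem) + generator(sem, z)
```

Bounding the residual is one simple way to honor the abstract's observation that residuals should stay much smaller than inter-prototype distances, so the synthesized class distributions overlap less.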