Transductive Zero-Shot Learning with a Self-Training Dictionary Approach
As an important and challenging problem in computer vision, zero-shot
learning (ZSL) aims at automatically recognizing instances of unseen object
classes without training data. To address this problem, ZSL methods typically
work on two aspects: 1) capturing the domain distribution connections between
seen-class and unseen-class data; and 2) modeling the semantic interactions
between the image feature space and the label embedding space. Motivated by
these observations, we propose a bidirectional mapping based semantic
relationship modeling scheme that seeks cross-modal knowledge transfer by
simultaneously projecting the image features and label embeddings into a
common latent space. That is, bidirectional connections are established from
the image feature space to the latent space as well as from the label
embedding space to the latent space. To deal with the domain shift problem,
we further present a transductive learning approach that formulates class
prediction as an iterative refinement process, where the object
classification capacity is progressively reinforced through
bootstrapping-based model updating over highly reliable instances.
Experimental results on three benchmark datasets (AwA, CUB and SUN)
demonstrate the effectiveness of the proposed approach against
state-of-the-art approaches.
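A minimal PyTorch sketch of the two ideas in this abstract follows: projecting image features and label embeddings into a common latent space, then transductively bootstrapping on highly reliable unseen instances. The linear maps, dimensions, and the confidence threshold are illustrative assumptions, not the authors' implementation.

    import torch

    d_img, d_lbl, d_lat = 2048, 300, 256
    W_img = torch.randn(d_img, d_lat, requires_grad=True)  # image -> latent
    W_lbl = torch.randn(d_lbl, d_lat, requires_grad=True)  # label -> latent
    opt = torch.optim.Adam([W_img, W_lbl], lr=1e-3)

    x_seen = torch.randn(500, d_img)   # toy seen-class image features
    s_seen = torch.randn(500, d_lbl)   # matching label embeddings
    for _ in range(100):
        opt.zero_grad()
        # bidirectional connection: both spaces meet in the latent space
        loss = ((x_seen @ W_img - s_seen @ W_lbl) ** 2).mean()
        loss.backward()
        opt.step()

    # transductive refinement: pseudo-label only highly reliable instances
    x_unseen = torch.randn(200, d_img)
    proto = torch.randn(10, d_lbl)                 # unseen label embeddings
    with torch.no_grad():
        dist = torch.cdist(x_unseen @ W_img, proto @ W_lbl)
        conf, pred = (-dist).max(dim=1)
        reliable = conf > conf.quantile(0.8)       # keep the top 20 percent
    # (x_unseen[reliable], pred[reliable]) would seed the next training round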
Class label autoencoder for zero-shot learning
Existing zero-shot learning (ZSL) methods usually learn a projection function
between a feature space and a semantic embedding space (text or attribute
space) on the seen classes used for training or the unseen classes used for
testing. However, such a projection function cannot be used between the
feature space and multiple semantic embedding spaces, which describe diverse
semantic information of the same class. To deal with this issue, we present a
novel ZSL method based on learning a class label autoencoder (CLA). CLA not
only builds a uniform framework for adapting to multiple semantic embedding
spaces, but also constructs an encoder-decoder mechanism for constraining the
bidirectional projection between the feature space and the class label space.
Moreover, CLA jointly considers the relationships among feature classes and
the relevance among semantic classes to improve zero-shot classification. The
CLA solution provides both unseen class labels and the relations among
different class representations (feature or semantic information), which
encode the intrinsic structure of classes. Extensive experiments demonstrate
that CLA outperforms state-of-the-art methods on four benchmark datasets:
AwA, CUB, Dogs and ImNet-2.
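As a rough illustration of the encoder-decoder mechanism described above, the sketch below encodes visual features into the class label space and decodes them back, so both directions of the projection are constrained. The linear layers and one-hot label targets are assumptions for the example, not the paper's code.

    import torch
    import torch.nn.functional as F

    d_feat, n_cls = 2048, 50
    x = torch.randn(400, d_feat)                      # visual features
    y = F.one_hot(torch.randint(0, n_cls, (400,)), n_cls).float()

    enc = torch.nn.Linear(d_feat, n_cls, bias=False)  # features -> labels
    dec = torch.nn.Linear(n_cls, d_feat, bias=False)  # labels -> features
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                           lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        z = enc(x)                                    # label-space encoding
        # align with class labels and reconstruct the original features
        loss = F.mse_loss(z, y) + F.mse_loss(dec(z), x)
        loss.backward()
        opt.step()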
Domain-Invariant Projection Learning for Zero-Shot Recognition
Zero-shot learning (ZSL) aims to recognize unseen object classes without any
training samples, which can be regarded as a form of transfer learning from
seen classes to unseen ones. This is made possible by learning a projection
between a feature space and a semantic space (e.g. attribute space). Key to ZSL
is thus to learn a projection function that is robust against the often large
domain gap between the seen and unseen classes. In this paper, we propose a
novel ZSL model termed domain-invariant projection learning (DIPL). Our model
has two novel components: (1) A domain-invariant feature self-reconstruction
task is introduced to the seen/unseen class data, resulting in a simple linear
formulation that casts ZSL into a min-min optimization problem. Solving the
problem is non-trivial, and a novel iterative algorithm is formulated as the
solver, with rigorous theoretical analysis provided. (2) To further
align the two domains via the learned projection, shared semantic structure
among seen and unseen classes is explored via forming superclasses in the
semantic space. Extensive experiments show that our model outperforms the
state-of-the-art alternatives by significant margins.
Comment: Accepted to NIPS 201
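The min-min structure can be pictured with a small alternating solver: an inner minimization assigns each unlabeled instance to its best-matching class under the current projection, and an outer minimization refits the projection under those assignments. The sketch below is an assumed simplification, not the paper's algorithm.

    import torch

    d_feat, d_sem = 2048, 312
    W = torch.randn(d_sem, d_feat, requires_grad=True)  # semantic -> feature
    opt = torch.optim.Adam([W], lr=1e-3)
    x_u = torch.randn(300, d_feat)                      # unlabeled unseen data
    S_u = torch.randn(20, d_sem)                        # unseen class prototypes

    for _ in range(50):
        with torch.no_grad():                 # inner min: pick the best class
            assign = torch.cdist(x_u, S_u @ W).argmin(dim=1)
        opt.zero_grad()                       # outer min: refit the projection
        loss = ((x_u - S_u[assign] @ W) ** 2).mean()  # self-reconstruction
        loss.backward()
        opt.step()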
Zero-Shot Learning via Latent Space Encoding
Zero-Shot Learning (ZSL) is typically achieved by resorting to a class
semantic embedding space to transfer the knowledge from the seen classes to
unseen ones. Capturing the common semantic characteristics between the visual
modality and the class semantic modality (e.g., attributes or word vector) is a
key to the success of ZSL. In this paper, we propose a novel encoder-decoder
approach, namely Latent Space Encoding (LSE), to connect the semantic relations
of different modalities. Instead of requiring a projection function to
transfer information across different modalities as in most previous work,
LSE performs the interactions of different modalities via a feature-aware
latent space, which is learned in an implicit way. Specifically, different
modalities are
modeled separately but optimized jointly. For each modality, an encoder-decoder
framework is performed to learn a feature aware latent space via jointly
maximizing the recoverability of the original space from the latent space and
the predictability of the latent space from the original space. To relate
different modalities together, their features referring to the same concept are
enforced to share the same latent codings. In this way, the common semantic
characteristics of different modalities are generalized with the latent
representations. Another property of the proposed approach is that it is easily
extended to more modalities. Extensive experimental results on four benchmark
datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of
the proposed approach on several ZSL tasks, including traditional ZSL,
generalized ZSL, and zero-shot retrieval (ZSR).
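A compact way to picture the LSE objective: one encoder-decoder per modality, reconstruction terms for recoverability and predictability, and an L2 tie so that features of the same concept share a latent code. The linear encoders and shapes below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    d_vis, d_sem, d_lat = 2048, 300, 128
    enc_v, dec_v = torch.nn.Linear(d_vis, d_lat), torch.nn.Linear(d_lat, d_vis)
    enc_s, dec_s = torch.nn.Linear(d_sem, d_lat), torch.nn.Linear(d_lat, d_sem)
    params = [p for m in (enc_v, dec_v, enc_s, dec_s) for p in m.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)

    x_v = torch.randn(256, d_vis)    # visual features
    x_s = torch.randn(256, d_sem)    # semantics of the same instances
    for _ in range(200):
        opt.zero_grad()
        z_v, z_s = enc_v(x_v), enc_s(x_s)
        loss = (F.mse_loss(dec_v(z_v), x_v)    # recover the visual space
                + F.mse_loss(dec_s(z_s), x_s)  # recover the semantic space
                + F.mse_loss(z_v, z_s))        # same concept, same latent code
        loss.backward()
        opt.step()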
Zero and Few Shot Learning with Semantic Feature Synthesis and Competitive Learning
Zero-shot learning (ZSL) is made possible by learning a projection function
between a feature space and a semantic space (e.g., an attribute space). Key to
ZSL is thus to learn a projection that is robust against the often large domain
gap between the seen and unseen class domains. In this work, this is achieved
by unseen class data synthesis and robust projection function learning.
Specifically, a novel semantic data synthesis strategy is proposed, by which
semantic class prototypes (e.g., attribute vectors) are used to simply perturb
seen class data for generating unseen class ones. As in any data
synthesis/hallucination approach, there are ambiguities and uncertainties on
how well the synthesised data can capture the targeted unseen class data
distribution. To cope with this, the second contribution of this work is a
novel projection learning model termed competitive bidirectional projection
learning (BPL) designed to best utilise the ambiguous synthesised data.
Specifically, we assume that each synthesised data point can belong to any
unseen class; and the most likely two class candidates are exploited to learn a
robust projection function in a competitive fashion. As a third contribution,
we show that the proposed ZSL model can be easily extended to few-shot learning
(FSL) by again exploiting semantic (class prototype guided) feature synthesis
and competitive BPL. Extensive experiments show that our model achieves the
state-of-the-art results on both problems.
Comment: Submitted to IEEE TPAM
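The sketch below illustrates the two ingredients named in the abstract under strong simplifying assumptions: synthetic unseen-class features are generated by perturbing seen features along prototype differences mapped into feature space, and a margin loss lets the top two class candidates of each synthetic point compete. The perturbation rule, map M, and margin are hypothetical.

    import torch

    d_feat, d_sem = 2048, 85
    x_seen = torch.randn(100, d_feat)       # features of one seen class
    p_seen = torch.randn(d_sem)             # its semantic prototype
    p_unseen = torch.randn(5, d_sem)        # unseen class prototypes
    M = 0.01 * torch.randn(d_sem, d_feat)   # assumed semantic->feature map

    # (1) perturb seen features along prototype differences
    shift = (p_unseen - p_seen) @ M                       # (5, d_feat)
    x_syn = (x_seen.unsqueeze(0) + shift.unsqueeze(1)).reshape(-1, d_feat)

    # (2) competitive scoring over the two most likely unseen classes
    W = torch.randn(d_feat, d_sem, requires_grad=True)    # projection to learn
    scores = (x_syn @ W) @ p_unseen.t()                   # class compatibility
    top2 = scores.topk(2, dim=1).values
    loss = torch.relu(1.0 - (top2[:, 0] - top2[:, 1])).mean()
    loss.backward()   # widen the gap between the two competing candidates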
Bi-Adversarial Auto-Encoder for Zero-Shot Learning
Existing generative Zero-Shot Learning (ZSL) methods only consider the
unidirectional alignment from the class semantics to the visual features while
ignoring the alignment from the visual features to the class semantics, which
fails to construct the visual-semantic interactions well. In this paper, we
propose to synthesize visual features based on an auto-encoder framework paired
with bi-adversarial networks respectively for visual and semantic modalities to
reinforce the visual-semantic interactions with a bi-directional alignment,
which ensures that the synthesized visual features fit the real visual
distribution and are highly related to the semantics. The encoder aims at
synthesizing real-like visual features while the decoder forces both the real
and the synthesized visual features to be more related to the class semantics.
To further capture the discriminative information of the synthesized visual
features, both the real and synthesized visual features are forced to be
classified into the correct classes via a classification network. Experimental
results on four benchmark datasets show that the proposed approach is
particularly competitive on both the traditional ZSL and the generalized ZSL
tasks.
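One way to lay out the bi-adversarial structure, with assumed linear networks and a single forward pass for brevity: an encoder synthesizes visual features from semantics, a decoder maps features back to semantics, and one discriminator per modality scores the two alignment directions.

    import torch

    d_sem, d_feat, n = 300, 2048, 64
    enc = torch.nn.Linear(d_sem, d_feat)   # semantics -> synthetic features
    dec = torch.nn.Linear(d_feat, d_sem)   # features  -> semantics
    disc_v = torch.nn.Linear(d_feat, 1)    # visual-side discriminator
    disc_s = torch.nn.Linear(d_sem, 1)     # semantic-side discriminator

    s = torch.randn(n, d_sem)              # class semantics
    x_real = torch.randn(n, d_feat)        # real visual features
    x_fake = enc(s)                        # synthesized visual features

    bce = torch.nn.BCEWithLogitsLoss()
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    # visual adversary: real versus synthesized features
    loss_dv = bce(disc_v(x_real), ones) + bce(disc_v(x_fake.detach()), zeros)
    # semantic adversary: decoded semantics of real versus fake features
    loss_ds = (bce(disc_s(dec(x_real)), ones)
               + bce(disc_s(dec(x_fake.detach())), zeros))
    # generator side: fool the visual critic and reconstruct the semantics
    loss_g = bce(disc_v(x_fake), ones) + ((dec(x_fake) - s) ** 2).mean()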
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Zero-Shot Learning (ZSL) in video classification is a promising research
direction, which aims to tackle the challenge posed by the explosive growth
of video categories. Most existing methods exploit seen-to-unseen correlation
via
learning a projection between visual and semantic spaces. However, such
projection-based paradigms cannot fully utilize the discriminative information
implied in data distribution, and commonly suffer from the information
degradation issue caused by "heterogeneity gap". In this paper, we propose a
visual data synthesis framework via GAN to address these problems.
Specifically, both semantic knowledge and visual distribution are leveraged to
synthesize video features of unseen categories, so that ZSL can be turned
into a typical supervised problem with the synthetic features. First, we
propose
multi-level semantic inference to boost video feature synthesis, which captures
the discriminative information implied in joint visual-semantic distribution
via feature-level and label-level semantic inference. Second, we propose
Matching-aware Mutual Information Correlation to overcome the information
degradation issue, which captures seen-to-unseen correlation in matched and
mismatched visual-semantic pairs by mutual information, providing the zero-shot
synthesis procedure with robust guidance signals. Experimental results on four
video datasets demonstrate that our approach can improve the zero-shot video
classification performance significantly.
Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 201
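Stripped of the multi-level inference and mutual-information terms, the synthesis core can be sketched as a conditional GAN whose critic sees matched and mismatched visual-semantic pairs; everything below (shapes, linear networks) is an assumed simplification, not the paper's model.

    import torch

    d_sem, d_noise, d_feat, n = 300, 100, 2048, 64
    G = torch.nn.Linear(d_sem + d_noise, d_feat)   # semantics+noise -> feature
    D = torch.nn.Linear(d_feat + d_sem, 1)         # matching-aware pair critic

    s = torch.randn(n, d_sem)
    x_real = torch.randn(n, d_feat)
    x_fake = G(torch.cat([s, torch.randn(n, d_noise)], dim=1))
    s_wrong = s[torch.randperm(n)]                 # mismatched semantics

    bce = torch.nn.BCEWithLogitsLoss()
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    loss_d = (bce(D(torch.cat([x_real, s], 1)), ones)              # matched
              + bce(D(torch.cat([x_fake.detach(), s], 1)), zeros)  # fake
              + bce(D(torch.cat([x_real, s_wrong], 1)), zeros))    # mismatched
    loss_g = bce(D(torch.cat([x_fake, s], 1)), ones)  # fool the pair critic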
Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning
Zero-shot learning (ZSL) aims to recognize novel classes by transferring
semantic knowledge from seen classes to unseen classes. Though many ZSL methods
rely on a direct mapping between the visual and the semantic space, the
calibration deviation and hubness problem limit the generalization capability
to unseen classes. Recently emerged generative ZSL methods generate unseen
image features to transform ZSL into a supervised classification problem.
However, most generative models still suffer from the seen-unseen bias problem
as only seen data is used for training. To address these issues, we propose a
novel bidirectional embedding based generative model with a tight
visual-semantic coupling constraint. We learn a unified latent space that
calibrates the embedded parametric distributions of both visual and semantic
spaces. Since the embeddings of high-dimensional visual features comprise
much non-semantic information, the alignment of the visual and semantic
spaces in the latent space would inevitably deviate. Therefore, we introduce
an information bottleneck (IB) constraint to ZSL for the first time to
preserve essential attribute
information during the mapping. Specifically, we utilize the uncertainty
estimation and the wake-sleep procedure to alleviate the feature noises and
improve the model's abstraction capability. In addition, our method can be
easily extended to the transductive ZSL setting by generating labels for
unseen images. We
then introduce a robust loss to solve this label noise problem. Extensive
experimental results show that our method outperforms the state-of-the-art
methods in different ZSL settings on most benchmark datasets. The code will be
available at https://github.com/osierboy/IBZSL.
Comment: The new version is not complete
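The IB constraint can be pictured as a stochastic latent code with a KL penalty toward a standard normal prior, traded off against a semantic reconstruction term that keeps attribute information. The beta weight and linear layers below are illustrative assumptions, not the authors' network.

    import torch

    d_feat, d_sem, d_lat = 2048, 312, 64
    enc = torch.nn.Linear(d_feat, 2 * d_lat)   # outputs mean and log-variance
    dec = torch.nn.Linear(d_lat, d_sem)        # latent -> attributes

    x = torch.randn(128, d_feat)               # visual features
    a = torch.randn(128, d_sem)                # class attributes

    mu, logvar = enc(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterize
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(1).mean()
    recon = ((dec(z) - a) ** 2).mean()         # preserve attribute information
    loss = recon + 0.01 * kl                   # bottleneck trade-off (beta)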
Bidirectional Mapping Coupled GAN for Generalized Zero-Shot Learning
Bidirectional mapping-based generalized zero-shot learning (GZSL) methods
rely on the quality of synthesized features to recognize seen and unseen data.
Therefore, learning a joint distribution of seen-unseen domains and preserving
domain distinction is crucial for these methods. However, existing methods only
learn the underlying distribution of seen data, although unseen class semantics
are available in the GZSL problem setting. Most methods neglect to retain
domain distinction and use the learned distribution to recognize seen and
unseen data. Consequently, they do not perform well. In this work, we utilize
the available unseen class semantics alongside seen class semantics and learn
joint distribution through a strong visual-semantic coupling. We propose a
bidirectional mapping coupled generative adversarial network (BMCoGAN) by
extending the coupled generative adversarial network into a dual-domain
learning bidirectional mapping model. We further integrate a Wasserstein
generative adversarial optimization to supervise the joint distribution
learning. We design a loss optimization for retaining domain distinctive
information in the synthesized features and reducing bias towards seen classes,
which pushes synthesized seen features towards real seen features and pulls
synthesized unseen features away from real seen features. We evaluate BMCoGAN
on benchmark datasets and demonstrate its superior performance against
contemporary methods.
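The loss design described above can be pictured as a Wasserstein critic plus a push-pull term. The sketch below, with an assumed margin and a linear critic, pulls synthesized seen features toward real seen ones and pushes synthesized unseen features away from them; it is an illustration, not BMCoGAN itself.

    import torch

    d_feat = 2048
    critic = torch.nn.Linear(d_feat, 1)
    x_real_seen = torch.randn(64, d_feat)
    x_syn_seen = torch.randn(64, d_feat, requires_grad=True)
    x_syn_unseen = torch.randn(64, d_feat, requires_grad=True)

    # Wasserstein critic separates real from synthesized seen features
    loss_critic = (critic(x_syn_seen.detach()).mean()
                   - critic(x_real_seen).mean())

    # push-pull: seen stays close to real seen, unseen keeps its distance
    center = x_real_seen.mean(0)
    pull = ((x_syn_seen - center) ** 2).sum(1).mean()
    push = torch.relu(4.0 - ((x_syn_unseen - center) ** 2).sum(1)).mean()
    loss_gen = -critic(x_syn_seen).mean() + pull + push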
Learning Structured Semantic Embeddings for Visual Recognition
Numerous embedding models have been recently explored to incorporate semantic
knowledge into visual recognition. Existing methods typically focus on
minimizing the distance between the corresponding images and texts in the
embedding space but do not explicitly optimize the underlying structure. Our
key observation is that modeling the pairwise image-image relationship improves
the discrimination ability of the embedding model. In this paper, we propose
the structured discriminative and difference constraints to learn
visual-semantic embeddings. First, we exploit the discriminative constraints to
capture the intra- and inter-class relationships of image embeddings. The
discriminative constraints encourage separability for image instances of
different classes. Second, we align the difference vector between a pair of
image embeddings with that of the corresponding word embeddings. The difference
constraints help regularize image embeddings to preserve the semantic
relationships among word embeddings. Extensive evaluations demonstrate the
effectiveness of the proposed structured embeddings for single-label
classification, multi-label classification, and zero-shot recognition.
Comment: 9 pages, 6 figures, 5 tables, conference
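For a pair of images from different classes, the two constraints can be written as a difference-alignment term and a separation margin. The sketch below uses free embedding vectors, a cosine alignment, and a unit margin purely for illustration.

    import torch
    import torch.nn.functional as F

    d = 300
    f_i = torch.randn(d, requires_grad=True)   # image embeddings (learned)
    f_j = torch.randn(d, requires_grad=True)
    w_i, w_j = torch.randn(d), torch.randn(d)  # fixed word embeddings

    # difference constraint: image offset should match the word offset
    diff = 1 - F.cosine_similarity(f_i - f_j, w_i - w_j, dim=0)
    # discriminative constraint: different classes stay at least 1.0 apart
    sep = torch.relu(1.0 - (f_i - f_j).norm())
    (diff + sep).backward()   # gradients shape the embedding structure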