4,811 research outputs found
Multi-Label Zero-Shot Learning with Transfer-Aware Label Embedding Projection
Zero-shot learning transfers knowledge from seen classes to novel unseen
classes to reduce the human labor of labelling data for building new classifiers.
Much effort on zero-shot learning, however, has focused on the standard
multi-class setting, while the more challenging multi-label zero-shot problem
has received limited attention. In this paper, we propose a transfer-aware embedding
projection approach to tackle multi-label zero-shot learning. The approach
projects the label embedding vectors into a low-dimensional space to induce
better inter-label relationships and explicitly facilitate information transfer
from seen labels to unseen labels, while simultaneously learning a max-margin
multi-label classifier with the projected label embeddings. Auxiliary
information can be conveniently incorporated to guide the label embedding
projection to further improve label relation structures for zero-shot knowledge
transfer. We conduct experiments for zero-shot multi-label image
classification. The results demonstrate the efficacy of the proposed approach.
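The abstract does not give the exact formulation, but the core scoring scheme it describes (projecting label embeddings into a low-dimensional space shared with projected visual features, then thresholding per-label compatibility scores) can be sketched roughly as follows; all weights here are random stand-ins for learned parameters, and every name is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

d_vis, d_emb, d_proj = 512, 300, 64    # visual, label-embedding, projected dims
n_labels = 20                          # seen + unseen labels

# Hypothetical learned parameters (random here, for illustration only):
W = rng.normal(size=(d_proj, d_vis))   # visual projection
P = rng.normal(size=(d_proj, d_emb))   # label-embedding projection
E = rng.normal(size=(n_labels, d_emb)) # label embedding vectors

x = rng.normal(size=d_vis)             # one image feature vector

# Score every label by compatibility in the shared low-dimensional space.
scores = (P @ E.T).T @ (W @ x)         # shape: (n_labels,)

# Multi-label prediction: keep all labels whose score clears a threshold,
# in the spirit of a max-margin multi-label classifier.
predicted = np.flatnonzero(scores > 0.0)
print(predicted)
```

Because unseen labels enter only through their embedding vectors, the same projected space scores them with no retraining, which is what enables the seen-to-unseen transfer.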
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Zero-Shot Learning (ZSL) in video classification is a promising research
direction, which aims to tackle the challenge from explosive growth of video
categories. Most existing methods exploit seen-to-unseen correlation via
learning a projection between visual and semantic spaces. However, such
projection-based paradigms cannot fully utilize the discriminative information
implied in data distribution, and commonly suffer from the information
degradation issue caused by "heterogeneity gap". In this paper, we propose a
visual data synthesis framework via GAN to address these problems.
Specifically, both semantic knowledge and visual distribution are leveraged to
synthesize video features of unseen categories, so that ZSL can be turned into a
typical supervised problem with the synthetic features. First, we propose
multi-level semantic inference to boost video feature synthesis, which captures
the discriminative information implied in joint visual-semantic distribution
via feature-level and label-level semantic inference. Second, we propose
Matching-aware Mutual Information Correlation to overcome the information
degradation issue, which captures seen-to-unseen correlation in matched and
mismatched visual-semantic pairs by mutual information, providing the zero-shot
synthesis procedure with robust guidance signals. Experimental results on four
video datasets demonstrate that our approach can improve the zero-shot video
classification performance significantly.
Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 201
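The central idea of this abstract, synthesizing visual features for unseen categories from their semantic vectors so that ZSL reduces to supervised training, can be illustrated with a minimal sketch. The generator weights below are random placeholders for a trained conditional GAN generator, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d_sem, d_noise, d_vis = 300, 100, 512

# Hypothetical trained generator (random here): maps [noise; semantic] -> visual feature.
G = rng.normal(size=(d_noise + d_sem, d_vis)) * 0.05

def synthesize_features(class_semantic, n_samples=64):
    """Generate visual features for an unseen class from its semantic vector."""
    z = rng.normal(size=(n_samples, d_noise))
    cond = np.hstack([z, np.tile(class_semantic, (n_samples, 1))])
    return np.tanh(cond @ G)               # fake features conditioned on semantics

unseen_sem = rng.normal(size=d_sem)        # semantic vector of an unseen category
fake = synthesize_features(unseen_sem)

# With synthetic (feature, label) pairs for unseen classes, any ordinary
# supervised classifier can be trained over all categories.
print(fake.shape)
```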
Zero-Shot Learning via Latent Space Encoding
Zero-Shot Learning (ZSL) is typically achieved by resorting to a class
semantic embedding space to transfer the knowledge from the seen classes to
unseen ones. Capturing the common semantic characteristics between the visual
modality and the class semantic modality (e.g., attributes or word vector) is a
key to the success of ZSL. In this paper, we propose a novel encoder-decoder
approach, namely Latent Space Encoding (LSE), to connect the semantic relations
of different modalities. Instead of requiring a projection function to transfer
information across different modalities like most previous work, LSE performs
the interactions of different modalities via a feature-aware latent space,
which is learned in an implicit way. Specifically, different modalities are
modeled separately but optimized jointly. For each modality, an encoder-decoder
framework is employed to learn a feature-aware latent space via jointly
maximizing the recoverability of the original space from the latent space and
the predictability of the latent space from the original space. To relate
different modalities together, their features referring to the same concept are
enforced to share the same latent codings. In this way, the common semantic
characteristics of different modalities are generalized with the latent
representations. Another property of the proposed approach is that it is easily
extended to more modalities. Extensive experimental results on four benchmark
datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of
the proposed approach on several ZSL tasks, including traditional ZSL,
generalized ZSL, and zero-shot retrieval (ZSR).
Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 201
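The output layer this abstract describes, a residual mapping with dropout applied to shared output label embeddings before scoring, can be sketched as follows; the weights are random placeholders, the single residual block and ReLU are assumptions, not the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb = 1000, 256

E = rng.normal(size=(vocab, d_emb)) * 0.1   # shared output label (word) embeddings
W1 = rng.normal(size=(d_emb, d_emb)) * 0.1  # hypothetical residual-block weights
W2 = rng.normal(size=(d_emb, d_emb)) * 0.1

def residual_output_mapping(E, drop=0.5, train=True):
    """One residual block over the label embeddings, with dropout between layers."""
    H = np.maximum(E @ W1, 0.0)                        # nonlinearity
    if train:
        H *= rng.random(H.shape) > drop                # dropout mask
    return E + H @ W2                                  # residual connection

h = rng.normal(size=d_emb)                             # decoder hidden state
logits = residual_output_mapping(E, train=False) @ h   # shared mapping scores all labels
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.argmax())
```

Because all output labels pass through the same mapping, parameters are shared across the vocabulary, which is what lets the layer capture output-space structure without a high-rank classifier.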
Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
Tying the weights of the target word embeddings with the target word
classifiers of neural machine translation models leads to faster training and
often to better translation quality. Given the success of this parameter
sharing, we investigate other forms of sharing in between no sharing and hard
equality of parameters. In particular, we propose a structure-aware output
layer which captures the semantic structure of the output space of words within
a joint input-output embedding. The model is a generalized form of weight tying
which shares parameters but allows learning a more flexible relationship with
input word embeddings and allows the effective capacity of the output layer to
be controlled. In addition, the model shares weights across output classifiers
and translation contexts which allows it to better leverage prior knowledge
about them. Our evaluation on English-to-Finnish and English-to-German datasets
shows the effectiveness of the method against strong encoder-decoder baselines
trained with or without weight tying.
Comment: To appear at WMT 201
A Novel Perspective to Zero-shot Learning: Towards an Alignment of Manifold Structures via Semantic Feature Expansion
Zero-shot learning aims at recognizing unseen classes (no training example)
with knowledge transferred from seen classes. This is typically achieved by
exploiting a semantic feature space shared by both seen and unseen classes,
e.g., attributes or word vectors, as the bridge. One common practice in zero-shot
learning is to train a projection between the visual and semantic feature
spaces with labeled examples of seen classes. At inference time, this learned
projection is applied to unseen classes, and class labels are recognized by some
distance metric. However, the visual and semantic feature spaces are mutually
independent and have quite different manifold structures. Under such a
paradigm, most existing methods easily suffer from the domain shift problem,
which weakens zero-shot recognition performance. To address this issue, we
propose a novel model called AMS-SFE. It considers the alignment of manifold
structures by semantic feature expansion. Specifically, we build upon an
autoencoder-based model to expand the semantic features from the visual inputs.
Additionally, the expansion is jointly guided by an embedded manifold extracted
from the visual feature space of the data. Our model is the first attempt to
align both feature spaces by expanding semantic features and derives two
benefits: first, we expand some auxiliary features that enhance the semantic
feature space; second and more importantly, we implicitly align the manifold
structures between the visual and semantic feature spaces; thus, the projection
can be better trained and mitigate the domain shift problem. Extensive
experiments show significant performance improvement, which verifies the
effectiveness of our model.
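The expansion step this abstract describes, an autoencoder that derives auxiliary semantic dimensions from the visual input and appends them to the original semantic vector, can be sketched as below; the weights are random stand-ins for trained parameters, and the manifold-alignment guidance is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_sem, d_exp = 512, 85, 30    # visual dim, original semantic dim, expanded dims

x = rng.normal(size=d_vis)           # visual feature of one example

# Hypothetical trained autoencoder (random here): the encoder produces
# auxiliary semantic dimensions from the visual input.
W_enc = rng.normal(size=(d_vis, d_exp)) * 0.05
W_dec = rng.normal(size=(d_exp, d_vis)) * 0.05

aux = np.tanh(x @ W_enc)                      # expanded (auxiliary) semantic features
recon_err = np.mean((x - aux @ W_dec) ** 2)   # autoencoder reconstruction term

sem = rng.normal(size=d_sem)                  # original class semantic vector
sem_expanded = np.concatenate([sem, aux])     # enhanced semantic space
print(sem_expanded.shape)
```

The subsequent visual-to-semantic projection would then be trained against `sem_expanded` rather than `sem`, which is where the claimed implicit manifold alignment enters.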
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing and it always changes with people, things,
and the environment. A domain refers to the state of the world at a certain
moment. A research problem is characterized as transfer adaptation learning
(TAL) when it needs knowledge correspondence between different moments/domains.
Conventional machine learning aims to find a model with the minimum expected
risk on test data by minimizing the regularized empirical risk on the training
data, which, however, supposes that the training and test data share a similar
joint probability distribution. TAL aims to build models that can perform tasks
in a target domain by learning knowledge from a semantically related but
distributionally different source domain. It is an active research field of
increasing influence and importance, exhibiting a rapidly growing publication
trend. This paper surveys the advances of TAL methodologies in the past decade,
and discusses the technical challenges and essential problems of TAL with deep
insights and new perspectives. The broader solutions to transfer adaptation
learning created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation and adversarial adaptation, which are beyond the early
semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundation, research
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) to be broken in the field
toward universal representation and safe applications in open-world scenarios.
Comment: 26 pages, 4 figures
Zero-Shot Kernel Learning
In this paper, we address an open problem of zero-shot learning. Its
principle is based on learning a mapping that associates feature vectors
extracted from, e.g., images with attribute vectors that describe objects and/or
scenes of interest. In turn, this allows classifying unseen object classes
and/or scenes by matching feature vectors via the mapping to a newly defined
attribute vector describing a new class. Due to the importance of such a
learning task, there exist many methods that learn semantic, probabilistic, linear or
piece-wise linear mappings. In contrast, we apply well-established kernel
methods to learn a non-linear mapping between the feature and attribute spaces.
We propose a simple learning objective inspired by the Linear Discriminant
Analysis, Kernel-Target Alignment and Kernel Polarization methods that promotes
incoherence. We evaluate the performance of our algorithm on the Polynomial as
well as shift-invariant Gaussian and Cauchy kernels. Despite the simplicity of
our approach, we obtain state-of-the-art results on several zero-shot learning
datasets and benchmarks including the recent AWA2 dataset.
Comment: IEEE Conference on Computer Vision and Pattern Recognition 201
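A non-linear feature-to-attribute mapping of the kind this abstract discusses can be illustrated with kernel ridge regression under a Gaussian kernel; this is a generic kernel-method sketch, not the authors' specific objective, and all data and names are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, a = 50, 32, 10                       # train examples, feature dim, attribute dim

X = rng.normal(size=(n, d))                # seen-class image features
A = rng.normal(size=(n, a))                # their class attribute vectors

def gaussian_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Kernel ridge regression: a non-linear map from feature space to attribute space.
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + 1e-2 * np.eye(n), A)   # dual coefficients

def predict_attributes(X_new):
    return gaussian_kernel(X_new, X) @ alpha

# Classify an unseen-class image by matching its predicted attribute vector
# against newly defined attribute vectors for the unseen classes.
unseen_protos = rng.normal(size=(3, a))
x_new = rng.normal(size=(1, d))
a_hat = predict_attributes(x_new)
label = np.argmin(((unseen_protos - a_hat) ** 2).sum(-1))
print(label)
```

Swapping `gaussian_kernel` for a polynomial or Cauchy kernel changes only the kernel function, which is the flexibility the abstract exploits.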
Zero-Shot Object Detection
We introduce and tackle the problem of zero-shot object detection (ZSD),
which aims to detect object classes which are not observed during training. We
work with a challenging set of object classes, not restricting ourselves to
similar and/or fine-grained categories as in prior works on zero-shot
classification. We present a principled approach by first adapting
visual-semantic embeddings for ZSD. We then discuss the problems associated
with selecting a background class and motivate two background-aware approaches
for learning robust detectors. One of these models uses a fixed background
class and the other is based on iterative latent assignments. We also outline
the challenge associated with using a limited number of training classes and
propose a solution based on dense sampling of the semantic label space using
auxiliary data with a large number of categories. We propose novel splits of
two standard detection datasets - MSCOCO and VisualGenome, and present
extensive empirical results in both the traditional and generalized zero-shot
settings to highlight the benefits of the proposed methods. We provide useful
insights into the algorithm and conclude by posing some open questions to
encourage further research.
Comment: 17 pages. ECCV 201
Leveraging Semantic Embeddings for Safety-Critical Applications
Semantic Embeddings are a popular way to represent knowledge in the field of
zero-shot learning. We observe their interpretability and discuss their
potential utility in a safety-critical context. Concretely, we propose to use
them to add introspection and error detection capabilities to neural network
classifiers. First, we show how to create embeddings from symbolic domain
knowledge. We discuss how to use them for interpreting mispredictions and
propose a simple error detection scheme. We then introduce the concept of
semantic distance: a real-valued score that measures confidence in the semantic
space. We evaluate this score on a traffic sign classifier and find that it
achieves near state-of-the-art performance, while being significantly faster to
compute than other confidence scores. Our approach requires no changes to the
original network and is thus applicable to any task for which domain knowledge
is available.
Comment: Accepted at CVPR 2019 Workshop: Safe Artificial Intelligence for Automated Driving
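One simplified reading of the semantic-distance score in this abstract is the distance, in the class-embedding space, between the top-predicted class embedding and the softmax-weighted mean of all class embeddings; this is an illustrative interpretation, not the authors' exact formula, and the embeddings here are random placeholders for ones built from domain knowledge:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d_sem = 10, 16

# Hypothetical semantic embeddings, one per class (e.g., built from symbolic
# domain knowledge about traffic signs).
S = rng.normal(size=(n_classes, d_sem))

def semantic_distance(logits, S):
    """Real-valued confidence score in the semantic space: small distance
    means the network's output distribution is semantically consistent."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    mean_emb = p @ S                       # softmax-weighted mean embedding
    return np.linalg.norm(S[logits.argmax()] - mean_emb)

confident = semantic_distance(np.array([9.0] + [0.0] * 9), S)  # peaked output
uncertain = semantic_distance(np.zeros(n_classes), S)          # uniform output
print(confident < uncertain)
```

The score requires no change to the classifier itself, only its logits and the embedding table, matching the post-hoc error-detection use case described above.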