Learning Visually Consistent Label Embeddings for Zero-Shot Learning
In this work, we propose a zero-shot learning method that effectively models knowledge transfer between classes by jointly learning visually consistent word vectors and a label embedding model in an end-to-end manner. The main idea is to project the word vectors of attributes and classes into the visual space such that word representations of semantically related classes become closer, and to use the projected vectors in the proposed embedding model to identify unseen classes. We evaluate the proposed approach on two benchmark datasets, and the experimental results show that our method yields significant improvements in recognition accuracy.
Comment: To appear at IEEE Int. Conference on Image Processing (ICIP) 201
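The core idea above (map word vectors into the visual space so related classes end up closer, then match against projected class vectors) can be sketched with a closed-form stand-in. This is not the paper's end-to-end model: a ridge regression from seen-class word vectors to mean visual features replaces the learned projection, and all data and dimensions below are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_vis, n_seen = 8, 16, 5

W_words = rng.normal(size=(n_seen, d_word))   # seen-class word vectors
V_means = rng.normal(size=(n_seen, d_vis))    # mean visual feature per seen class

# Ridge regression: P = argmin ||W_words P - V_means||^2 + lam ||P||^2
lam = 0.1
P = np.linalg.solve(W_words.T @ W_words + lam * np.eye(d_word),
                    W_words.T @ V_means)

def predict(x_vis, class_word_vecs):
    """Label of the nearest projected class word vector in visual space."""
    proj = class_word_vecs @ P
    dists = np.linalg.norm(proj - x_vis, axis=1)
    return int(np.argmin(dists))

# A query feature lying near the projection of class 2's word vector
query = W_words[2] @ P + 0.01 * rng.normal(size=d_vis)
print(predict(query, W_words))
```

In the actual method the projection and the embedding model are optimized jointly; the closed-form solve above only illustrates the projection-then-nearest-class recognition step.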
Prediction of Progression to Alzheimer's disease with Deep InfoMax
Arguably, unsupervised learning plays a crucial role in the majority of algorithms for brain imaging analysis. A recently introduced unsupervised approach, Deep InfoMax (DIM), is a promising tool for exploring brain structure in a flexible, non-linear way. In this paper, we investigate the use of variants of DIM in the setting of progression to Alzheimer's disease, in comparison with supervised AlexNet- and ResNet-inspired convolutional neural networks. As a benchmark, we use a classification task between four groups: patients with stable mild cognitive impairment (MCI), patients with progressive MCI, patients with Alzheimer's disease, and healthy controls. Our dataset comprises 828 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our experiments highlight encouraging evidence of the high potential utility of DIM in future neuroimaging studies.
Comment: Accepted to 2019 IEEE Biomedical and Health Informatics (BHI) as a conference paper
Learning Clusterable Visual Features for Zero-Shot Recognition
In zero-shot learning (ZSL), conditional generators have been widely used to generate additional training features, which can then be used to train classifiers for the testing data. However, some testing data are considered "hard" as they lie close to the decision boundaries and are prone to misclassification, leading to performance degradation for ZSL. In this paper, we propose to learn clusterable features for ZSL problems. Using a Conditional Variational Autoencoder (CVAE) as the feature generator, we project the original features into a new feature space supervised by an auxiliary classification loss. To further increase clusterability, we fine-tune the features using a Gaussian similarity loss. The clusterable visual features are not only more suitable for CVAE reconstruction but are also more separable, which improves classification accuracy. Moreover, we introduce Gaussian noise to enlarge the intra-class variance of the generated features, which helps to improve the classifier's robustness. Our experiments on the SUN, CUB, and AWA2 datasets show consistent improvement over previous state-of-the-art ZSL results by a large margin. In addition to its effectiveness on zero-shot classification, experiments show that our method for increasing feature clusterability benefits few-shot learning algorithms as well.
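One plausible reading of the Gaussian similarity loss mentioned above (the abstract does not give the exact formulation, so this is an assumption) is a pairwise objective: the Gaussian similarity s_ij = exp(-||f_i - f_j||^2 / (2 sigma^2)) is pushed toward 1 for same-class pairs and 0 for different-class pairs.

```python
import numpy as np

def gaussian_similarity_loss(feats, labels, sigma=1.0):
    """Squared error between Gaussian pairwise similarities and the
    same-class indicator matrix; low when classes form tight clusters."""
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    diff = feats[:, None, :] - feats[None, :, :]          # (n, n, d)
    sq = (diff ** 2).sum(-1)                              # pairwise sq. dists
    sim = np.exp(-sq / (2 * sigma ** 2))                  # Gaussian similarity
    target = (labels[:, None] == labels[None, :]).astype(float)
    return float(((sim - target) ** 2).mean())

# Tight same-class clusters far apart from each other -> loss near 0
tight = [[0.0, 0.0], [0.01, 0.0], [10.0, 10.0], [10.0, 10.01]]
print(gaussian_similarity_loss(tight, [0, 0, 1, 1]))
```

Minimizing such a loss pulls same-class features together and pushes classes apart, which is the "clusterability" property the abstract credits for both better CVAE reconstruction and better separability.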
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Zero-Shot Learning (ZSL) in video classification is a promising research direction that aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation by learning a projection between the visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and they commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and the visual distribution are leveraged to synthesize video features of unseen categories, so that ZSL can be turned into a typical supervised problem using the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve zero-shot video classification performance significantly.
Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 201
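The "synthesize, then supervise" recipe at the heart of this abstract can be shown schematically. In place of the paper's GAN, a Gaussian conditioned on a (hypothetical) semantic vector stands in for the generator, and a plain nearest-centroid classifier stands in for the downstream supervised model; class names and dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# Hypothetical semantic vectors for two unseen categories
unseen_semantics = {"truck": rng.normal(size=d), "violin": rng.normal(size=d)}

def synthesize(sem_vec, n=50, noise=0.1):
    """Stand-in for a conditional generator: Gaussian around the semantics."""
    return sem_vec + noise * rng.normal(size=(n, d))

X, y = [], []
for name, sem in unseen_semantics.items():
    X.append(synthesize(sem))
    y += [name] * 50
X = np.vstack(X)

# "Supervised" classifier trained on the synthetic features
centroids = {c: X[[i for i, t in enumerate(y) if t == c]].mean(0)
             for c in unseen_semantics}

def classify(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

print(classify(unseen_semantics["truck"]))
```

The paper's contributions (multi-level semantic inference and the matching-aware mutual-information term) concern making the generator faithful to the joint visual-semantic distribution; only the overall pipeline shape is shown here.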
Transductive Zero-Shot Learning with Visual Structure Constraint
To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on data of the source seen classes, and then directly apply it to the target unseen classes. However, in real scenarios, the data distributions of the source and target domains might not match well, causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., to alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers with the visual cluster centers of the test instances. We also propose a new training strategy to handle the real-world case where many unrelated images exist in the test dataset, which is not considered by previous methods. Experiments on many widely used datasets demonstrate that the proposed visual structure constraint consistently brings substantial performance gains and achieves state-of-the-art results. The source code is available at https://github.com/raywzy/VSC.
Comment: NeurIPS 2019
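Of the three alignment strategies listed above, the symmetric Chamfer distance is the simplest to write down: each point in one set is matched to its nearest point in the other set, in both directions. In a minimal sketch (cluster centers would in practice come from, e.g., k-means on the unlabeled test features; here both sets are synthetic):

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (n, d) and b (m, d):
    mean nearest-neighbor squared distance in each direction, summed."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

semantic_centers = [[0.0, 0.0], [5.0, 5.0]]   # projected unseen class centers
cluster_centers  = [[0.1, 0.0], [5.0, 4.9]]   # visual cluster centers
print(chamfer(semantic_centers, cluster_centers))  # small: sets nearly align
```

Minimizing this quantity over the projection's parameters pulls the projected semantic centers onto the visual clusters, which is how the constraint counteracts domain shift in the transductive setting.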
Semantically Aligned Bias Reducing Zero Shot Learning
Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic relationships between seen and unseen classes. Two major problems faced by ZSL algorithms are the hubness problem and the bias towards the seen classes. Existing ZSL methods focus on only one of these problems in the conventional and generalized ZSL settings. In this work, we propose a novel approach, Semantically Aligned Bias Reducing (SABR) ZSL, which addresses both problems. It overcomes the hubness problem by learning a latent space that preserves the semantic relationships between the labels while encoding the discriminating information about the classes. Further, we propose ways to reduce the bias towards the seen classes through a simple cross-validation process in the inductive setting and a novel weak transfer constraint in the transductive setting. Extensive experiments on three benchmark datasets suggest that the proposed model significantly outperforms existing state-of-the-art algorithms by ~1.5-9% in the conventional ZSL setting and by ~2-14% in generalized ZSL, for both the inductive and transductive settings.
Comment: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Improving Generalized Zero-Shot Learning by Semantic Discriminator
It is a recognized fact that the classification accuracy on unseen classes in the Generalized Zero-Shot Learning (GZSL) setting is much lower than in traditional Zero-Shot Learning (ZSL). One of the reasons is that instances are often misclassified into the wrong domain, where we refer to the seen and unseen classes as the two domains. We propose a new approach to distinguish whether an instance comes from the seen or the unseen classes. First, the visual feature of an instance is projected into the semantic space. Then, the absolute norm difference between the projected semantic vector and the class semantic embedding vector, together with the minimum distance between the projected semantic vector and the semantic embedding vectors of the seen classes, is used as the discrimination basis. This approach is termed SD (Semantic Discriminator) because the domain judgment of an instance is performed in the semantic space. Our approach can be combined with any existing ZSL method and any fully supervised classification model to form a new GZSL method. Furthermore, our approach is very simple and does not need any fixed parameters.
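The minimum-distance half of the discrimination basis above admits a very small sketch: project a visual feature into the semantic space (the projection is assumed already learned and is not shown), then flag the instance as unseen when its distance to the nearest seen-class embedding exceeds a threshold. The threshold value here is an arbitrary placeholder, not something the abstract specifies.

```python
import numpy as np

# Hypothetical seen-class semantic embeddings (2 classes, 2-d semantics)
seen_embeddings = np.array([[1.0, 0.0],
                            [0.0, 1.0]])

def is_unseen(projected_sem, threshold=0.5):
    """Seen/unseen decision: far from every seen-class embedding -> unseen."""
    dists = np.linalg.norm(seen_embeddings - projected_sem, axis=1)
    return bool(dists.min() > threshold)

print(is_unseen(np.array([0.95, 0.05])))  # near a seen class -> False
print(is_unseen(np.array([3.0, 3.0])))    # far from all seen -> True
```

Once the domain is decided, the instance can be routed either to a conventional supervised classifier (seen) or to any ZSL method (unseen), which is how the abstract proposes composing SD into a GZSL system.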
Semantic-Guided Multi-Attention Localization for Zero-Shot Learning
Zero-shot learning extends conventional object classification to unseen class recognition by introducing semantic representations of classes. Existing approaches predominantly focus on learning a proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions. Moreover, with the joint supervision of an embedding softmax loss and a class-center triplet loss, the model is encouraged to learn features with high inter-class dispersion and intra-class compactness. Through comprehensive experiments on three widely used zero-shot learning benchmarks, we show the efficacy of the multi-attention localization, and our proposed approach improves the state-of-the-art results by a considerable margin.
Comment: accepted to NeurIPS'1
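A class-center triplet loss of the kind mentioned above is commonly formulated as follows (this is one standard variant; the paper's exact form may differ): pull a feature toward its own class center and push it away from the nearest other center by a margin, L = max(0, ||f - c_y||^2 - min_{k != y} ||f - c_k||^2 + m).

```python
import numpy as np

def center_triplet_loss(f, centers, y, margin=1.0):
    """Hinge on squared distance to own center vs. nearest other center."""
    f, centers = np.asarray(f, float), np.asarray(centers, float)
    d2 = ((centers - f) ** 2).sum(-1)      # sq. distance to every center
    pos = d2[y]                            # own-class center
    neg = np.min(np.delete(d2, y))         # nearest other-class center
    return float(max(0.0, pos - neg + margin))

centers = [[0.0, 0.0], [4.0, 0.0]]
print(center_triplet_loss([0.1, 0.0], centers, y=0))  # well separated -> 0.0
```

Driving this loss to zero forces each feature to sit closer to its own center than to any other by at least the margin, which yields exactly the inter-class dispersion and intra-class compactness the abstract describes.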
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition
Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of those classes, which is achieved by exploiting semantic information and auxiliary datasets. Recently, most ZSL approaches focus on learning visual-semantic embeddings to transfer knowledge from the auxiliary datasets to the novel classes. However, few works study whether the semantic information is discriminative enough for the recognition task. To tackle this problem, we propose a coupled dictionary learning approach to align the visual-semantic structures using class prototypes, where the discriminative information lying in the visual space is utilized to improve the less discriminative semantic space. Zero-shot recognition can then be performed in different spaces by a simple nearest neighbor approach using the learned class prototypes. Extensive experiments on four benchmark datasets show the effectiveness of the proposed approach.
Comment: To appear in ECCV 201
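The recognition step described above is deliberately simple, and that simplicity is easy to make concrete: once class prototypes have been learned (the coupled dictionary learning itself is out of scope here), zero-shot recognition reduces to a nearest-neighbor search over prototypes. The class names and vectors below are invented for illustration.

```python
import numpy as np

def nearest_prototype(x, prototypes):
    """Return the label of the class prototype closest to feature x."""
    labels = list(prototypes)
    dists = [np.linalg.norm(np.asarray(x, float) - np.asarray(prototypes[c], float))
             for c in labels]
    return labels[int(np.argmin(dists))]

# Hypothetical learned prototypes in some aligned space
prototypes = {"zebra": [1.0, 0.0, 0.0], "whale": [0.0, 1.0, 0.0]}
print(nearest_prototype([0.9, 0.1, 0.0], prototypes))  # -> "zebra"
```

All of the method's work goes into making the prototypes discriminative via the visual-semantic structure alignment; the classifier itself stays parameter-free.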
Few-Shot Adaptation for Multimedia Semantic Indexing
We propose a few-shot adaptation framework, which bridges zero-shot learning and supervised many-shot learning, for semantic indexing of image and video data. Few-shot adaptation provides robust parameter estimation with few training examples by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously. In this method, we first build a zero-shot detector and then update it using the few available examples. Our experiments show the effectiveness of the proposed framework on three datasets: TRECVID Semantic Indexing 2010, TRECVID Semantic Indexing 2014, and ImageNet. On the ImageNet dataset, we show that our method outperforms recent few-shot learning methods. On the TRECVID 2014 dataset, we achieve 15.19% and 35.98% Mean Average Precision under the zero-shot condition and the supervised condition, respectively. To the best of our knowledge, these are the best results on this dataset.
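One way to read "optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously" (an assumption on our part, not the paper's stated formulation) is as a data-dependent interpolation: start from the zero-shot parameters and trust the few-shot estimate more as the number of examples n grows, alpha = n / (n + tau), where tau is a hypothetical prior-strength constant.

```python
import numpy as np

def adapt(w_zero_shot, w_few_shot, n_examples, tau=5.0):
    """Interpolate a zero-shot prior with the few-shot estimate; more
    labeled examples shift the weight toward the few-shot side."""
    w_zs = np.asarray(w_zero_shot, float)
    w_fs = np.asarray(w_few_shot, float)
    alpha = n_examples / (n_examples + tau)
    return (1 - alpha) * w_zs + alpha * w_fs

w_zs, w_fs = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(adapt(w_zs, w_fs, n_examples=0))  # no examples -> pure zero-shot prior
print(adapt(w_zs, w_fs, n_examples=5))  # alpha = 0.5 -> midpoint
```

With n = 0 this degrades gracefully to the zero-shot detector and with large n it approaches the many-shot solution, which matches the "bridge" behavior the abstract claims.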