Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning
Zero-shot learning (ZSL) addresses the unseen class recognition problem by
leveraging semantic information to transfer knowledge from seen classes to
unseen classes. Generative models synthesize the unseen visual features and
convert ZSL into a classical supervised learning problem. These generative
models are trained using the seen classes and are expected to implicitly
transfer the knowledge from seen to unseen classes. However, their performance
is stymied by overfitting, which leads to substandard performance on
Generalized Zero-Shot learning (GZSL). To address this concern, we propose the
novel LsrGAN, a generative model that Leverages the Semantic Relationship
between seen and unseen categories and explicitly performs knowledge transfer
by incorporating a novel Semantic Regularized Loss (SR-Loss). The SR-loss
guides the LsrGAN to generate visual features that mirror the semantic
relationships between seen and unseen classes. Experiments on seven benchmark
datasets, including the challenging Wikipedia text-based CUB and NABirds
splits, and Attribute-based AWA, CUB, and SUN, demonstrate the superiority of
the LsrGAN compared to previous state-of-the-art approaches under both ZSL and
GZSL. Code is available at https://github.com/Maunil/LsrGAN
Comment: 19 pages, to appear in ECCV 202
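The abstract above says the SR-Loss makes generated visual features mirror the semantic relationships between classes, but it does not give the formula. The following is only a minimal sketch of one way such a regularizer could look, assuming cosine similarity as the relationship measure and a hinge penalty with a tolerance `eps`. The function names, the tolerance, and the hinge form are all illustrative assumptions, not the authors' definition.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_relationship_penalty(gen_center, seen_centers, sem_target, sem_seen, eps=0.1):
    """Hypothetical regularizer (an assumption, not the paper's exact loss).

    For each seen class, the visual similarity between the generated class
    centre and the seen class centre is penalized when it leaves the band
    [semantic similarity - eps, semantic similarity + eps].
    """
    penalty = 0.0
    for v_seen, s_seen in zip(seen_centers, sem_seen):
        sem_sim = cosine(sem_target, s_seen)   # relationship in semantic space
        vis_sim = cosine(gen_center, v_seen)   # relationship in visual space
        penalty += max(0.0, vis_sim - (sem_sim + eps))  # visually too similar
        penalty += max(0.0, (sem_sim - eps) - vis_sim)  # visually too dissimilar
    return penalty
```

In a real generator this term would be added to the adversarial objective and computed on batches of synthesized features. The loop over plain NumPy vectors is kept only for readability.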
Generalized Zero-Shot Learning via Synthesized Examples
We present a generative framework for generalized zero-shot learning where
the training and test classes are not necessarily disjoint. Built upon a
variational autoencoder based architecture, consisting of a probabilistic
encoder and a probabilistic conditional decoder, our model can generate novel
exemplars from seen/unseen classes, given their respective class attributes.
These exemplars can subsequently be used to train any off-the-shelf
classification model. One of the key aspects of our encoder-decoder
architecture is a feedback-driven mechanism in which a discriminator (a
multivariate regressor) learns to map the generated exemplars to the
corresponding class attribute vectors, leading to an improved generator. Our
model's ability to generate and leverage examples from unseen classes to train
the classification model naturally helps to mitigate the bias towards
predicting seen classes in generalized zero-shot learning settings. Through a
comprehensive set of experiments, we show that our model outperforms several
state-of-the-art methods, on several benchmark datasets, for both standard as
well as generalized zero-shot learning.
Comment: Accepted in CVPR'1
Mitigating Generation Shifts for Generalized Zero-Shot Learning
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic
information (e.g., attributes) to recognize the seen and unseen samples, where
unseen classes are not observable during training. It is natural to derive
generative models and hallucinate training samples for unseen classes based on
the knowledge learned from the seen samples. However, most of these models
suffer from 'generation shifts', where the synthesized samples may drift
from the real distribution of unseen data. In this paper, we conduct an
in-depth analysis on this issue and propose a novel Generation Shifts
Mitigating Flow (GSMFlow) framework, which comprises multiple conditional
affine coupling layers for learning unseen data synthesis efficiently and
effectively. In particular, we identify three potential problems that trigger
the generation shifts, i.e., semantic inconsistency, variance decay, and
structural permutation, and address each in turn. First, to reinforce the
correlations between the generated samples and the respective attributes, we
explicitly embed the semantic information into the transformations in each of
the coupling layers. Second, to recover the intrinsic variance of the
synthesized unseen features, we introduce a visual perturbation strategy to
diversify the intra-class variance of generated data and thereby help adjust the
decision boundary of the classifier. Third, to avoid structural permutation in
the semantic space, we propose a relative positioning strategy to manipulate
the attribute embeddings, guiding them to fully preserve the inter-class
geometric structure. Experimental results demonstrate that GSMFlow achieves
state-of-the-art recognition performance in both conventional and generalized
zero-shot settings. Our code is available at:
https://github.com/uqzhichen/GSMFlow
Comment: ACM Multimedia 202
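The GSMFlow abstract builds on conditional affine coupling layers with the semantic information embedded in each transformation. As a rough illustration of that building block only, here is a minimal NumPy sketch of a generic conditional affine coupling layer. The class name, the tiny random-weight network, and the dimensions are assumptions made for this example and are not taken from the GSMFlow code.

```python
import numpy as np

class ConditionalAffineCoupling:
    """Generic conditional affine coupling layer (illustrative sketch).

    The first half of the input passes through unchanged. The second half is
    scaled and shifted by functions of the first half and a condition vector,
    for example class attributes.
    """

    def __init__(self, dim, cond_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2
        out_dim = dim - self.d
        # Random, untrained weights: enough to demonstrate invertibility.
        self.W1 = rng.normal(scale=0.1, size=(self.d + cond_dim, hidden))
        self.W_s = rng.normal(scale=0.1, size=(hidden, out_dim))
        self.W_t = rng.normal(scale=0.1, size=(hidden, out_dim))

    def _scale_shift(self, x_a, cond):
        h = np.tanh(np.concatenate([x_a, cond], axis=1) @ self.W1)
        # tanh keeps the log-scale bounded, a common stabilizing choice.
        return np.tanh(h @ self.W_s), h @ self.W_t

    def forward(self, x, cond):
        x_a, x_b = x[:, :self.d], x[:, self.d:]
        s, t = self._scale_shift(x_a, cond)
        y_b = x_b * np.exp(s) + t
        log_det = s.sum(axis=1)  # log |det Jacobian| of the transformation
        return np.concatenate([x_a, y_b], axis=1), log_det

    def inverse(self, y, cond):
        y_a, y_b = y[:, :self.d], y[:, self.d:]
        s, t = self._scale_shift(y_a, cond)
        x_b = (y_b - t) * np.exp(-s)
        return np.concatenate([y_a, x_b], axis=1)
```

Because the scale and shift depend only on the untouched half and the condition, the inverse is available in closed form. This is what lets a flow both score real features and generate new ones from the same weights.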
Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning
Generalized zero-shot learning (GZSL) aims to classify samples from seen and
unseen labels, assuming unseen labels are not accessible during training.
Recent advancements in GZSL have been expedited by incorporating
contrastive-learning-based (instance-based) embedding in generative networks
and leveraging the semantic relationship between data points. However, existing
embedding architectures suffer from two limitations: (1) limited
discriminability of synthetic features' embedding without considering
fine-grained cluster structures; (2) inflexible optimization due to restricted
scaling mechanisms on existing contrastive embedding networks, leading to
overlapped representations in the embedding space. To enhance the quality of
representations in the embedding space, as mentioned in (1), we propose a
margin-based prototypical contrastive learning embedding network that reaps the
benefits of prototype-data (cluster quality enhancement) and implicit data-data
(fine-grained representations) interaction while providing substantial cluster
supervision to the embedding network and the generator. To tackle (2), we
propose an instance adaptive contrastive loss that leads to generalized
representations for unseen labels with increased inter-class margin. Through
comprehensive experimental evaluation, we show that our method can outperform
the current state-of-the-art on three benchmark datasets. Our approach also
consistently achieves the best unseen performance in the GZSL setting.
Comment: 7 pages, 4 figures. Accepted in IJCAI 2023 Workshop on Generalizing
from Limited Resources in the Open Worl
Adaptive Cross-Modal Few-Shot Learning
Metric-based meta-learning techniques have successfully been applied to
few-shot classification problems. In this paper, we propose to leverage
cross-modal information to enhance metric-based few-shot learning methods.
Visual and semantic feature spaces have different structures by definition. For
certain concepts, visual features might be richer and more discriminative than
text ones, while for others the inverse might be true. Moreover, when the
support from visual information is limited in image classification, semantic
representations (learned from unsupervised text corpora) can provide strong
prior knowledge and context to help learning. Based on these two intuitions, we
propose a mechanism that can adaptively combine information from both
modalities according to new image categories to be learned. Through a series of
experiments, we show that by this adaptive combination of the two modalities,
our model outperforms current uni-modality few-shot learning methods and
modality-alignment methods by a large margin on all benchmarks and few-shot
scenarios tested. Experiments also show that our model can effectively adjust
its focus on the two modalities. The improvement in performance is particularly
large when the number of shots is very small.
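The adaptive combination described in the abstract above can be pictured as a per-class convex mix of a visual prototype and a semantic embedding mapped into the visual space. The sketch below assumes a linear map `W_g` and a sigmoid gate with weights `w_h`. Both names, and the linear forms, are hypothetical simplifications chosen for illustration rather than the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_prototypes(visual_protos, semantic_embs, W_g, w_h):
    """Mix visual and semantic class representations per class.

    visual_protos: (C, Dv) mean support features per class
    semantic_embs: (C, Ds) word embeddings of the class names
    W_g: (Ds, Dv) assumed linear map into the visual space
    w_h: (Ds,)    assumed gate weights producing the mixing coefficient
    """
    sem_in_visual = semantic_embs @ W_g           # (C, Dv)
    lam = sigmoid(semantic_embs @ w_h)[:, None]   # (C, 1), each value in (0, 1)
    protos = lam * visual_protos + (1.0 - lam) * sem_in_visual
    return protos, lam

def classify(queries, protos):
    """Assign each query to the nearest prototype (squared Euclidean distance)."""
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

A gate value near 1 trusts the visual prototype and a value near 0 leans on the semantic embedding. That is the kind of adjustment the abstract reports as most useful when the number of shots is very small.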
OntoZSL: Ontology-enhanced Zero-shot Learning
Zero-shot Learning (ZSL), which aims to make predictions for classes that have
never appeared in the training data, has attracted considerable research interest. The key
to implementing ZSL is to leverage the prior knowledge of classes which builds
the semantic relationship between classes and enables the transfer of the
learned models (e.g., features) from training classes (i.e., seen classes) to
unseen classes. However, the priors adopted by the existing methods are
relatively limited with incomplete semantics. In this paper, we explore richer
and more competitive prior knowledge to model the inter-class relationship for
ZSL via ontology-based knowledge representation and semantic embedding.
Meanwhile, to address the data imbalance between seen classes and unseen
classes, we developed a generative ZSL framework with Generative Adversarial
Networks (GANs). Our main findings include: (i) an ontology-enhanced ZSL
framework that can be applied to different domains, such as image
classification (IMGC) and knowledge graph completion (KGC); (ii) a
comprehensive evaluation with multiple zero-shot datasets from different
domains, where our method often achieves better performance than the
state-of-the-art models. In particular, on four representative ZSL baselines of
IMGC, the ontology-based class semantics outperform the previous priors (e.g.,
the word embeddings of classes) by an average of 12.4 accuracy points in the
standard ZSL across two example datasets (see Figure 4).
Comment: Accepted to The Web Conference (WWW) 202
ChatGPT-guided Semantics for Zero-shot Learning
Zero-shot learning (ZSL) aims to classify objects that are not observed or
seen during training. It relies on class semantic description to transfer
knowledge from the seen classes to the unseen classes. Existing methods of
obtaining class semantics include manual attributes or automatic word vectors
from language models (like word2vec). We know attribute annotation is costly,
whereas automatic word-vectors are relatively noisy. To address this problem,
we explore how ChatGPT, a large language model, can enhance class semantics for
ZSL tasks. ChatGPT can be a helpful source to obtain text descriptions for each
class containing related attributes and semantics. We use the word2vec model to
obtain word vectors from the texts generated by ChatGPT. Then, we enrich word vectors by
combining the word embeddings from class names and descriptions generated by
ChatGPT. More specifically, we leverage ChatGPT to provide extra supervision
for the class description, eventually benefiting ZSL models. We evaluate our
approach on various 2D image (CUB and AwA) and 3D point cloud (ModelNet10,
ModelNet40, and ScanObjectNN) datasets and show that it improves ZSL
performance. Our work contributes to the ZSL literature by applying ChatGPT for
class semantics enhancement and proposing a novel word vector fusion method.
Comment: Accepted in International Conference on Digital Image Computing:
Techniques and Applications (DICTA), 202
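The abstract above combines word embeddings of class names with embeddings of generated descriptions, without specifying the fusion rule. The sketch below therefore assumes the simplest option, a weighted average followed by L2 normalization. It expects a dictionary of word vectors as a stand-in for a trained word2vec model. The function names and the `alpha` weight are assumptions for illustration only.

```python
import numpy as np

def mean_word_vector(tokens, embeddings):
    """Average the vectors of all tokens found in the embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token has a known embedding")
    return np.mean(vecs, axis=0)

def fuse(name_vec, desc_vec, alpha=0.5):
    """Assumed fusion rule: weighted average, then L2 normalization."""
    fused = alpha * name_vec + (1.0 - alpha) * desc_vec
    return fused / np.linalg.norm(fused)
```

With a real word2vec model, `embeddings` would be replaced by that model's vector lookup. Out-of-vocabulary tokens in the description are skipped here rather than raising an error.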
Invertible Zero-Shot Recognition Flows
Deep generative models have been successfully applied to Zero-Shot Learning
(ZSL) recently. However, the underlying drawbacks of GANs and VAEs (e.g., the
difficulty of training with ZSL-oriented regularizers and the limited generation
quality) hinder the existing generative ZSL models from fully bypassing the
seen-unseen bias. To tackle the above limitations, for the first time, this
work incorporates a new family of generative models (i.e., flow-based models)
into ZSL. The proposed Invertible Zero-shot Flow (IZF) learns factorized data
embeddings (i.e., the semantic factors and the non-semantic ones) with the
forward pass of an invertible flow network, while the reverse pass generates
data samples. This procedure theoretically extends conventional generative
flows to a factorized conditional scheme. To explicitly solve the bias problem,
our model enlarges the seen-unseen distributional discrepancy based on negative
sample-based distance measurement. Notably, IZF works flexibly with either a
naive Bayesian classifier or a held-out trainable one for zero-shot
recognition. Experiments on widely-adopted ZSL benchmarks demonstrate the
significant performance gain of IZF over existing methods, in both classic and
generalized settings.
Comment: ECCV202
See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data
Zero-shot point cloud segmentation aims to make deep models capable of
recognizing novel objects in point clouds that are unseen in the training phase.
Recent trends favor the pipeline which transfers knowledge from seen classes
with labels to unseen classes without labels. They typically align visual
features with semantic features obtained from word embedding by the supervision
of seen classes' annotations. However, point clouds contain limited information
to fully match with semantic features. In fact, the rich appearance information
of images is a natural complement to the textureless point cloud, which is not
well explored in previous literature. Motivated by this, we propose a novel
multi-modal zero-shot learning method to better utilize the complementary
information of point clouds and images for more accurate visual-semantic
alignment. Extensive experiments are performed on two popular benchmarks, i.e.,
SemanticKITTI and nuScenes, and our method outperforms current SOTA methods
with 52% and 49% improvement on average for unseen class mIoU, respectively.
Comment: Accepted by ICCV 202
Deconfounding Causal Inference for Zero-shot Action Recognition
Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test set without corresponding training examples. Most existing zero-shot methods follow the feature generation framework to transfer knowledge from seen action categories to model the feature distribution of unseen categories. However, due to the complexity and diversity of actions, it remains challenging to generate unseen feature distribution, especially for the cross-dataset scenario when there is a potentially larger domain shift. This paper proposes a Deconfounding Causal GAN (DeCalGAN) for generating unseen action video features with the following technical contributions: 1) Our model unifies compositional ZSAR with traditional visual-semantic models to incorporate local object information with global semantic information for feature generation. 2) A GAN-based architecture is proposed for causal inference and unseen distribution discovery. 3) A deconfounding module is proposed to refine representations of local object and global semantic information confounder in the training data. Action descriptions and random object feature after causal inference are then used to discover unseen distributions of novel actions in different datasets. Our extensive experiments on Cross-Dataset Zero-Shot Action Recognition (CD-ZSAR) demonstrate substantial improvement over the UCF101 and HMDB51 standard benchmarks for this problem.