33 research outputs found
A Generative Model For Zero Shot Learning Using Conditional Variational Autoencoders
Zero shot learning in Image Classification refers to the setting where images
from some novel classes are absent in the training data but other information
such as natural language descriptions or attribute vectors of the classes are
available. This setting is important in the real world since one may not be
able to obtain images of all the possible classes at training. While previous
approaches have tried to model the relationship between the class attribute
space and the image space via some kind of a transfer function in order to
model the image space correspondingly to an unseen class, we take a different
approach and try to generate the samples from the given attributes, using a
conditional variational autoencoder, and use the generated samples for
classification of the unseen classes. By extensive testing on four benchmark
datasets, we show that our model outperforms the state of the art, particularly
in the more realistic generalized setting, where the training classes can also
appear at the test time along with the novel classes
Self-supervised embedding for generalized zero-shot learning in remote sensing scene classification
Generalized zero-shot learning (GZSL) is the most popular approach for developing ZSL, which involves both seen and unseen classes in the classification process. Many of the existing GZSL approaches for scene classification in remote sensing images use word embeddings that do not effectively describe unseen categories. We explore word embedding to describe the classes of remote sensing scenes to improve the classification accuracy of unseen categories. The proposed method uses a data2vec embedding based on self-supervised learning to obtain a continuous and contextualized latent representation. This representation leverages two advantages of the standard transformer architecture. First, targets are not predefined as visual tokens. Second, latent representations preserve contextual information. We conducted experiments on three benchmark scene classification datasets of remote sensing images. The proposed approach demonstrates its efficacy over the existing GZSL approaches.publishedVersio