3,006 research outputs found
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions for local
mechanisms are discussed that may benefit future work. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
Recognizing Bengali Word Images - A Zero-Shot Learning Perspective
Zero-Shot Learning (ZSL) techniques can classify a completely unseen class, one never encountered during training. This makes them well suited to real-life classification problems, where it is not possible to train a system with annotated data for all possible class types. This work investigates recognition of word images written in Bengali script in a ZSL framework. The proposed approach performs zero-shot word recognition by coupling deep learned features obtained from various CNN architectures with 13 basic shapes/stroke primitives commonly observed in Bengali script characters. In keeping with the ZSL framework, those 13 basic shapes are termed “Signature/Semantic Attributes”. The obtained results are promising; evaluation was carried out in a five-fold cross-validation setup with samples from 250 word classes.
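The five-fold setup over 250 word classes mentioned above can be pictured with a small sketch. It assumes that the folds partition the classes (rather than the samples), so that each fold's classes stay unseen during training; the abstract does not state this, and the function names `class_folds` and `zero_shot_splits` are hypothetical.

```python
import random


def class_folds(num_classes=250, k=5, seed=0):
    """Partition class ids into k disjoint folds (assumed class-level split)."""
    ids = list(range(num_classes))
    random.Random(seed).shuffle(ids)
    # Striding over the shuffled ids gives folds of near-equal size.
    return [sorted(ids[i::k]) for i in range(k)]


def zero_shot_splits(num_classes=250, k=5, seed=0):
    """Yield (seen_classes, unseen_classes) pairs, one pair per fold."""
    folds = class_folds(num_classes, k, seed)
    for i, unseen in enumerate(folds):
        seen = sorted(c for j, fold in enumerate(folds) if j != i for c in fold)
        yield seen, unseen


splits = list(zero_shot_splits())
```

Under this assumption each round trains on 200 classes and evaluates on the 50 held-out classes, so no test class contributes training samples.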
Rebalanced Zero-shot Learning
Zero-shot learning (ZSL) aims to identify unseen classes with zero samples
during training. Broadly speaking, present ZSL methods usually adopt
class-level semantic labels and compare them with instance-level semantic
predictions to infer unseen classes. However, we find that such existing models
mostly produce imbalanced semantic predictions, i.e. these models could perform
precisely for some semantics, but may not for others. To address the drawback,
we aim to introduce an imbalanced learning framework into ZSL. However, we find
that imbalanced ZSL has two unique challenges: (1) Its imbalanced predictions
are highly correlated with the value of semantic labels rather than the number
of samples as typically considered in the traditional imbalanced learning; (2)
Different semantics follow quite different error distributions between classes.
To mitigate these issues, we first formalize ZSL as an imbalanced regression
problem, which offers empirical evidence to interpret how semantic labels lead
to imbalanced semantic predictions. We then propose a re-weighted loss termed
Re-balanced Mean-Squared Error (ReMSE), which tracks the mean and variance of
error distributions, thus ensuring rebalanced learning across classes. As a
major contribution, we conduct a series of analyses showing that ReMSE is
theoretically well established. Extensive experiments demonstrate that the
proposed method effectively alleviates the imbalance in semantic prediction and
outperforms many state-of-the-art ZSL methods. Our code is available at
https://github.com/FouriYe/ReZSL-TIP23.
Comment: Accepted to IEEE Transactions on Image Processing (TIP) 202
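As a rough illustration of the inference step described in this abstract (comparing an instance-level semantic prediction with class-level semantic labels), the sketch below picks the unseen class whose semantic vector is most similar to the prediction. Cosine similarity, the function names, and the toy attribute vectors are assumptions made for illustration; this is not the ReMSE loss or the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_unseen(pred_semantics, class_semantics):
    """Return the class whose semantic label best matches the prediction."""
    return max(
        class_semantics,
        key=lambda name: cosine(pred_semantics, class_semantics[name]),
    )


# Hypothetical class-level semantic labels for two unseen classes.
unseen_classes = {"zebra": [1.0, 0.0, 1.0], "whale": [0.0, 1.0, 0.0]}
# Hypothetical instance-level semantic prediction from a trained model.
prediction = [0.8, 0.1, 0.7]
label = predict_unseen(prediction, unseen_classes)
```

In a pipeline of this kind, an error concentrated in a few semantic dimensions can change the nearest class, which is the sort of imbalance in semantic prediction the abstract sets out to reduce.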