Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains emphasize discriminative parts of
the input and suppress irrelevant ones, numerous local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve efficiency. Local mechanisms exhibit
different characteristics across application scenarios and paradigms. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then, the
advantages and disadvantages of every category are analyzed in depth, leaving
room for further exploration. Finally, future research directions for local
mechanisms are discussed that may benefit future work. To the best of
our knowledge, this is the first survey on local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network
Few-shot semantic segmentation (FSS) aims to segment novel objects with
only a few annotated samples and has made great progress recently. Most
existing FSS models focus on feature matching between
support and query to tackle FSS. However, the appearance variations between
objects from the same category could be extremely large, leading to unreliable
feature matching and query mask prediction. To this end, we propose a
Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate
latent context structure in query images. Specifically, we propose a
Support-induced Graph Reasoning (SiGR) module to capture salient query object
parts at different semantic levels with a Support-induced GCN. Furthermore, an
instance association (IA) module is designed to capture high-order instance
context from both support and query instances. By integrating the proposed two
modules, SiGCN learns a rich query context representation and is thus more
robust to appearance variations. Extensive experiments on PASCAL-5i and
COCO-20i demonstrate that our SiGCN achieves state-of-the-art performance.
Comment: Accepted to BMVC 2022 as an oral presentation.
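The graph reasoning that SiGR builds on can be illustrated with a minimal, hypothetical sketch (this is generic GCN propagation, not the actual SiGCN implementation; the names `gcn_layer` and `matmul` are ours): one step computes H' = ReLU(A_hat H W), where A_hat is the adjacency matrix with self-loops, row-normalized so each node averages over itself and its neighbors.

```python
def matmul(a, b):
    # naive dense matrix product over nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: H' = ReLU(A_hat @ H @ W), with A_hat the
    # row-normalized adjacency plus self-loops (generic GCN, not SiGCN itself).
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a_norm = [[v / sum(row) for v in row] for row in a_hat]
    h = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

# Usage: a 3-node path graph, 2-d node features, identity weight matrix.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, feats, weight)
# each node's features become the mean over itself and its neighbors
```

Stacking such steps lets information about salient object parts propagate across the graph, which is the general mechanism the SiGR module exploits at different semantic levels.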
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
Recent leading zero-shot video object segmentation (ZVOS) works are devoted
to integrating appearance and motion information by elaborately designing
feature fusion modules and applying them identically across multiple feature
stages. Our
preliminary experiments show that with the strong long-range dependency
modeling capacity of Transformer, simply concatenating the two modality
features and feeding them to vanilla Transformers for feature fusion can
distinctly benefit performance, but at the cost of heavy computation. Through
further empirical analysis, we find that attention dependencies learned in
Transformer in different stages exhibit completely different properties: global
query-independent dependency in the low-level stages and semantic-specific
dependency in the high-level stages. Motivated by the observations, we propose
two Transformer variants: i) Context-Sharing Transformer (CST) that learns the
global-shared contextual information within image frames with a lightweight
computation. ii) Semantic Gathering-Scattering Transformer (SGST) that models
the semantic correlation separately for the foreground and background and
reduces the computation cost with a soft token merging mechanism. We apply CST
and SGST for low-level and high-level feature fusions, respectively,
formulating a level-isomerous Transformer framework for the ZVOS task. Compared
with the baseline that uses vanilla Transformers for multi-stage fusion, ours
significantly increases the speed by 13 times and achieves new state-of-the-art
ZVOS performance. Code is available at https://github.com/DLUT-yyc/Isomer.
Comment: ICCV 2023.
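The soft token merging used by SGST to cut computation can be illustrated schematically. The sketch below is a hypothetical, generic version (the names `soft_token_merge` and `scores` are ours, and the affinity scores would be learned in a real model): N tokens are mixed into K merged tokens via a softmax over affinity scores, so subsequent attention runs over K << N tokens.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def soft_token_merge(tokens, scores):
    # tokens: N x D token features; scores: N x K affinities (hypothetical,
    # learned in a real model). Each merged token j is a convex combination
    # of all N tokens, weighted by the softmax over column j of the scores.
    n, k, d = len(tokens), len(scores[0]), len(tokens[0])
    merged = []
    for j in range(k):
        w = softmax([scores[i][j] for i in range(n)])
        merged.append([sum(w[i] * tokens[i][c] for i in range(n)) for c in range(d)])
    return merged

# Usage: merge 4 tokens of dim 2 into 2 tokens; a high score makes the
# corresponding token dominate its merged slot.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scores = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0], [0.0, 0.0]]
merged = soft_token_merge(tokens, scores)
```

Because the assignment is soft, the merge is differentiable and no token is hard-dropped; attention cost then scales with K rather than N.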