SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization
Over the past few years, significant progress has been made in image recognition based on deep convolutional neural networks (CNNs). This is mainly due to the strong ability of such networks to mine discriminative object pose and part information from texture and shape. This is often inadequate for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusions, deformation, illumination, etc. Thus, an expressive feature representation describing global structural information is key to characterizing an object/scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions, together with their importance in discriminating fine-grained categories, without requiring bounding-box and/or distinguishable-part annotations. Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs): it includes a simple yet effective relation-aware feature transformation and its refinement using a context-aware attention mechanism, boosting the discriminability of the transformed features in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions. It outperforms state-of-the-art approaches by a significant margin in recognition accuracy.

Comment: Accepted manuscript - IEEE Transactions on Image Processing
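The abstract names two ingredients, a relation-aware feature transformation over image regions and a context-aware attention refinement, without giving details. As a rough illustration only (not the authors' actual SR-GNN; all shapes, the dot-product relation, and the pooling rule are assumptions), a minimal numpy sketch of that two-step pattern might look like:

```python
import numpy as np

def relation_aware_pool(regions):
    """Pool CNN region features via relation-aware aggregation + attention.

    regions: (N, d) array of region features (hypothetical input format).
    Returns a single (d,) feature vector for classification.
    """
    n, d = regions.shape
    # 1) Relation-aware transformation: dot-product affinities between regions,
    #    row-softmaxed into a soft adjacency, then neighbour aggregation.
    scores = regions @ regions.T / np.sqrt(d)              # (N, N)
    rel = np.exp(scores - scores.max(axis=1, keepdims=True))
    rel /= rel.sum(axis=1, keepdims=True)                  # row-wise softmax
    context = rel @ regions                                # (N, d) aggregated

    # 2) Context-aware attention: score each transformed region against the
    #    mean context and pool with the resulting softmax weights.
    att = context @ context.mean(axis=0)                   # (N,) scores
    att = np.exp(att - att.max())
    att /= att.sum()                                       # attention weights
    return att @ context                                   # (d,) pooled feature
```

In a real model both steps would carry learned projection weights and be trained end-to-end; the sketch only shows how region relations and attention weights compose.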
Part-guided Relational Transformers for Fine-grained Visual Recognition
Fine-grained visual recognition aims to classify objects with visually similar appearances into subcategories, and has made great progress with the development of deep CNNs. However, handling subtle differences between subcategories remains a challenge. In this paper, we propose to address this issue in one unified framework from two aspects: constructing feature-level interrelationships and capturing part-level discriminative features. This framework, named PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores their intrinsic correlations with a feature transformation module that adapts Transformer models from the field of natural language processing. The part discovery module efficiently discovers discriminative regions that correspond closely to the gradient descent procedure. The feature transformation module then builds correlations between the global embedding and multiple part embeddings, enhancing spatial interactions among semantic pixels. Moreover, our proposed approach does not rely on additional part branches at inference time and reaches state-of-the-art performance on three widely used fine-grained object recognition benchmarks. Experimental results and explainable visualizations demonstrate the effectiveness of our proposed approach. The code can be found at https://github.com/iCVTEAM/PART.

Comment: Published in IEEE TIP 202
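The pipeline the abstract outlines, discover part regions, then relate part embeddings and a global embedding with a Transformer-style module, can be caricatured in a few lines of numpy. This is an illustrative sketch, not the published PART model: the activation-norm part-selection rule, the token layout, and all shapes are assumptions, and a real implementation would use learned query/key/value projections.

```python
import numpy as np

def part_relational_embed(feat_map, k=3):
    """Pick k salient 'part' locations, then relate them via self-attention.

    feat_map: (L, d) flattened CNN feature map over L spatial locations
    (hypothetical input format). Returns a refined (d,) global embedding.
    """
    # Part discovery (assumed rule): keep the k most strongly
    # activated spatial locations as part embeddings.
    energy = np.linalg.norm(feat_map, axis=1)             # (L,) activation energy
    parts = feat_map[np.argsort(energy)[-k:]]             # (k, d)

    # Token set: one global embedding (mean pool) plus the k part embeddings.
    tokens = np.vstack([feat_map.mean(axis=0, keepdims=True), parts])

    # Transformer-style self-attention builds correlations among the tokens.
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)               # (k+1, k+1)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # row-wise softmax
    refined = w @ tokens                                  # mixed embeddings
    return refined[0]                                     # refined global token
```

Because the global token is refined by attending to the part tokens, only that single embedding is needed at inference, which is consistent with the abstract's claim that no additional part branches are required at test time.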
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of the input and suppress irrelevant ones, a substantial number of local mechanisms have been designed to advance the development of computer vision. They can not only focus on target parts to learn discriminative local representations, but also process information selectively to improve efficiency. Local mechanisms have different characteristics depending on the application scenario and paradigm. In this survey, we provide a systematic review of local mechanisms for various computer vision tasks and approaches, including fine-grained visual recognition, person re-identification, few-/zero-shot learning, multi-modal learning, self-supervised learning, Vision Transformers, and so on. The categorization of local mechanisms in each field is summarized. Then, the advantages and disadvantages of each category are analyzed in depth, leaving room for further exploration. Finally, future research directions for local mechanisms that may benefit future work are discussed. To the best of our knowledge, this is the first survey of local mechanisms in computer vision. We hope that this survey can shed light on future research in the computer vision field.