1,398 research outputs found
Fine-Grained Image Analysis with Deep Learning: A Survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem
in computer vision and pattern recognition, and underpins a diverse set of
real-world applications. The task of FGIA targets analyzing visual objects from
subordinate categories, e.g., species of birds or models of cars. The small
inter-class and large intra-class variation inherent to fine-grained image
analysis makes it a challenging problem. Capitalizing on advances in deep
learning, in recent years we have witnessed remarkable progress in deep
learning powered FGIA. In this paper we present a systematic survey of these
advances, where we attempt to re-define and broaden the field of FGIA by
consolidating two fundamental fine-grained research areas -- fine-grained image
recognition and fine-grained image retrieval. In addition, we also review other
key issues of FGIA, such as publicly available benchmark datasets and related
domain-specific applications. We conclude by highlighting several research
directions and open problems which need further exploration from the community.Comment: Accepted by IEEE TPAM
Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition
Recognizing novel sub-categories with scarce samples is an essential and
challenging research topic in computer vision. Existing literature addresses
this challenge by employing local-based representation approaches, which may
not sufficiently facilitate meaningful object-specific semantic understanding,
leading to a reliance on apparent background correlations. Moreover, they
primarily rely on high-dimensional local descriptors to construct complex
embedding space, potentially limiting the generalization. To address the above
challenges, this article proposes a novel model called RSaG for few-shot
fine-grained visual recognition. RSaG introduces additional saliency-aware
supervision via saliency detection to guide the model toward focusing on the
intrinsic discriminative regions. Specifically, RSaG utilizes the saliency
detection model to emphasize the critical regions of each sub-category,
providing additional object-specific information for fine-grained prediction.
RSaG transfers such information with two symmetric branches in a mutual
learning paradigm. Furthermore, RSaG exploits inter-regional relationships to
enhance the informativeness of the representation and subsequently summarize
the highlighted details into contextual embeddings to facilitate the effective
transfer, enabling quick generalization to novel sub-categories. The proposed
approach is empirically evaluated on three widely used benchmarks,
demonstrating its superior performance.Comment: Under Revie
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions about local
mechanisms have also been discussed that may benefit future works. To the best
our knowledge, this is the first survey about local mechanisms on computer
vision. We hope that this survey can shed light on future research in the
computer vision field
- …