Learning Semantically Enhanced Feature for Fine-Grained Image Classification
We aim to provide a computationally cheap yet effective approach for
fine-grained image classification (FGIC) in this letter. Unlike previous
methods that rely on complex part localization modules, our approach learns
fine-grained features by enhancing the semantics of sub-features of a global
feature. Specifically, we first obtain sub-feature semantics by arranging
feature channels of a CNN into different groups through channel permutation.
Meanwhile, to enhance the discriminability of sub-features, the groups are
guided to be activated on object parts with strong discriminability by a
weighted combination regularization. Our approach is parameter parsimonious and
can be easily integrated into the backbone model as a plug-and-play module for
end-to-end training with only image-level supervision. Experiments verified the
effectiveness of our approach and validated its comparable performance to the
state-of-the-art methods. Code is available at https://github.com/cswluo/SEF
Comment: Accepted by IEEE Signal Processing Letters. 5 pages, 4 figures, 4 tables
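The channel-grouping step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_channels`, the fixed permutation, and the toy feature values are all hypothetical, and how a suitable permutation would be obtained during training is not covered here.

```python
def group_channels(features, permutation, num_groups):
    """Reorder channels by a permutation, then split them into equal groups.

    All names here are illustrative assumptions, not taken from the paper.
    """
    if sorted(permutation) != list(range(len(features))):
        raise ValueError("permutation must list every channel index exactly once")
    if len(features) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    # Apply the permutation so that related channels become adjacent.
    permuted = [features[i] for i in permutation]
    size = len(permuted) // num_groups
    # Each contiguous slice of the permuted vector is one sub-feature group.
    return [permuted[g * size:(g + 1) * size] for g in range(num_groups)]


# Toy global feature with six channels (values are made up).
features = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]
permutation = [1, 3, 0, 2, 4, 5]
groups = group_channels(features, permutation, num_groups=3)
```

Each resulting group plays the role of one sub-feature of the global feature; the regularization that steers groups towards discriminative object parts is outside the scope of this sketch.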
Feature Fusion Vision Transformer for Fine-Grained Visual Categorization
The core of tackling fine-grained visual categorization (FGVC) is to
learn subtle yet discriminative features. Most previous works achieve this by
explicitly selecting the discriminative parts or integrating the attention
mechanism via CNN-based approaches. However, these methods increase the
computational complexity and make the model dominated by the regions containing
most of the object. Recently, the vision transformer (ViT) has achieved SOTA
performance on general image recognition tasks. The self-attention mechanism
aggregates and weights the information from all patches to the classification
token, making it perfectly suitable for FGVC. Nonetheless, the classification
token in the deep layers pays more attention to global information, lacking
the local and low-level features that are essential for FGVC. In this work, we
propose a novel pure transformer-based framework, Feature Fusion Vision
Transformer (FFVT), where we aggregate the important tokens from each transformer
layer to compensate for the local, low-level and middle-level information. We design
a novel token selection module called mutual attention weight selection (MAWS)
to guide the network effectively and efficiently towards selecting
discriminative tokens without introducing extra parameters. We verify the
effectiveness of FFVT on three benchmarks, where FFVT achieves
state-of-the-art performance.
Comment: 9 pages, 2 figures, 3 tables
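Parameter-free token selection from attention weights, as this abstract describes, could be sketched as follows. The scoring rule is an assumption: the abstract does not define "mutual attention weight", so this sketch reads it as the product of the attention the classification token pays to a patch token and the attention that patch token pays back. The function name `select_tokens`, the convention that index 0 is the classification token, and the toy attention matrix are all hypothetical.

```python
def select_tokens(attention, k):
    """Return indices of the k patch tokens with the highest mutual score.

    attention[i][j] is the weight token i places on token j. Index 0 is
    treated as the classification token (an assumption of this sketch).
    """
    n = len(attention)
    # Assumed score: cls-to-token weight times token-to-cls weight.
    scores = {j: attention[0][j] * attention[j][0] for j in range(1, n)}
    # Rank patch tokens by score, highest first; no learned parameters involved.
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy 4x4 attention matrix for one layer (rows sum to 1; values are made up).
attention = [
    [0.4, 0.1, 0.3, 0.2],
    [0.5, 0.3, 0.1, 0.1],
    [0.6, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.3, 0.3],
]
selected = select_tokens(attention, k=2)
```

Running the same selection on every transformer layer and gathering the chosen tokens would give the kind of multi-level token set the abstract refers to; the fusion step itself is not shown.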
Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment
Self-supervised contrastive learning has demonstrated great potential in
learning visual representations. Despite their success on various downstream
tasks such as image classification and object detection, self-supervised
pre-training for fine-grained scenarios is not fully explored. In this paper,
we first point out that current contrastive methods are prone to memorizing
background/foreground texture and therefore have a limitation in localizing the
foreground object. Analysis suggests that learning to extract discriminative
texture information and localization are equally crucial for self-supervised
pre-training in fine-grained scenarios. Based on our findings, we introduce
cross-view saliency alignment (CVSA), a contrastive learning framework that
first crops and swaps saliency regions of images as a novel view generation and
then guides the model to localize on the foreground object via a cross-view
alignment loss. Extensive experiments on four popular fine-grained
classification benchmarks show that CVSA significantly improves the learned
representation.
Comment: The second version of CVSA. 10 pages, 4 figures
Deep Learning Based Fine Grained Image Classification
Image classification, specifically object classification, has been a focal research area in computer vision and machine learning over the past decade. In image classification, a label or category is assigned to an input image based on its content. With breakthroughs in deep learning-based approaches, the performance of image classification models has improved significantly, particularly in fine-grained image classification, which involves discriminating between items of the same category that differ only slightly. Object classification can be coarse-grained, which identifies highly diverse object categories, such as an elephant and a bus. Fine-grained image categorization, on the other hand, seeks to recognise images as belonging to distinct species of animals, birds, or plants, as well as distinct models of automobiles, versions of aircraft, and so on. The purpose of this study is to review previously published research on deep learning techniques for fine-grained image classification and to compare the effectiveness of these techniques on publicly available datasets.