PDiscoNet: Semantically consistent part discovery for fine-grained recognition
Fine-grained classification often requires recognizing specific object parts,
such as beak shape and wing patterns for birds. Encouraging a fine-grained
classification model to first detect such parts and then use them to infer
the class could help us gauge whether the model is indeed looking at the right
details, better than interpretability methods that provide a single
attribution map. We propose PDiscoNet to discover object parts by using only
image-level class labels along with priors encouraging the parts to be:
discriminative, compact, distinct from each other, equivariant to rigid
transforms, and active in at least some of the images. In addition to using the
appropriate losses to encode these priors, we propose to use part-dropout,
where full part feature vectors are dropped at once to prevent a single part
from dominating in the classification, and part feature vector modulation,
which makes the information coming from each part distinct from the perspective
of the classifier. Our results on CUB, CelebA, and PartImageNet show that the
proposed method provides substantially better part discovery performance than
previous methods while not requiring any additional hyper-parameter tuning and
without penalizing the classification performance. The code is available at
https://github.com/robertdvdk/part_detection.
Comment: 9 pages, 8 figures, ICC
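The part-dropout idea described above can be sketched in plain Python: entire part feature vectors are zeroed at random during training, so no single part can dominate the classification. This is an illustrative sketch under assumed names and interfaces, not the paper's implementation:

```python
import random

def part_dropout(part_features, p=0.3, training=True, rng=random):
    """Drop whole part feature vectors with probability p.

    part_features: list of per-part feature vectors (lists of floats).
    Zeroing a full vector at once (rather than individual units)
    prevents any single part from dominating the classifier.
    The probability p is an assumed hyper-parameter.
    """
    if not training:
        return part_features
    return [
        [0.0] * len(f) if rng.random() < p else f
        for f in part_features
    ]
```

At inference time (`training=False`) all parts are kept, mirroring how standard dropout is disabled at test time.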
Learning Semantically Enhanced Feature for Fine-Grained Image Classification
We aim to provide a computationally cheap yet effective approach for
fine-grained image classification (FGIC) in this letter. Unlike previous
methods that rely on complex part localization modules, our approach learns
fine-grained features by enhancing the semantics of sub-features of a global
feature. Specifically, we first achieve the sub-feature semantic by arranging
feature channels of a CNN into different groups through channel permutation.
Meanwhile, to enhance the discriminability of sub-features, a weighted
combination regularization guides the groups to be activated on object parts
with strong discriminability. Our approach is parameter-parsimonious and
can be easily integrated into the backbone model as a plug-and-play module for
end-to-end training with only image-level supervision. Experiments verified the
effectiveness of our approach and validated performance comparable to
state-of-the-art methods. Code is available at https://github.com/cswluo/SEF
Comment: Accepted by IEEE Signal Processing Letters. 5 pages, 4 figures, 4
tables
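The channel-grouping step can be illustrated with a minimal sketch: a global feature vector's channels are optionally permuted and then split into equal groups, each treated as a sub-feature. The function name, the explicit permutation argument, and the interface are hypothetical, not the authors' code:

```python
def group_channels(features, num_groups, permutation=None):
    """Split C channel activations into num_groups sub-features.

    features: list of C per-channel values (a global feature vector).
    permutation: optional channel reordering applied first, standing in
    for the channel-permutation step; identity ordering if omitted.
    """
    c = len(features)
    assert c % num_groups == 0, "channels must divide evenly into groups"
    if permutation is not None:
        features = [features[i] for i in permutation]
    size = c // num_groups
    return [features[g * size:(g + 1) * size] for g in range(num_groups)]
```

Each returned group can then be supervised separately, e.g. by the weighted combination regularization the abstract mentions.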
Fine-grained Image Classification via Combining Vision and Language
Fine-grained image classification is a challenging task due to the large
intra-class variance and small inter-class variance, aiming at recognizing
hundreds of sub-categories belonging to the same basic-level category. Most
existing fine-grained image classification methods generally learn part
detection models to obtain the semantic parts for better classification
accuracy. Despite achieving promising results, these methods mainly have two
limitations: (1) not all the parts obtained through the part detection
models are beneficial and indispensable for classification, and (2)
fine-grained image classification requires more detailed visual descriptions
which could not be provided by the part locations or attribute annotations. To
address these two limitations, this paper proposes a two-stream model
combining vision and language (CVL) for learning latent semantic
representations. The vision stream learns deep representations from the
original visual information via a deep convolutional neural network. The language
stream utilizes the natural language descriptions which could point out the
discriminative parts or characteristics for each image, and provides a flexible
and compact way of encoding the salient visual aspects for distinguishing
sub-categories. Since the two streams are complementary, combining them
can further achieve better classification accuracy. Compared with 12
state-of-the-art methods on the widely used CUB-200-2011 dataset for
fine-grained image classification, the experimental results demonstrate that
our CVL approach achieves the best performance.
Comment: 9 pages, to appear in CVPR 201
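As an illustration of combining complementary streams, the sketch below late-fuses per-stream class scores by a weighted average of softmax outputs. This is only one plausible fusion scheme; the actual CVL model may combine the streams differently, and `alpha` is an assumed weighting hyper-parameter:

```python
import math

def fuse_streams(vision_logits, language_logits, alpha=0.5):
    """Late-fuse class scores from a vision and a language stream.

    Illustrative weighted average of per-stream softmax scores;
    the paper's fusion mechanism may differ from this sketch.
    """
    def softmax(xs):
        m = max(xs)  # subtract max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]
    v = softmax(vision_logits)
    l = softmax(language_logits)
    return [alpha * a + (1 - alpha) * b for a, b in zip(v, l)]
```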
Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features
The goal of our work is to discover dominant objects in a very general
setting where only a single unlabeled image is given. This is far more
challenging than typical co-localization or weakly-supervised localization tasks.
To tackle this problem, we propose a simple but effective pattern mining-based
method, called Object Location Mining (OLM), which exploits the advantages of
data mining and feature representation of pre-trained convolutional neural
networks (CNNs). Specifically, we first convert the feature maps from a
pre-trained CNN model into a set of transactions, and then discover frequent
patterns from the transaction database through pattern mining techniques. We
observe that those discovered patterns, i.e., co-occurrence highlighted
regions, typically hold appearance and spatial consistency. Motivated by this
observation, we can easily discover and localize possible objects by merging
relevant meaningful patterns. Extensive experiments on a variety of benchmarks
demonstrate that OLM achieves competitive localization performance compared
with the state-of-the-art methods. We also compare our approach with
unsupervised saliency detection methods and achieve competitive results on
seven benchmark datasets. Moreover, we conduct experiments on fine-grained
classification to show that our proposed method can locate the entire object
and its parts accurately, which significantly benefits the classification
results.
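The transaction-conversion and mining steps can be sketched as follows: each spatial position becomes a transaction containing the indices of channels that fire above a threshold, and channel sets that co-occur in enough transactions are kept as frequent patterns. This is a minimal illustration of the idea with assumed names and a naive counting scheme, not the OLM implementation:

```python
from collections import Counter
from itertools import combinations

def maps_to_transactions(feature_maps, threshold=0.0):
    """Convert C feature maps over N spatial positions into N transactions.

    feature_maps: list of C lists, each holding N spatial activations.
    The transaction for position j is the set of channel indices whose
    activation at j exceeds threshold.
    """
    n = len(feature_maps[0])
    return [
        frozenset(c for c, fmap in enumerate(feature_maps) if fmap[j] > threshold)
        for j in range(n)
    ]

def frequent_itemsets(transactions, min_support, k=2):
    """Return k-channel sets appearing in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    return {combo for combo, c in counts.items() if c >= min_support}
```

Frequent channel pairs mark co-occurrence highlighted regions; merging the spatial support of such patterns yields candidate object locations.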