Subset Feature Learning for Fine-Grained Category Classification
Fine-grained categorisation has been a challenging problem due to small
inter-class variation, large intra-class variation and a low number of training
images. We propose a learning system which first clusters visually similar
classes and then learns deep convolutional neural network features specific to
each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset
show that the proposed method outperforms recent fine-grained categorisation
methods under the most difficult setting: no bounding boxes are presented at
test time. It achieves a mean accuracy of 77.5%, compared to the previous best
performance of 73.2%. We also show that progressive transfer learning allows us
to first learn domain-generic features (for bird classification) which can then
be adapted to a specific set of bird classes, yielding improvements in accuracy
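The first stage described above, grouping visually similar classes into subsets, can be sketched roughly as follows. This is only an illustration: the abstract does not say which clustering algorithm or features the authors use, so k-means over per-class mean feature vectors is an assumption here, and the function name `cluster_classes`, the toy class names, and the 2-D features are all hypothetical.

```python
from math import dist


def cluster_classes(class_means, k, iters=20):
    """Group class labels into k subsets via k-means on per-class mean features.

    class_means maps a class label to its mean feature vector (a list of floats).
    k-means is an assumed stand-in, not necessarily the paper's clustering method.
    """
    labels = sorted(class_means)
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [list(class_means[labels[0]])]
    while len(centroids) < k:
        far = max(labels, key=lambda c: min(dist(class_means[c], m) for m in centroids))
        centroids.append(list(class_means[far]))
    assign = {}
    for _ in range(iters):
        # Assign every class to its nearest centroid.
        assign = {
            c: min(range(k), key=lambda j: dist(class_means[c], centroids[j]))
            for c in labels
        }
        # Move each centroid to the mean of the classes assigned to it.
        for j in range(k):
            members = [class_means[c] for c in labels if assign[c] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [[c for c in labels if assign[c] == j] for j in range(k)]


# Toy 2-D "features" for four hypothetical bird classes.
class_means = {
    "sparrow_a": [0.0, 0.1],
    "sparrow_b": [0.1, 0.0],
    "gull_a": [5.0, 5.1],
    "gull_b": [5.1, 5.0],
}
subsets = cluster_classes(class_means, k=2)
print(subsets)
```

In a pipeline like the one the abstract outlines, each resulting subset would then get its own fine-tuned network; that training step is not shown here.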
Deep Learning Based Fine Grained Image Classification
Image classification, and object classification in particular, has been a major focus of computer vision and machine learning research over the past decade. In image classification, a label or category is assigned to an input image based on its content. With breakthroughs in deep learning-based approaches, the performance of image classification models has improved significantly, particularly in fine-grained image classification, which involves discriminating between items of the same category that differ only slightly. Coarse-grained object classification, by contrast, distinguishes highly diverse object categories, such as an elephant and a bus. Fine-grained image categorization seeks to recognise photos as belonging to distinct species of animals, birds, or plants, as well as distinct models of automobiles, versions of aircraft, and so on. The purpose of this study is to review previously published research on deep learning techniques for fine-grained image classification and to compare the effectiveness of these techniques on publicly available datasets
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
Deep convolutional neural networks (CNNs) have shown a strong ability in
mining discriminative object pose and parts information for image recognition.
For fine-grained recognition, context-aware rich feature representation of
object/scene plays a key role since it exhibits a significant variance in the
same subcategory and subtle variance among different subcategories. Finding the
subtle variance that fully characterizes the object/scene is not
straightforward. To address this, we propose a novel context-aware attentional
pooling (CAP) that effectively captures subtle changes via sub-pixel gradients,
and learns to attend to informative integral regions and their importance in
discriminating different subcategories without requiring the bounding-box
and/or distinguishable part annotations. We also introduce a novel feature
encoding by considering the intrinsic consistency between the informativeness
of the integral regions and their spatial structures to capture the semantic
correlation among them. Our approach is simple yet extremely effective and can
be easily applied on top of a standard classification backbone network. We
evaluate our approach using six state-of-the-art (SotA) backbone networks and
eight benchmark datasets. Our method significantly outperforms the SotA
approaches on six datasets and is very competitive with the remaining two.
Comment: Extended version of the accepted paper in 35th AAAI Conference on
Artificial Intelligence 202
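As a rough intuition for what "attending" to informative regions means, the sketch below shows generic softmax attention pooling over a set of region feature vectors. It is not the CAP method itself: the abstract gives no formulas, so the dot-product scoring, the `query` vector, and the function name `attention_pool` are all assumptions made for illustration.

```python
from math import exp


def attention_pool(regions, query):
    """Pool region features into one vector using softmax attention weights.

    regions: list of equal-length feature vectors, one per image region.
    query:   vector scoring how informative each region is (assumed given;
             in a real model it would be learned).
    """
    # Dot-product relevance score for every region.
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    weights = [exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of region features.
    pooled = [
        sum(w * r[d] for w, r in zip(weights, regions))
        for d in range(len(query))
    ]
    return pooled, weights


regions = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(regions, query=[2.0, 0.0])
print(pooled, weights)
```

Regions that align with the query receive larger weights, so they dominate the pooled representation without any bounding-box or part annotation being supplied.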
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification through
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Then, features from both streams are merged by a
sequence of fully connected layers for decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
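One of the second-stream signals mentioned above is a string similarity between the two license plate readings. The abstract does not say which measure is used, so the sketch below assumes a normalised Levenshtein (edit-distance) similarity; the function names and the example plate strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def plate_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the two OCR readings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Two hypothetical OCR readings that differ in one character.
score = plate_similarity("ABC1234", "ABC1284")
print(round(score, 3))
```

A score close to 1 for two different vehicles is exactly the "very close license plate identifiers" case the abstract describes, which is why such a score would be combined with appearance features rather than used alone.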
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
The categorization of local mechanisms in each field is summarized. Then, the
advantages and disadvantages of every category are analyzed in depth, leaving
room for exploration. Finally, future research directions for local
mechanisms that may benefit future work are discussed. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field