Fine-grained Image Classification via Combining Vision and Language
Fine-grained image classification, which aims to recognize hundreds of
sub-categories belonging to the same basic-level category, is a challenging
task due to large intra-class variance and small inter-class variance. Most
existing fine-grained image classification methods generally learn part
detection models to obtain the semantic parts for better classification
accuracy. Despite achieving promising results, these methods have two main
limitations: (1) not all the parts obtained through the part detection
models are beneficial and indispensable for classification, and (2)
fine-grained image classification requires more detailed visual descriptions
than part locations or attribute annotations can provide. To address
these two limitations, this paper proposes a two-stream model
combining vision and language (CVL) for learning latent semantic
representations. The vision stream learns deep representations from the
original visual information via a deep convolutional neural network. The language
stream utilizes natural language descriptions, which can point out the
discriminative parts or characteristics of each image, and provides a flexible
and compact way of encoding the salient visual aspects for distinguishing
sub-categories. Since the two streams are complementary, combining them
further improves classification accuracy. In comparison with 12
state-of-the-art methods on the widely used CUB-200-2011 dataset for
fine-grained image classification, the experimental results demonstrate that our CVL
approach achieves the best performance.
Comment: 9 pages, to appear in CVPR 201
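The abstract above does not say how the two streams are combined, so the following is only a minimal sketch of one common option, weighted late fusion of per-class probabilities. The function names and the mixing weight `alpha` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(vision_scores, language_scores, alpha=0.5):
    """Weighted sum of the per-class probabilities of the two streams.

    alpha is an assumed mixing weight (1.0 = vision only, 0.0 = language only).
    """
    if len(vision_scores) != len(language_scores):
        raise ValueError("both streams must score the same set of classes")
    p_vision = softmax(vision_scores)
    p_language = softmax(language_scores)
    return [alpha * v + (1.0 - alpha) * l for v, l in zip(p_vision, p_language)]


def predict(vision_scores, language_scores, alpha=0.5):
    """Index of the class with the highest fused probability."""
    fused = fuse_streams(vision_scores, language_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy scores for three classes: the streams disagree, and fusion decides.
vision = [2.0, 1.0, 0.1]
language = [0.5, 2.5, 0.3]
print(predict(vision, language))
```

In this toy case the language stream is more confident about its preferred class than the vision stream is about its own, so equal weighting follows the language stream; setting `alpha=1.0` falls back to the vision-only prediction.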
Modelling Local Deep Convolutional Neural Network Features to Improve Fine-Grained Image Classification
We propose a local modelling approach using deep convolutional neural
networks (CNNs) for fine-grained image classification. Recently, deep CNNs
trained from large datasets have considerably improved the performance of
object recognition. However, to date there has been limited work using these
deep CNNs as local feature extractors. This partly stems from CNNs having
internal representations which are high dimensional, thereby making such
representations difficult to model using stochastic models. To overcome this
issue, we propose to reduce the dimensionality of one of the internal fully
connected layers, in conjunction with layer-restricted retraining to avoid
retraining the entire network. The distribution of low-dimensional features
obtained from the modified layer is then modelled using a Gaussian mixture
model. Comparative experiments show that considerable performance improvements
can be achieved on the challenging Fish and UEC FOOD-100 datasets.
Comment: 5 pages, three figure
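As a rough illustration of modelling low-dimensional local features with a Gaussian mixture model, the sketch below scores a set of local feature vectors under one diagonal-covariance GMM per class and picks the class with the highest total log-likelihood. This decision rule, the diagonal covariances, and the toy parameters are assumptions; the abstract does not describe them, and fitting the mixtures (for example with EM) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a mixture, using the log-sum-exp trick."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def classify(local_features, class_gmms):
    """Sum the log-likelihoods of all local features under each class GMM."""
    scores = {
        label: sum(gmm_log_likelihood(x, *params) for x in local_features)
        for label, params in class_gmms.items()
    }
    return max(scores, key=scores.get)


# Toy 2-D parameters (weights, means, variances); purely illustrative.
class_gmms = {
    "a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
print(classify([[0.2, 0.1], [0.9, 1.1]], class_gmms))
```

Working in a low-dimensional space is what keeps such a model tractable: the number of variance parameters per component grows with the feature dimension, which is the motivation the abstract gives for reducing the layer size first.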
Iterative Object and Part Transfer for Fine-Grained Recognition
The aim of fine-grained recognition is to identify subordinate categories in
images, such as different species of birds. Existing works have confirmed that, in
order to capture the subtle differences across the categories, automatic
localization of objects and parts is critical. Most approaches for object and
part localization rely on a bottom-up pipeline, where thousands of region
proposals are generated and then filtered by pre-trained object/part models.
This is computationally expensive and not scalable once the number of
objects/parts becomes large. In this paper, we propose a nonparametric
data-driven method for object and part localization. Given an unlabeled test
image, our approach transfers annotations from a few similar images retrieved
in the training set. In particular, we propose an iterative transfer strategy
that gradually refines the predicted bounding boxes. Based on the located
objects and parts, deep convolutional features are extracted for recognition.
We evaluate our approach on the widely-used CUB200-2011 dataset and a new and
large dataset called Birdsnap. On both datasets, we achieve better results than
many state-of-the-art approaches, including a few using oracle (manually
annotated) bounding boxes in the test images.
Comment: To appear in ICME 2017 as an oral pape
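The retrieval-and-transfer idea in this abstract can be sketched, under simplifying assumptions, as a single nearest-neighbour step: find the k training images whose feature vectors are closest to the test image and average their annotated boxes. The feature vectors, the Euclidean distance, the value of `k`, and the averaging rule are all assumptions made for illustration; the iterative refinement described in the abstract is not modelled here.

```python
import math


def nearest_neighbours(query, train_feats, k):
    """Indices of the k training features closest to the query (Euclidean)."""
    ranked = sorted((math.dist(query, feat), i) for i, feat in enumerate(train_feats))
    return [i for _, i in ranked[:k]]


def transfer_box(query, train_feats, train_boxes, k=3):
    """Average the boxes (x_min, y_min, x_max, y_max) of the k nearest images."""
    idx = nearest_neighbours(query, train_feats, k)
    return tuple(
        sum(train_boxes[i][c] for i in idx) / len(idx) for c in range(4)
    )


# Toy data: two training images resemble the query, one does not.
train_feats = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
train_boxes = [
    (0.1, 0.1, 0.5, 0.5),
    (0.3, 0.1, 0.7, 0.5),
    (0.6, 0.6, 0.9, 0.9),
]
print(transfer_box([0.0, 0.4], train_feats, train_boxes, k=2))
```

An iterative variant would presumably re-extract features from the predicted region and repeat the retrieval, but the details of that loop are not given in the abstract.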
Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition
A key challenge in fine-grained recognition is how to find and represent
discriminative local regions. Recent attention models are capable of learning
discriminative region localizers only from category labels with reinforcement
learning. However, without using any explicit part information, they cannot
accurately find multiple distinctive regions. In this work, we
introduce an attribute-guided attention localization scheme where the local
region localizers are learned under the guidance of part attribute
descriptions. By designing a novel reward strategy, we are able to learn to
locate regions that are spatially and semantically distinctive with a
reinforcement learning algorithm. The attribute labeling requirement of the
scheme is more amenable than the accurate part location annotation required by
traditional part-based fine-grained recognition methods. Experimental results
on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme
on both fine-grained recognition and attribute recognition.
Subset Feature Learning for Fine-Grained Category Classification
Fine-grained categorisation has been a challenging problem due to small
inter-class variation, large intra-class variation and a low number of training
images. We propose a learning system which first clusters visually similar
classes and then learns deep convolutional neural network features specific to
each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset
show that the proposed method outperforms recent fine-grained categorisation
methods under the most difficult setting: no bounding boxes are presented at
test time. It achieves a mean accuracy of 77.5%, compared to the previous best
performance of 73.2%. We also show that progressive transfer learning allows us
to first learn domain-generic features (for bird classification) which can then
be adapted to a specific set of bird classes, yielding improvements in accuracy.
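The first stage described in this abstract, grouping visually similar classes into subsets, could look roughly like the sketch below: each class is represented by the mean of its feature vectors, and the class means are clustered. The use of k-means, the deterministic initialisation from the first k class means, and all names here are assumptions, since the abstract does not state the clustering procedure.

```python
import math


def class_means(features, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for j, value in enumerate(feat):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_classes(features, labels, k):
    """Group class labels into k subsets of visually similar classes."""
    means = class_means(features, labels)
    names = sorted(means)
    assign = kmeans([means[name] for name in names], k)
    subsets = {}
    for name, cluster in zip(names, assign):
        subsets.setdefault(cluster, []).append(name)
    return sorted(subsets.values())


# Toy features: classes "a"/"b" lie close together, as do "c"/"d".
features = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
labels = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(cluster_classes(features, labels, k=2))
```

Each resulting subset would then get its own subset-specific feature learner, which is the second stage the abstract describes and is not shown here.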