Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop
Existing fine-grained visual categorization methods often suffer from three
challenges: a lack of training data, a large number of fine-grained categories,
and high intra-class versus low inter-class variance. In this work we propose a generic
iterative framework for fine-grained categorization and dataset bootstrapping
that handles these three challenges. Using deep metric learning with humans in
the loop, we learn a low-dimensional feature embedding with anchor points on
manifolds for each category. These anchor points capture intra-class variance
and remain discriminative between classes. In each round, images with high
confidence scores from our model are sent to humans for labeling. By comparing
with exemplar images, labelers mark each candidate image as either a "true
positive" or a "false positive". True positives are added into our current
dataset and false positives are regarded as "hard negatives" for our metric
learning model. Then the model is retrained with an expanded dataset and hard
negatives for the next round. To demonstrate the effectiveness of the proposed
framework, we bootstrap a fine-grained flower dataset with 620 categories from
Instagram images. The proposed deep metric learning scheme is evaluated on both
our dataset and the CUB-200-2011 Birds dataset. Experimental evaluations show
significant performance gains from dataset bootstrapping and demonstrate
state-of-the-art results achieved by the proposed deep metric learning methods.
Comment: 10 pages, 9 figures, CVPR 201
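The human-in-the-loop round described in this abstract can be sketched as a small labeling loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the metric learning model is hidden behind a hypothetical `train` callback that returns a confidence function, the labeler is a hypothetical `ask_human` callback, and `threshold` and `rounds` are illustrative parameters.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical interfaces (not from the paper): images and categories are strings.
ScoreFn = Callable[[str, str], float]  # (image_id, category) -> model confidence
LabelFn = Callable[[str, str], bool]   # human verdict: True means "true positive"
TrainFn = Callable[[Dict[str, Set[str]], Dict[str, Set[str]]], ScoreFn]


def bootstrap(
    positives: Dict[str, Set[str]],
    candidates: Iterable[str],
    train: TrainFn,
    ask_human: LabelFn,
    threshold: float = 0.5,
    rounds: int = 3,
) -> Dict[str, Set[str]]:
    """Grow `positives` in place over several rounds; return the hard negatives."""
    pool = list(candidates)
    hard_negatives: Dict[str, Set[str]] = {c: set() for c in positives}
    for _ in range(rounds):
        # Retrain on the current (expanded) dataset plus all hard negatives so far.
        score = train(positives, hard_negatives)
        for category in positives:
            for image in pool:
                if image in positives[category] or image in hard_negatives[category]:
                    continue  # already verified for this category
                if score(image, category) < threshold:
                    continue  # only high-confidence images are sent to labelers
                if ask_human(image, category):
                    positives[category].add(image)       # true positive: expand dataset
                else:
                    hard_negatives[category].add(image)  # false positive: hard negative
    return hard_negatives
```

Keeping `train` abstract separates the labeling loop from any particular network; each image is shown to a labeler at most once per category.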
Incorporating Intra-Class Variance to Fine-Grained Visual Recognition
Fine-grained visual recognition aims to capture discriminative
characteristics among visually similar categories. State-of-the-art
research has significantly improved fine-grained recognition
performance through deep metric learning with triplet networks. However, the impact
of intra-category variance on recognition performance and on robust feature
representation has not been well studied. In this paper, we propose to leverage
intra-class variance in the metric learning of a triplet network to improve
fine-grained recognition performance. By partitioning the training images
within each category into a few groups, we form triplet samples across
different categories as well as different groups, a scheme we call Group
Sensitive TRiplet Sampling (GS-TRS). Accordingly, the triplet loss function is
strengthened by incorporating intra-class variance through GS-TRS, which may
benefit the optimization of the triplet network. Extensive
experiments on the benchmark datasets CompCars and VehicleID show that the
proposed GS-TRS significantly outperforms state-of-the-art approaches in
both classification and retrieval tasks.
Comment: 6 pages, 5 figures
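Group-sensitive triplet sampling can be sketched as follows. The group ids are taken as given, since the abstract does not say how the partition is computed (clustering features within each category would be one option). The `intra_ratio` parameter that mixes the two kinds of negatives is hypothetical, and the loss shown is the standard hinge triplet loss on squared Euclidean distances, not necessarily the paper's strengthened loss.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float) -> float:
    """Standard hinge triplet loss on squared Euclidean distances."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


def sample_group_sensitive_triplets(
    labels: Sequence[int],
    groups: Sequence[int],
    num_triplets: int,
    rng: random.Random,
    intra_ratio: float = 0.5,
) -> List[Tuple[int, int, int, str]]:
    """Return (anchor, positive, negative, kind) index tuples.

    Anchor and positive share both category and group. The negative comes either
    from the same category but a different group ("intra") or from another
    category ("inter"); the hypothetical `intra_ratio` controls the mix.
    """
    if len(set(labels)) < 2:
        raise ValueError("need at least two categories for inter-class negatives")
    cells = defaultdict(list)  # (category, group) -> sample indices
    for i, (c, g) in enumerate(zip(labels, groups)):
        cells[(c, g)].append(i)
    anchor_cells = [cell for cell, idx in cells.items() if len(idx) >= 2]
    if not anchor_cells:
        raise ValueError("every (category, group) cell has fewer than two samples")

    triplets = []
    for _ in range(num_triplets):
        c, g = rng.choice(anchor_cells)
        a, p = rng.sample(cells[(c, g)], 2)
        intra = [i for i in range(len(labels)) if labels[i] == c and groups[i] != g]
        inter = [i for i in range(len(labels)) if labels[i] != c]
        if intra and rng.random() < intra_ratio:
            triplets.append((a, p, rng.choice(intra), "intra"))
        else:
            triplets.append((a, p, rng.choice(inter), "inter"))
    return triplets
```

Categories with a single group simply fall back to inter-class negatives, so the sampler degrades to ordinary triplet sampling when no intra-class partition exists.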
Iterative Object and Part Transfer for Fine-Grained Recognition
The aim of fine-grained recognition is to identify subordinate categories in
images, such as different species of birds. Existing works have confirmed that, in
order to capture the subtle differences across categories, automatic
localization of objects and parts is critical. Most approaches to object and
part localization rely on a bottom-up pipeline, in which thousands of region
proposals are generated and then filtered by pre-trained object/part models.
This is computationally expensive and does not scale once the number of
objects/parts becomes large. In this paper, we propose a nonparametric
data-driven method for object and part localization. Given an unlabeled test
image, our approach transfers annotations from a few similar images retrieved
from the training set. In particular, we propose an iterative transfer strategy
that gradually refines the predicted bounding boxes. Based on the located
objects and parts, deep convolutional features are extracted for recognition.
We evaluate our approach on the widely used CUB-200-2011 dataset and a new,
large dataset called Birdsnap. On both datasets, we achieve better results than
many state-of-the-art approaches, including a few that use oracle (manually
annotated) bounding boxes in the test images.
Comment: To appear in ICME 2017 as an oral paper
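The iterative transfer strategy can be sketched as nearest-neighbor box transfer that re-queries with its own prediction. Several details here are assumptions, not taken from the abstract: boxes are in normalized image coordinates, transfer is a coordinate-wise mean of the k neighbors' boxes, convergence is checked against a tolerance `tol`, and `describe(box)` is a hypothetical callback returning a feature vector for the region of the test image inside `box`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1) in [0, 1]


def knn(query: Sequence[float], bank: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k bank entries closest to `query` in Euclidean distance."""
    order = sorted(range(len(bank)), key=lambda i: math.dist(query, bank[i]))
    return order[:k]


def iterative_box_transfer(
    describe: Callable[[Box], Sequence[float]],
    train_feats: Sequence[Sequence[float]],
    train_boxes: Sequence[Box],
    k: int = 3,
    max_iters: int = 5,
    tol: float = 1e-3,
) -> Box:
    """Transfer boxes from retrieved neighbors, re-querying with the current box."""
    box: Box = (0.0, 0.0, 1.0, 1.0)  # first query uses the whole test image
    for _ in range(max_iters):
        nearest = knn(describe(box), train_feats, k)
        # Transfer: coordinate-wise mean of the neighbors' annotated boxes.
        new_box = tuple(
            sum(train_boxes[i][j] for i in nearest) / len(nearest) for j in range(4)
        )
        if max(abs(a - b) for a, b in zip(new_box, box)) < tol:
            return new_box  # prediction has stabilized
        box = new_box
    return box
```

No region proposals are generated: the cost per iteration is one feature extraction plus a nearest-neighbor search over the training set.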
Fine-grained Image Classification via Combining Vision and Language
Fine-grained image classification, which aims to recognize hundreds of
subcategories belonging to the same basic-level category, is a challenging task
due to large intra-class variance and small inter-class variance. Most
existing fine-grained image classification methods learn part
detection models to obtain semantic parts for better classification
accuracy. Despite achieving promising results, these methods have two main
limitations: (1) not all the parts obtained through the part detection
models are beneficial and indispensable for classification, and (2)
fine-grained image classification requires more detailed visual descriptions
than part locations or attribute annotations can provide. To
address these two limitations, this paper proposes a two-stream model
combining vision and language (CVL) for learning latent semantic
representations. The vision stream learns deep representations from the
original visual information via a deep convolutional neural network. The language
stream uses natural language descriptions, which can point out the
discriminative parts or characteristics of each image, and provides a flexible
and compact way of encoding the salient visual aspects for distinguishing
subcategories. Since the two streams are complementary, combining them
further improves classification accuracy. Compared with 12
state-of-the-art methods on the widely used CUB-200-2011 dataset for
fine-grained image classification, our CVL
approach achieves the best performance in the experiments.
Comment: 9 pages, to appear in CVPR 201
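The abstract does not say how the two streams are combined. One common choice, assumed here, is late fusion: each stream produces class scores, and the final prediction comes from a weighted average of their softmax probabilities. The weight `alpha` and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(vision_logits: Sequence[float], language_logits: Sequence[float],
                 alpha: float = 0.5) -> Tuple[int, List[float]]:
    """Late fusion: weighted average of the two streams' class probabilities."""
    if len(vision_logits) != len(language_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    pv, pl = softmax(vision_logits), softmax(language_logits)
    fused = [alpha * v + (1.0 - alpha) * lang for v, lang in zip(pv, pl)]
    prediction = max(range(len(fused)), key=fused.__getitem__)
    return prediction, fused
```

For example, with vision logits `[2.0, 1.9, 0.0]` (a narrow preference for class 0) and language logits `[0.0, 3.0, 0.0]` (a strong preference for class 1), equal weighting predicts class 1, illustrating how a complementary stream can resolve an ambiguous visual decision.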
Part-based Multi-stream Model for Vehicle Searching
Due to the enormous demand in public security and intelligent
transportation systems, searching for an identical vehicle has become increasingly
important. Current studies usually treat a vehicle as an integral object and then
train a distance metric to measure the similarity among vehicles. However,
a raw image may look nearly identical to images of vehicles with different
identities, and it includes background pixels that may disturb distance metric
learning. In this paper, we propose a novel method to segment an
original vehicle image into several discriminative foreground parts; these
parts consist of fine-grained regions that we call discriminative
patches. These parts, combined with the raw image, are then fed into the
proposed deep learning network. The similarity of two vehicle images can
then be measured simply as the Euclidean distance between their features
from the fully connected (FC) layer. The two main contributions of this paper
are as follows. First, a method is proposed to estimate whether a patch in a
raw vehicle image is discriminative. Second, a new Part-based Multi-Stream
Model (PMSM) is designed and optimized for vehicle retrieval and
re-identification tasks. We evaluate the proposed method on the VehicleID
dataset, and the experimental results show that our method outperforms the baseline.
Comment: Published in International Conference on Pattern Recognition 201
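Once each image is mapped to an FC-layer feature, retrieval reduces to ranking gallery vehicles by Euclidean distance to the query, as the abstract describes. In this sketch the trained network is replaced by a hypothetical `combine_streams` function that simply concatenates per-stream feature vectors (raw image plus parts); a trained PMSM would produce the feature instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def combine_streams(streams: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the FC layer: concatenate per-stream features."""
    return [x for stream in streams for x in stream]


def rank_gallery(query: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort gallery vehicles by Euclidean distance to the query (closest first)."""
    scored = sorted((math.dist(query, feat), vid) for vid, feat in gallery.items())
    return [(vid, dist) for dist, vid in scored]
```

Ties in distance are broken by vehicle id, which keeps the ranking deterministic.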