View Independent Vehicle Make, Model and Color Recognition Using Convolutional Neural Network
This paper describes the details of Sighthound's fully automated vehicle
make, model and color recognition system. The backbone of our system is a deep
convolutional neural network that is not only computationally inexpensive, but
also provides state-of-the-art results on several competitive benchmarks.
Additionally, our deep network is trained on a large dataset of several million
images which are labeled through a semi-automated process. Finally we test our
system on several public datasets as well as our own internal test dataset. Our
results show that we outperform other methods on all benchmarks by significant
margins. Our model is available to developers through the Sighthound Cloud API
at https://www.sighthound.com/products/cloud
Comment: 7 pages
Fine-grained Image Classification by Exploring Bipartite-Graph Labels
Given a food image, can a fine-grained object recognition engine tell "which
restaurant which dish" the food belongs to? Such ultra-fine grained image
recognition is the key for many applications like search by images, but it is
very challenging because it needs to discern subtle differences between classes
while dealing with the scarcity of training data. Fortunately, the ultra-fine
granularity naturally brings rich relationships among object classes. This
paper proposes a novel approach to exploit the rich relationships through
bipartite-graph labels (BGL). We show how to model BGL in an overall
convolutional neural network, and the resulting system can be optimized through
back-propagation. We also show that it is computationally efficient in
inference thanks to the bipartite structure. To facilitate the study, we
construct a new food benchmark dataset, which consists of 37,885 food images
collected from 6 restaurants and a total of 975 menus. Experimental results on
this new food dataset and three other datasets demonstrate that BGL advances previous work
in fine-grained object recognition. An online demo is available at
http://www.f-zhou.com/fg_demo/
Mining Discriminative Triplets of Patches for Fine-Grained Classification
Fine-grained classification involves distinguishing between similar
sub-categories based on subtle differences in highly localized regions;
therefore, accurate localization of discriminative regions remains a major
challenge. We describe a patch-based framework to address this problem. We
introduce triplets of patches with geometric constraints to improve the
accuracy of patch localization, and automatically mine discriminative
geometrically-constrained triplets for classification. The resulting approach
only requires object bounding boxes. Its effectiveness is demonstrated using
four publicly available fine-grained datasets, on which it outperforms or
achieves comparable performance to the state-of-the-art in classification
Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop
Existing fine-grained visual categorization methods often suffer from three
challenges: lack of training data, large number of fine-grained categories, and
high intra-class vs. low inter-class variance. In this work we propose a generic
iterative framework for fine-grained categorization and dataset bootstrapping
that handles these three challenges. Using deep metric learning with humans in
the loop, we learn a low dimensional feature embedding with anchor points on
manifolds for each category. These anchor points capture intra-class variances
and remain discriminative between classes. In each round, images with high
confidence scores from our model are sent to humans for labeling. By comparing
with exemplar images, labelers mark each candidate image as either a "true
positive" or a "false positive". True positives are added into our current
dataset and false positives are regarded as "hard negatives" for our metric
learning model. Then the model is retrained with an expanded dataset and hard
negatives for the next round. To demonstrate the effectiveness of the proposed
framework, we bootstrap a fine-grained flower dataset with 620 categories from
Instagram images. The proposed deep metric learning scheme is evaluated on both
our dataset and the CUB-200-2011 Birds dataset. Experimental evaluations show
significant performance gain using dataset bootstrapping and demonstrate
state-of-the-art results achieved by the proposed deep metric learning methods.
Comment: 10 pages, 9 figures, CVPR 201
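The bootstrapping round described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `BootstrapState` and `bootstrap_round`, the `threshold` value, and the toy scores are all hypothetical, and the retraining step between rounds is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BootstrapState:
    dataset: set = field(default_factory=set)         # confirmed true positives
    hard_negatives: set = field(default_factory=set)  # confirmed false positives


def bootstrap_round(state, candidates, score, human_label, threshold=0.8):
    """Run one round: send confident candidates to a labeler, update both pools.

    `score` and `human_label` are hypothetical callables standing in for the
    model's confidence and the human verdict; `threshold` is an assumed cutoff.
    """
    for image in candidates:
        if image in state.dataset or image in state.hard_negatives:
            continue  # already labeled in an earlier round
        if score(image) < threshold:
            continue  # only high-confidence candidates go to humans
        if human_label(image):
            state.dataset.add(image)         # "true positive": grow the dataset
        else:
            state.hard_negatives.add(image)  # "false positive": keep as hard negative
    return state


# Toy data: model confidences and the verdict a labeler would give.
scores = {"a.jpg": 0.95, "b.jpg": 0.90, "c.jpg": 0.30}
verdicts = {"a.jpg": True, "b.jpg": False, "c.jpg": True}
state = bootstrap_round(BootstrapState(), scores, scores.get, verdicts.get)
```

In a full loop, the model would be retrained on `state.dataset` and `state.hard_negatives` before the next call.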
Incorporating Intra-Class Variance to Fine-Grained Visual Recognition
Fine-grained visual recognition aims to capture discriminative
characteristics amongst visually similar categories. State-of-the-art
work has significantly improved fine-grained recognition
performance through deep metric learning with triplet networks. However, the impact
of intra-category variance on the performance of recognition and robust feature
representation has not been well studied. In this paper, we propose to leverage
intra-class variance in the metric learning of a triplet network to improve the
performance of fine-grained recognition. Through partitioning training images
within each category into a few groups, we form triplet samples across
different categories as well as different groups, a scheme we call Group
Sensitive TRiplet Sampling (GS-TRS). Accordingly, the triplet loss function is
strengthened by incorporating intra-class variance with GS-TRS, which may
benefit the optimization objective of the triplet network. Extensive
experiments over the benchmark datasets CompCar and VehicleID show that the
proposed GS-TRS significantly outperforms state-of-the-art approaches in
both classification and retrieval tasks.
Comment: 6 pages, 5 figures
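The abstract does not give the exact loss, so the following is only a sketch of one way group-sensitive sampling and a triplet loss with an added intra-class term could look. The function names, the two margins `m_inter` and `m_intra`, and the two-hinge form are assumptions made for illustration, not the paper's formulation.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sample_gs_triplet(samples, rng=random):
    """Pick an anchor plus three partners from (embedding, class_id, group_id) tuples.

    Returns (anchor, same-group positive, same-class other-group sample,
    other-class negative).
    """
    anchor = rng.choice(samples)
    _, cls, grp = anchor
    same_group = [s for s in samples if s is not anchor and s[1] == cls and s[2] == grp]
    other_group = [s for s in samples if s[1] == cls and s[2] != grp]
    other_class = [s for s in samples if s[1] != cls]
    if not (same_group and other_group and other_class):
        raise ValueError("anchor has no valid partners; add more samples per group")
    return anchor, rng.choice(same_group), rng.choice(other_group), rng.choice(other_class)


def group_sensitive_loss(anchor, pos, intra_neg, inter_neg, m_inter=1.0, m_intra=0.5):
    """Assumed form: the usual inter-class hinge plus a smaller intra-class hinge."""
    d_ap = sq_dist(anchor, pos)
    inter = max(0.0, d_ap - sq_dist(anchor, inter_neg) + m_inter)  # push other classes away
    intra = max(0.0, d_ap - sq_dist(anchor, intra_neg) + m_intra)  # keep groups apart, less strongly
    return inter + intra


# Toy 2-D embeddings: two classes, each split into two groups of two samples.
samples = [
    ((0.0, 0.0), 0, 0), ((0.1, 0.0), 0, 0), ((1.0, 0.0), 0, 1), ((1.1, 0.0), 0, 1),
    ((5.0, 5.0), 1, 0), ((5.1, 5.0), 1, 0), ((6.0, 5.0), 1, 1), ((6.1, 5.0), 1, 1),
]
a, p, g, n = sample_gs_triplet(samples, random.Random(0))
loss = group_sensitive_loss(a[0], p[0], g[0], n[0])
```

Using a smaller margin for the intra-class term reflects the idea that groups within a category should stay closer to each other than to other categories; the specific values here are arbitrary.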