7,736 research outputs found
Fine-grained Image Classification by Exploring Bipartite-Graph Labels
Given a food image, can a fine-grained object recognition engine tell "which
restaurant which dish" the food belongs to? Such ultra-fine grained image
recognition is the key for many applications like search by images, but it is
very challenging because it needs to discern subtle difference between classes
while dealing with the scarcity of training data. Fortunately, the ultra-fine
granularity naturally brings rich relationships among object classes. This
paper proposes a novel approach to exploit the rich relationships through
bipartite-graph labels (BGL). We show how to model BGL in an overall
convolutional neural networks and the resulting system can be optimized through
back-propagation. We also show that it is computationally efficient in
inference thanks to the bipartite structure. To facilitate the study, we
construct a new food benchmark dataset, which consists of 37,885 food images
collected from 6 restaurants and totally 975 menus. Experimental results on
this new food and three other datasets demonstrates BGL advances previous works
in fine-grained object recognition. An online demo is available at
http://www.f-zhou.com/fg_demo/
DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition
Being symmetric positive-definite (SPD), covariance matrix has traditionally
been used to represent a set of local descriptors in visual recognition. Recent
study shows that kernel matrix can give considerably better representation by
modelling the nonlinearity in the local descriptor set. Nevertheless, neither
the descriptors nor the kernel matrix is deeply learned. Worse, they are
considered separately, hindering the pursuit of an optimal SPD representation.
This work proposes a deep network that jointly learns local descriptors,
kernel-matrix-based SPD representation, and the classifier via an end-to-end
training process. We derive the derivatives for the mapping from a local
descriptor set to the SPD representation to carry out backpropagation. Also, we
exploit the Daleckii-Krein formula in operator theory to give a concise and
unified result on differentiating SPD matrix functions, including the matrix
logarithm to handle the Riemannian geometry of kernel matrix. Experiments not
only show the superiority of kernel-matrix-based SPD representation with deep
local descriptors, but also verify the advantage of the proposed deep network
in pursuing better SPD representations for fine-grained image recognition
tasks
Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern
Recognition (2014
Comparator Networks
The objective of this work is set-based verification, e.g. to decide if two
sets of images of a face are of the same person or not. The traditional
approach to this problem is to learn to generate a feature vector per image,
aggregate them into one vector to represent the set, and then compute the
cosine similarity between sets. Instead, we design a neural network
architecture that can directly learn set-wise verification. Our contributions
are: (i) We propose a Deep Comparator Network (DCN) that can ingest a pair of
sets (each may contain a variable number of images) as inputs, and compute a
similarity between the pair--this involves attending to multiple discriminative
local regions (landmarks), and comparing local descriptors between pairs of
faces; (ii) To encourage high-quality representations for each set, internal
competition is introduced for recalibration based on the landmark score; (iii)
Inspired by image retrieval, a novel hard sample mining regime is proposed to
control the sampling process, such that the DCN is complementary to the
standard image classification models. Evaluations on the IARPA Janus face
recognition benchmarks show that the comparator networks outperform the
previous state-of-the-art results by a large margin.Comment: To appear in ECCV 201
- …