264 research outputs found
Fine-grained Image Classification by Exploring Bipartite-Graph Labels
Given a food image, can a fine-grained object recognition engine tell "which
restaurant which dish" the food belongs to? Such ultra-fine grained image
recognition is the key for many applications like search by images, but it is
very challenging because it needs to discern subtle difference between classes
while dealing with the scarcity of training data. Fortunately, the ultra-fine
granularity naturally brings rich relationships among object classes. This
paper proposes a novel approach to exploit the rich relationships through
bipartite-graph labels (BGL). We show how to model BGL in an overall
convolutional neural networks and the resulting system can be optimized through
back-propagation. We also show that it is computationally efficient in
inference thanks to the bipartite structure. To facilitate the study, we
construct a new food benchmark dataset, which consists of 37,885 food images
collected from 6 restaurants and totally 975 menus. Experimental results on
this new food and three other datasets demonstrates BGL advances previous works
in fine-grained object recognition. An online demo is available at
http://www.f-zhou.com/fg_demo/
Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop
Existing fine-grained visual categorization methods often suffer from three
challenges: lack of training data, large number of fine-grained categories, and
high intraclass vs. low inter-class variance. In this work we propose a generic
iterative framework for fine-grained categorization and dataset bootstrapping
that handles these three challenges. Using deep metric learning with humans in
the loop, we learn a low dimensional feature embedding with anchor points on
manifolds for each category. These anchor points capture intra-class variances
and remain discriminative between classes. In each round, images with high
confidence scores from our model are sent to humans for labeling. By comparing
with exemplar images, labelers mark each candidate image as either a "true
positive" or a "false positive". True positives are added into our current
dataset and false positives are regarded as "hard negatives" for our metric
learning model. Then the model is retrained with an expanded dataset and hard
negatives for the next round. To demonstrate the effectiveness of the proposed
framework, we bootstrap a fine-grained flower dataset with 620 categories from
Instagram images. The proposed deep metric learning scheme is evaluated on both
our dataset and the CUB-200-2001 Birds dataset. Experimental evaluations show
significant performance gain using dataset bootstrapping and demonstrate
state-of-the-art results achieved by the proposed deep metric learning methods.Comment: 10 pages, 9 figures, CVPR 201
Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition
A key challenge in fine-grained recognition is how to find and represent
discriminative local regions. Recent attention models are capable of learning
discriminative region localizers only from category labels with reinforcement
learning. However, not utilizing any explicit part information, they are not
able to accurately find multiple distinctive regions. In this work, we
introduce an attribute-guided attention localization scheme where the local
region localizers are learned under the guidance of part attribute
descriptions. By designing a novel reward strategy, we are able to learn to
locate regions that are spatially and semantically distinctive with
reinforcement learning algorithm. The attribute labeling requirement of the
scheme is more amenable than the accurate part location annotation required by
traditional part-based fine-grained recognition methods. Experimental results
on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme
on both fine-grained recognition and attribute recognition
Relevant Deconvolution For Acoustic Source Estimation
We describe a robust deconvolution algorithm for simultaneously estimating an acoustic source signal and convolutive filters associated with the acoustic room impulse responses from a pair of microphone signals. In contrast to conventional blind deconvolution techniques which rely upon a knowledge of the statistics of the source signal, our algorithm exploits the nonnegativity and sparsity structure of room impulse responses. The algorithm is formulated as a quadratic optimization problem with respect to both the source signal and filter coefficients, and proceeds by iteratively solving the optimization in two alternating steps. In the H-step, the nonnegative filter coefficients are optimally estimated within a Bayesian framework using a relevant set of regularization parameters. In the S-step, the source signal is estimated without any prior assumption on its statistical distribution. The resulting estimates converge to a relevant solution exhibiting appropriate sparseness in the filters. Simulation results indicate that the algorithm is able to precisely recover both the source signal and filter coefficients, even in the presence of large ambient noise
Bayesian \u3cem\u3eL\u3c/em\u3e\u3csub\u3e1\u3c/sub\u3e-Norm Sparse Learning
We propose a Bayesian framework for learning the optimal regularization parameter in the L1-norm penalized least-mean-square (LMS) problem, also known as LASSO [1] or basis pursuit [2]. The setting of the regularization parameter is critical for deriving a correct solution. In most existing methods, the scalar regularization parameter is often determined in a heuristic manner; in contrast, our approach infers the optimal regularization setting under a Bayesian framework. Furthermore, Bayesian inference enables an independent regularization scheme where each coefficient (or weight) is associated with an independent regularization parameter. Simulations illustrate the improvement using our method in discovering sparse structure from noisy data
- …