264 research outputs found

    Fine-grained Image Classification by Exploring Bipartite-Graph Labels

    Full text link
    Given a food image, can a fine-grained object recognition engine tell "which restaurant which dish" the food belongs to? Such ultra-fine grained image recognition is the key for many applications like search by images, but it is very challenging because it needs to discern subtle difference between classes while dealing with the scarcity of training data. Fortunately, the ultra-fine granularity naturally brings rich relationships among object classes. This paper proposes a novel approach to exploit the rich relationships through bipartite-graph labels (BGL). We show how to model BGL in an overall convolutional neural networks and the resulting system can be optimized through back-propagation. We also show that it is computationally efficient in inference thanks to the bipartite structure. To facilitate the study, we construct a new food benchmark dataset, which consists of 37,885 food images collected from 6 restaurants and totally 975 menus. Experimental results on this new food and three other datasets demonstrates BGL advances previous works in fine-grained object recognition. An online demo is available at http://www.f-zhou.com/fg_demo/

    Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop

    Full text link
    Existing fine-grained visual categorization methods often suffer from three challenges: lack of training data, large number of fine-grained categories, and high intraclass vs. low inter-class variance. In this work we propose a generic iterative framework for fine-grained categorization and dataset bootstrapping that handles these three challenges. Using deep metric learning with humans in the loop, we learn a low dimensional feature embedding with anchor points on manifolds for each category. These anchor points capture intra-class variances and remain discriminative between classes. In each round, images with high confidence scores from our model are sent to humans for labeling. By comparing with exemplar images, labelers mark each candidate image as either a "true positive" or a "false positive". True positives are added into our current dataset and false positives are regarded as "hard negatives" for our metric learning model. Then the model is retrained with an expanded dataset and hard negatives for the next round. To demonstrate the effectiveness of the proposed framework, we bootstrap a fine-grained flower dataset with 620 categories from Instagram images. The proposed deep metric learning scheme is evaluated on both our dataset and the CUB-200-2001 Birds dataset. Experimental evaluations show significant performance gain using dataset bootstrapping and demonstrate state-of-the-art results achieved by the proposed deep metric learning methods.Comment: 10 pages, 9 figures, CVPR 201

    Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition

    Full text link
    A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, not utilizing any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with reinforcement learning algorithm. The attribute labeling requirement of the scheme is more amenable than the accurate part location annotation required by traditional part-based fine-grained recognition methods. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme on both fine-grained recognition and attribute recognition

    Relevant Deconvolution For Acoustic Source Estimation

    Get PDF
    We describe a robust deconvolution algorithm for simultaneously estimating an acoustic source signal and convolutive filters associated with the acoustic room impulse responses from a pair of microphone signals. In contrast to conventional blind deconvolution techniques which rely upon a knowledge of the statistics of the source signal, our algorithm exploits the nonnegativity and sparsity structure of room impulse responses. The algorithm is formulated as a quadratic optimization problem with respect to both the source signal and filter coefficients, and proceeds by iteratively solving the optimization in two alternating steps. In the H-step, the nonnegative filter coefficients are optimally estimated within a Bayesian framework using a relevant set of regularization parameters. In the S-step, the source signal is estimated without any prior assumption on its statistical distribution. The resulting estimates converge to a relevant solution exhibiting appropriate sparseness in the filters. Simulation results indicate that the algorithm is able to precisely recover both the source signal and filter coefficients, even in the presence of large ambient noise

    Bayesian \u3cem\u3eL\u3c/em\u3e\u3csub\u3e1\u3c/sub\u3e-Norm Sparse Learning

    Get PDF
    We propose a Bayesian framework for learning the optimal regularization parameter in the L1-norm penalized least-mean-square (LMS) problem, also known as LASSO [1] or basis pursuit [2]. The setting of the regularization parameter is critical for deriving a correct solution. In most existing methods, the scalar regularization parameter is often determined in a heuristic manner; in contrast, our approach infers the optimal regularization setting under a Bayesian framework. Furthermore, Bayesian inference enables an independent regularization scheme where each coefficient (or weight) is associated with an independent regularization parameter. Simulations illustrate the improvement using our method in discovering sparse structure from noisy data
    • …
    corecore