13,400 research outputs found

    Discriminative Feature Learning with Application to Fine-grained Recognition

    Get PDF
    For various computer vision tasks, finding suitable feature representations is fundamental. Fine-grained recognition, distinguishing sub-categories under the same super-category (e.g., bird species, car makes and models, etc.), serves as a good task to study discriminative feature learning for visual recognition task. The main reason is that the inter-class variations between fine-grained categories are very subtle and even smaller than intra-class variations caused by pose or deformation. This thesis focuses on tasks mostly related to fine-grained categories. After briefly discussing our earlier attempt to capture subtle visual differences using sparse/low-rank analysis, the main part of the thesis reflects the trends in the past a few years as deep learning prevails. In the first part of the thesis, we address the problem of fine-grained recognition via a patch-based framework built upon Convolutional Neural Network (CNN) features. We introduce triplets of patches with two geometric constraints to improve the accuracy of patch localization, and automatically mine discriminative geometrically-constrained triplets for recognition. In the second part we begin to learn discriminative features in an end-to-end fashion. We propose a supervised feature learning approach, Label Consistent Neural Network, which enforces direct supervision in late hidden layers. We associate each neuron in a hidden layer with a particular class and encourage it to be activated for input signals from the same class by introducing a label consistency regularization. This label consistency constraint makes the features more discriminative and tends to faster convergence. The third part proposes a more sophisticated and effective end-to-end network specifically designed for fine-grained recognition, which learns discriminative patches within a CNN. We show that patch-level learning capability of CNN can be enhanced by learning a bank of convolutional filters that capture class-specific discriminative patches without extra part or bounding box annotations. Such a filter bank is well structured, properly initialized and discriminatively learned through a novel asymmetric multi-stream architecture with convolutional filter supervision and a non-random layer initialization. In the last part we goes beyond obtaining category labels and study the problem of continuous 3D pose estimation for fine-grained object categories. We augment three existing popular fine-grained recognition datasets by annotating each instance in the image with corresponding fine-grained 3D shape and ground-truth 3D pose. We cast the problem into a detection framework based on Faster/Mask R-CNN. To utilize the 3D information, we also introduce a novel 3D representation, named as location field, that is effective for representing 3D shapes

    Compositional Model based Fisher Vector Coding for Image Classification

    Full text link
    Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to depict the generation process of local features. However, the representative power of the GMM could be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes and the number of prototypes is usually small in FVC. To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as the linear combination of multiple key components and the combination weight is a latent random variable. In this way, we can greatly enhance the representative power of the generative model of FVC. To implement our idea, we designed two particular generative models with such a compositional mechanism.Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and Machine Intelligence (TPAMI

    Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features

    Full text link
    TThe goal of our work is to discover dominant objects in a very general setting where only a single unlabeled image is given. This is far more challenge than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and feature representation of pre-trained convolutional neural networks (CNNs). Specifically, we first convert the feature maps from a pre-trained CNN model into a set of transactions, and then discovers frequent patterns from transaction database through pattern mining techniques. We observe that those discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns. Extensive experiments on a variety of benchmarks demonstrate that OLM achieves competitive localization performance compared with the state-of-the-art methods. We also evaluate our approach compared with unsupervised saliency detection methods and achieves competitive results on seven benchmark datasets. Moreover, we conduct experiments on fine-grained classification to show that our proposed method can locate the entire object and parts accurately, which can benefit to improving the classification results significantly
    • …
    corecore