
    Image Classification with CNN-based Fisher Vector Coding

    Fisher vector coding methods have been demonstrated to be effective for image classification. With the help of convolutional neural networks (CNN), several Fisher vector coding methods have shown state-of-the-art performance by adopting the activations of a single fully-connected layer as region features. These methods generally exploit a diagonal Gaussian mixture model (GMM) to describe the generative process of region features. However, it is difficult to model the complex distribution of a high-dimensional feature space with a limited number of Gaussians obtained by unsupervised learning, and simply increasing the number of Gaussians turns out to be inefficient and computationally impractical. To address this issue, we re-interpret a pre-trained CNN as a probabilistic discriminative model and present a CNN-based Fisher vector coding method, termed CNN-FVC. Specifically, activations of the intermediate fully-connected and output soft-max layers are exploited to implicitly derive the posteriors, means, and covariance parameters for Fisher vector coding. To further improve efficiency, we convert the pre-trained CNN to a fully convolutional network to extract the region features. Extensive experiments have been conducted on two standard scene benchmarks (SUN397 and MIT67) to evaluate the effectiveness of the proposed method. Classification accuracies of 60.7% and 82.1% are achieved on SUN397 and MIT67 respectively, outperforming previous state-of-the-art approaches. Furthermore, the method is complementary to GMM-FVC methods, allowing a simple fusion scheme to further improve performance to 61.1% and 83.1% respectively.
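    For reference, the diagonal-GMM Fisher vector that these methods build on can be sketched as below. This is a minimal pure-Python sketch of the standard formulation (gradients with respect to the GMM means and standard deviations, scaled by a closed-form approximation of the Fisher information), not the CNN-FVC method itself, which replaces the GMM posteriors and parameters with quantities derived from CNN activations. The function name `fisher_vector` and its argument layout are hypothetical, and the power and L2 normalization steps commonly applied afterwards are omitted.

```python
import math
from typing import List, Sequence


def fisher_vector(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """GMM Fisher vector for a set of local features (hypothetical helper).

    Assumes a diagonal-covariance GMM with K components over D-dim features.
    Returns 2*K*D values: mean gradients for all K components, then
    standard-deviation gradients for all K components.
    """
    n_comp, dim, n = len(weights), len(means[0]), len(features)
    g_mu = [[0.0] * dim for _ in range(n_comp)]
    g_sigma = [[0.0] * dim for _ in range(n_comp)]

    for x in features:
        # log(w_k * N(x; mu_k, diag(var_k))) for every component k
        log_p = []
        for k in range(n_comp):
            s = math.log(weights[k])
            for d in range(dim):
                diff = x[d] - means[k][d]
                s -= 0.5 * (math.log(2.0 * math.pi * variances[k][d])
                            + diff * diff / variances[k][d])
            log_p.append(s)
        # Soft-assignment posteriors gamma_k(x), computed stably
        top = max(log_p)
        exps = [math.exp(v - top) for v in log_p]
        total = sum(exps)
        gamma = [e / total for e in exps]

        # Accumulate whitened first- and second-order statistics
        for k in range(n_comp):
            for d in range(dim):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma[k] * z
                g_sigma[k][d] += gamma[k] * (z * z - 1.0)

    # Scale by the closed-form approximation of the Fisher information
    out: List[float] = []
    for k in range(n_comp):
        out.extend(v / (n * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(n_comp):
        out.extend(v / (n * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    return out
```

    For a single feature x = 2 under a one-component standard normal GMM, the mean part is 2 and the standard-deviation part is (2² - 1)/√2 = 3/√2. The output length grows as 2·K·D, which is why raising K to cover a high-dimensional CNN feature space quickly becomes impractical.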

    Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

    Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT; typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have recently become popular. Simply increasing the number of Gaussians to improve the modeling capacity for high-dimensional features turns out to be inefficient and computationally impractical. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With a certain approximation, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed deep convolutional neural network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance on generic object recognition, indoor scene, and fine-grained image classification problems. Comment: Appearing in Proc. Advances in Neural Information Processing Systems (NIPS) 2014, Montreal, Canada
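    The sparse-coding approximation can be illustrated with a small sketch. Assuming the model x = B u + ε with isotropic Gaussian noise and a sparsity-inducing prior on u, replacing u by its sparse code u* makes the gradient of the log-likelihood with respect to B proportional (up to a constant factor) to (x - B u*) u*ᵀ. Here the sparse code is computed with ISTA, one standard lasso solver chosen only for illustration; the function names, the `lam` penalty, and the iteration count are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(b: Matrix, u: Sequence[float]) -> List[float]:
    """Return B u, where B is D x K (columns span the feature subspace)."""
    return [sum(row[k] * u[k] for k in range(len(u))) for row in b]


def sparse_code(x: Sequence[float], b: Matrix, lam: float = 0.1,
                n_iter: int = 200) -> List[float]:
    """ISTA for the lasso: min_u 0.5 * ||x - B u||^2 + lam * ||u||_1."""
    dim, n_atoms = len(b), len(b[0])
    u = [0.0] * n_atoms
    # ||B||_F^2 upper-bounds the Lipschitz constant of the smooth term
    lip = sum(v * v for row in b for v in row)
    if lip == 0.0:
        return u
    step = 1.0 / lip
    for _ in range(n_iter):
        r = [xd - bu for xd, bu in zip(x, matvec(b, u))]  # residual x - B u
        for k in range(n_atoms):
            grad = -sum(b[d][k] * r[d] for d in range(dim))  # (-B^T r)_k
            z = u[k] - step * grad
            # Soft-thresholding is the proximal step for the L1 penalty
            u[k] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return u


def scfvc(features: Sequence[Sequence[float]], b: Matrix,
          lam: float = 0.1) -> List[float]:
    """Sum over features of (x - B u*) u*^T, flattened row-major to D*K."""
    dim, n_atoms = len(b), len(b[0])
    acc = [[0.0] * n_atoms for _ in range(dim)]
    for x in features:
        u = sparse_code(x, b, lam)
        r = [xd - bu for xd, bu in zip(x, matvec(b, u))]
        for d in range(dim):
            for k in range(n_atoms):
                acc[d][k] += r[d] * u[k]
    return [v for row in acc for v in row]
```

    With B the 2×2 identity and x = (1, 0), the lasso reduces to soft-thresholding, so u* = (0.9, 0) for lam = 0.1 and the first encoded entry is 0.1 × 0.9 = 0.09. Note that the code length is D·K, set by the subspace dimension rather than by a count of Gaussian components.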

    Compositional Model based Fisher Vector Coding for Image Classification

    Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to depict the generation process of local features. However, the representative power of the GMM can be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes, and the number of prototypes is usually small in FVC. To handle this limitation, in this paper we break the convention that a local feature is drawn from one of a few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as a linear combination of multiple key components, with the combination weights treated as a latent random variable. In this way, we can greatly enhance the representative power of the generative model of FVC. To implement this idea, we design two particular generative models with such a compositional mechanism. Comment: Fixed typos. 16 pages. Appearing in IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI)
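    The difference between the two generative assumptions can be written out explicitly. The notation below (mixing weights π_k, key components b_k collected as the columns of B, latent weights w, covariance Σ, parameters θ) is ours and only sketches the idea stated in the abstract; the two particular models in the paper may differ in their priors and covariance choices.

```latex
\begin{aligned}
&\text{GMM:} && k \sim \mathrm{Cat}(\pi_1,\dots,\pi_K), \quad
  \mathbf{x} \mid k \sim \mathcal{N}(\boldsymbol{\mu}_k, \Sigma_k) \\
&\text{Compositional:} && \mathbf{w} \sim p(\mathbf{w}), \quad
  \mathbf{x} \mid \mathbf{w} \sim \mathcal{N}\Big(\sum_{k=1}^{K} w_k \mathbf{b}_k,\ \Sigma\Big)
  = \mathcal{N}(B\mathbf{w}, \Sigma) \\
&\text{Fisher vector:} && \mathcal{G}_\theta(X) = \frac{1}{N} \sum_{i=1}^{N}
  \nabla_\theta \log p(\mathbf{x}_i \mid \theta)
\end{aligned}
```

    Restricting w to one-hot vectors e_k (with probability π_k) and sharing Σ across components recovers a GMM whose means are the columns b_k, so the compositional model generalizes the fixed-prototype assumption. The Fisher vector is then the gradient above (before Fisher-information normalization) taken with respect to parameters such as B.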

    Embedding based on function approximation for large scale image search

    The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher-dimensional representation useful for the image retrieval problem. First, motivated by the relationship between the linear approximation of a nonlinear function in high-dimensional space and the state-of-the-art feature representation used in image retrieval, i.e., VLAD, we propose a new approach for the approximation. The embedded vectors resulting from the function approximation process are then aggregated to form a single representation for image retrieval. Second, to make the proposed embedding method applicable to large-scale problems, we further derive a fast version in which the embedded vectors can be computed efficiently in closed form. We compare the proposed embedding methods with the state of the art in the context of image search under various settings: when the images are represented by medium-length vectors, short vectors, or binary vectors. The experimental results show that the proposed embedding methods outperform the existing state of the art on standard public image retrieval benchmarks. Comment: Accepted to TPAMI 2017. The implementation and precomputed features of the proposed F-FAemb are released at the following link: http://tinyurl.com/F-FAem
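    Since the abstract positions its embedding relative to VLAD, a minimal pure-Python VLAD sketch may help make the "local features to higher-dimensional representation" mapping concrete: each descriptor is hard-assigned to its nearest codebook centroid, residuals are summed per centroid, and the concatenation is L2-normalized. This is plain VLAD under the assumption of a pre-trained codebook passed as `centroids`, not the proposed F-FAemb embedding; the function name is hypothetical, and the power normalization often applied in practice is omitted.

```python
import math
from typing import List, Sequence


def vlad(features: Sequence[Sequence[float]],
         centroids: Sequence[Sequence[float]]) -> List[float]:
    """Plain VLAD: per-centroid residual sums, concatenated and L2-normalized."""
    n_cent, dim = len(centroids), len(centroids[0])
    acc = [[0.0] * dim for _ in range(n_cent)]
    for x in features:
        # Hard-assign x to its nearest centroid (squared Euclidean distance)
        nearest = min(
            range(n_cent),
            key=lambda k: sum((x[d] - centroids[k][d]) ** 2 for d in range(dim)),
        )
        # Accumulate the residual x - c_nearest
        for d in range(dim):
            acc[nearest][d] += x[d] - centroids[nearest][d]
    v = [val for row in acc for val in row]
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0.0 else v
```

    With centroids (0, 0) and (10, 10) and features (1, 0), (0, 1), (11, 10), the residual sums are (1, 1) and (1, 0), giving the unit vector (1, 1, 1, 0)/√3. The output always has K·D entries, which is the sense in which VLAD lifts D-dimensional local features into a higher-dimensional image representation.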