Image Classification with CNN-based Fisher Vector Coding
Fisher vector coding methods have been demonstrated to be effective for image classification. With the help of convolutional neural networks (CNN), several Fisher vector coding methods have shown state-of-the-art performance by adopting the activations of a single fully-connected layer as region features. These methods generally exploit a diagonal Gaussian mixture model (GMM) to describe the generative process of region features. However, it is difficult to model the complex distribution of a high-dimensional feature space with a limited number of Gaussians obtained by unsupervised learning. Simply increasing the number of Gaussians turns out to be inefficient and computationally impractical.
To address this issue, we re-interpret a pre-trained CNN as a probabilistic discriminative model and present a CNN-based Fisher vector coding method, termed CNN-FVC. Specifically, activations of the intermediate fully-connected and output soft-max layers are exploited to implicitly derive the posteriors, mean, and covariance parameters for Fisher vector coding. To further improve efficiency, we convert the pre-trained CNN to a fully convolutional one to extract the region features. Extensive experiments have been conducted on two standard scene benchmarks (i.e., SUN397 and MIT67) to evaluate the effectiveness of the proposed method. Classification accuracies of 60.7% and 82.1% are achieved on the SUN397 and MIT67 benchmarks, respectively, outperforming previous state-of-the-art approaches. Furthermore, the method is complementary to GMM-FVC methods, allowing a simple fusion scheme to further improve performance to 61.1% and 83.1%, respectively.
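For orientation, the GMM-based Fisher vector coding that this abstract treats as its baseline can be sketched roughly as follows. This is a minimal illustration of the standard mean-gradient Fisher vector under a diagonal GMM, not the CNN-FVC method itself. The function names and the choice to keep only the gradient with respect to the means are assumptions made here for brevity.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Soft-assign each row of X (N, D) to K diagonal Gaussians."""
    diff = X[:, None, :] - means[None, :, :]                  # (N, K, D)
    # log N(x | mu_k, diag(var_k)) for every (feature, component) pair
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                     # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fisher_vector_means(X, weights, means, variances):
    """Fisher vector restricted to gradients w.r.t. the GMM means (assumption)."""
    n = X.shape[0]
    post = gmm_posteriors(X, weights, means, variances)       # (N, K)
    # residuals whitened by the per-dimension standard deviation
    diff = (X[:, None, :] - means[None, :, :]) / np.sqrt(variances)
    # posterior-weighted residuals summed over the N region features
    G = np.einsum('nk,nkd->kd', post, diff)
    G /= n * np.sqrt(weights)[:, None]
    return G.ravel()                                          # length K * D
```

The output has K * D entries, so its length grows linearly with the number of Gaussians, which is one reason simply adding Gaussians becomes costly for high-dimensional region features.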
Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to characterize the generation process of local
features. This choice has been shown to be sufficient for traditional
low-dimensional local features, e.g., SIFT; and typically, good performance can be
achieved with only a few hundred Gaussian distributions. However, the same
number of Gaussians is insufficient to model the feature space spanned by
higher dimensional local features, which have become popular recently. In order
to improve the modeling capacity for high dimensional features, it turns out to
be inefficient and computationally impractical to simply increase the number of
Gaussians. In this paper, we propose a model in which each local feature is
drawn from a Gaussian distribution whose mean vector is sampled from a
subspace. With a certain approximation, this model can be converted to a sparse
coding procedure and the learning/inference problems can be readily solved by
standard sparse coding methods. By calculating the gradient vector of the
proposed model, we derive a new Fisher vector encoding strategy, termed Sparse
Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently
developed Deep Convolutional Neural Network (CNN) descriptor as a high
dimensional local feature and implement image classification with the proposed
SCFVC. Our experimental evaluations demonstrate that our method not only
significantly outperforms the traditional GMM based Fisher vector encoding but
also achieves state-of-the-art performance in generic object recognition,
indoor scene, and fine-grained image classification problems.
Comment: Appearing in Proc. Advances in Neural Information Processing Systems
(NIPS) 2014, Montreal, Canad
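The idea described in this abstract (infer a sparse code for each local feature, then take the gradient of the model with respect to its parameters) can be sketched as below. The abstract does not give the formulas, so this rests on stated assumptions: each feature x is modeled as Gaussian around B u for a dictionary B, the code u is found by a plain ISTA solver, the gradient with respect to B is taken as (x - B u) u^T up to a constant scale, and gradients are sum-pooled over features. All names (`sparse_code_ista`, `scfvc_encode`, `lam`) are hypothetical.

```python
import numpy as np

def sparse_code_ista(x, B, lam=0.1, n_iter=200):
    """Approximately solve min_u 0.5*||x - B u||^2 + lam*||u||_1 with ISTA."""
    L = np.linalg.norm(B, 2) ** 2              # Lipschitz constant of the smooth part
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ u - x)               # gradient of the reconstruction term
        z = u - grad / L
        u = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return u

def scfvc_encode(X, B, lam=0.1):
    """Sum-pool the per-feature gradients (x - B u) u^T over all local features."""
    D, K = B.shape
    G = np.zeros((D, K))
    for x in X:
        u = sparse_code_ista(x, B, lam)
        # column k only receives mass from features whose code uses basis k
        G += np.outer(x - B @ u, u)
    return G.ravel()                           # length D * K
```

Because the codes are sparse, each feature contributes to only a few columns of the pooled matrix, which is what makes a large dictionary affordable in this kind of scheme.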
Compositional Model based Fisher Vector Coding for Image Classification
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break the convention which
assumes that a local feature is drawn from one of a few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as the
linear combination of multiple key components and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.
Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI
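One way to write down the compositional mechanism described in this abstract is given below. The symbols are assumptions introduced here, not taken from the paper: B is a matrix whose columns are the key components, u is the latent combination weight, and Σ is a covariance matrix. The abstract does not specify the prior P(u), so it is left generic.

```latex
\[
u \sim P(u), \qquad
x \mid u \sim \mathcal{N}(Bu,\ \Sigma), \qquad
P(x \mid B) = \int P(x \mid u, B)\, P(u)\, \mathrm{d}u
\]
\[
\text{Fisher vector of } x \ \propto\ \frac{\partial \log P(x \mid B)}{\partial B}
\]
```

Under this reading, a conventional GMM corresponds to restricting u to select exactly one component, whereas the compositional model lets the mean range over linear combinations of several components.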
Embedding based on function approximation for large scale image search
The objective of this paper is to design an embedding method that maps local
features describing an image (e.g. SIFT) to a higher dimensional representation
useful for the image retrieval problem. First, motivated by the relationship
between the linear approximation of a nonlinear function in high dimensional
space and the state-of-the-art feature representation used in image retrieval,
i.e., VLAD, we propose a new approach for the approximation. The embedded
vectors resulting from the function approximation process are then aggregated to
form a single representation for image retrieval. Second, in order to make the
proposed embedding method applicable to large-scale problems, we further derive
its fast version in which the embedded vectors can be efficiently computed,
i.e., in closed form. We compare the proposed embedding methods with the
state of the art in the context of image search under various settings: when
the images are represented by medium length vectors, short vectors, or binary
vectors. The experimental results show that the proposed embedding methods
outperform the existing state of the art on the standard public image retrieval
benchmarks.
Comment: Accepted to TPAMI 2017. The implementation and precomputed features
of the proposed F-FAemb are released at the following link:
http://tinyurl.com/F-FAem
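For readers unfamiliar with VLAD, the representation this abstract relates its embedding to, a minimal sketch follows. It shows plain VLAD aggregation only (hard assignment to the nearest centroid, residual accumulation, global L2 normalization) and is not the proposed F-FAemb method. The function name and the assumption that centroids come from a separately trained k-means codebook are illustrative.

```python
import numpy as np

def vlad(X, centroids):
    """Aggregate local descriptors X (N, D) against a codebook (K, D) into one vector."""
    K, D = centroids.shape
    # squared Euclidean distance from every descriptor to every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                # hard assignment per descriptor
    V = np.zeros((K, D))
    for k in range(K):
        members = X[nearest == k]
        if len(members):
            # accumulate residuals of the descriptors assigned to centroid k
            V[k] = (members - centroids[k]).sum(axis=0)
    v = V.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v         # global L2 normalization
```

The result is a single K * D vector per image, which is the kind of fixed-length representation that can then be compared or compressed for retrieval.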