Compositional Model based Fisher Vector Coding for Image Classification
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break with the convention that
a local feature is drawn from one of a few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is a linear combination
of multiple key components, and the combination weights are latent random
variables. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.
Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI)
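A minimal sketch contrasting the two generative assumptions described in the abstract above: a mixture draws each local feature around one of a few fixed prototypes, while the compositional mechanism draws it around a mean built as a linear combination of key components with latent weights. The function names, the isotropic noise level `sigma`, and the standard-normal prior on the weights are illustrative assumptions; the abstract does not specify the prior or the two models the authors actually design.

```python
import numpy as np


def sample_gmm(prototypes, sigma, n, rng):
    """Draw n features, each around ONE of the K prototype means."""
    k = rng.integers(0, prototypes.shape[0], size=n)   # component index per sample
    noise = sigma * rng.standard_normal((n, prototypes.shape[1]))
    return prototypes[k] + noise


def sample_compositional(components, sigma, n, rng):
    """Draw n features, each around a mean composed from ALL K components."""
    # Latent combination weights; the Gaussian prior is an assumption.
    u = rng.standard_normal((n, components.shape[0]))
    means = u @ components                             # (n, D) composed means
    return means + sigma * rng.standard_normal(means.shape)


rng = np.random.default_rng(0)
key_components = rng.standard_normal((4, 8))           # K=4 components in D=8

# With the noise switched off, only the means remain visible.
gmm_means = sample_gmm(key_components, 0.0, 200, rng)
comp_means = sample_compositional(key_components, 0.0, 200, rng)

print(len(np.unique(gmm_means, axis=0)))    # at most K distinct means
print(len(np.unique(comp_means, axis=0)))   # one distinct mean per sample
```

With the same four components, the mixture can only place means at four points, whereas the composed means fill the subspace spanned by the components, which is the sense in which the compositional model is more expressive.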
Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.
Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern
Recognition (2014)
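A minimal sketch of the equalization idea in the Generalized Max Pooling abstract above, assuming a ridge-regression form in which the pooled vector `xi` is chosen so that every patch embedding has a similarity close to 1 with it. The function name `generalized_max_pool`, the regularizer `lam`, and the toy descriptors are illustrative assumptions, not details given in the abstract.

```python
import numpy as np


def generalized_max_pool(phi, lam=1.0):
    """Pool per-patch embeddings phi of shape (N, D).

    Solves min_xi ||phi @ xi - 1||^2 + lam * ||xi||^2 through its dual,
    which makes the per-patch weights alpha explicit.
    """
    n = phi.shape[0]
    kernel = phi @ phi.T                                # (N, N) patch similarities
    alpha = np.linalg.solve(kernel + lam * np.eye(n), np.ones(n))
    xi = phi.T @ alpha                                  # weighted sum of patches
    return xi, alpha


# Toy data: one frequent descriptor (9 copies) and one rare descriptor.
phi = np.vstack([np.tile([1.0, 0.0], (9, 1)), [[0.0, 1.0]]])

sum_pooled = phi.sum(axis=0)                            # dominated by the frequent one
xi, alpha = generalized_max_pool(phi, lam=1e-3)

print(sum_pooled)    # frequent dimension dominates
print(xi)            # both dimensions close to 1
print(phi @ xi)      # every patch has similarity close to 1
```

The weights `alpha` show the re-weighting view mentioned in the abstract: each copy of the frequent descriptor receives a small weight, while the rare descriptor receives a weight close to 1.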
Fast and Robust Archetypal Analysis for Representation Learning
We revisit a pioneering unsupervised learning technique called archetypal
analysis, which is related to successful data analysis methods such as sparse
coding and non-negative matrix factorization. Since it was proposed, archetypal
analysis has not gained much popularity, even though it produces more
interpretable models than the alternatives. Because no efficient
implementation has ever been made publicly available, its application to
important scientific problems may have been severely limited. Our goal is to
bring archetypal analysis back into favour. We propose a fast optimization
scheme using an active-set strategy, and provide an efficient open-source
implementation interfaced with Matlab, R, and Python. Then, we demonstrate the
usefulness of archetypal analysis for computer vision tasks, such as codebook
learning, signal classification, and large image collection visualization
- …
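A minimal sketch of the encoding step in archetypal analysis, where a data point is represented as a convex combination of archetypes (weights non-negative and summing to 1). It uses plain projected gradient descent, not the active-set scheme the abstract proposes, and it is not the authors' open-source implementation. The names `project_simplex` and `encode_on_simplex`, the iteration count, and the toy archetypes are illustrative assumptions.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                       # sorted in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def encode_on_simplex(x, archetypes, n_iter=500):
    """Minimize 0.5 * ||x - archetypes @ a||^2 over the simplex.

    archetypes has shape (D, P), one archetype per column.
    """
    p = archetypes.shape[1]
    a = np.full(p, 1.0 / p)                    # start at the simplex centre
    step = 1.0 / np.linalg.norm(archetypes, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = archetypes.T @ (archetypes @ a - x)
        a = project_simplex(a - step * grad)
    return a


# Toy archetypes: the corners of a triangle in 2-D, stored as columns.
archetypes = np.array([[0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
x = np.array([0.2, 0.3])                       # a point inside the triangle

weights = encode_on_simplex(x, archetypes)
print(weights)                                 # convex weights over the corners
print(archetypes @ weights)                    # reconstruction of x
```

Because the weights live on the simplex, each one can be read directly as the share of an archetype in the data point, which is the interpretability the abstract refers to.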