Towards Effective Codebookless Model for Image Classification
The bag-of-features (BoF) model for image classification has been thoroughly
studied over the last decade. Unlike the widely used BoF methods, which
model images with a pre-trained codebook, the alternative codebook-free image
modeling method, which we call the Codebookless Model (CLM), has attracted little
attention. In this paper, we present an effective CLM that represents an image
with a single Gaussian for classification. By embedding the Gaussian manifold into
a vector space, we show that the simple incorporation of our CLM into a linear
classifier achieves very competitive accuracy compared with state-of-the-art
BoF methods (e.g., Fisher Vector). Since our CLM lies in a high dimensional
Riemannian manifold, we further propose a joint learning method of low-rank
transformation with support vector machine (SVM) classifier on the Gaussian
manifold, in order to reduce computational and storage cost. To study and
alleviate the side effect of background clutter on our CLM, we also present a
simple yet effective partial background removal method based on saliency
detection. Experiments are extensively conducted on eight widely used databases
to demonstrate the effectiveness and efficiency of our CLM method.
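
The idea of representing an image with a single Gaussian and embedding it into a vector space can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `gaussian_embedding`, the regularizer `eps`, and the particular embedding (mapping the Gaussian to a symmetric positive definite matrix, then taking a matrix logarithm) are assumptions chosen for the example, since the abstract does not specify them.

```python
import numpy as np


def gaussian_embedding(descriptors, eps=1e-3):
    """Map local descriptors of shape (n, d) to one vector via a single Gaussian.

    Hypothetical helper: the embedding used here is an assumption,
    not taken from the abstract.
    """
    x = np.asarray(descriptors, dtype=float)
    n, d = x.shape

    # Fit the Gaussian: mean and (regularized) covariance of the descriptors.
    mu = x.mean(axis=0)
    centered = x - mu
    sigma = centered.T @ centered / n + eps * np.eye(d)

    # Assumed embedding: Gaussian -> (d+1) x (d+1) symmetric positive
    # definite matrix [[sigma + mu mu^T, mu], [mu^T, 1]].
    spd = np.empty((d + 1, d + 1))
    spd[:d, :d] = sigma + np.outer(mu, mu)
    spd[:d, d] = mu
    spd[d, :d] = mu
    spd[d, d] = 1.0

    # Matrix logarithm via eigendecomposition maps the SPD matrix
    # to a flat space of symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_spd = (eigvecs * np.log(eigvals)) @ eigvecs.T

    # Vectorize the upper triangle; off-diagonal entries are scaled by
    # sqrt(2) so the vector norm matches the Frobenius norm.
    rows, cols = np.triu_indices(d + 1)
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_spd[rows, cols] * scale
```

The resulting vector has length (d+1)(d+2)/2 and could be passed to any linear classifier, for example a linear SVM.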
Linear Spatial Pyramid Matching Using Non-convex and non-negative Sparse Coding for Image Classification
Recently, sparse coding has been highly successful in image classification,
mainly due to its capability of incorporating the sparsity of image
representation. In this paper, we propose an improved sparse coding model based
on linear spatial pyramid matching (SPM) and Scale Invariant Feature Transform
(SIFT) descriptors. The novelty is the simultaneous non-convex and
non-negative characteristics added to the sparse coding model. Our numerical
experiments show that the improved approach using non-convex and non-negative
sparse coding is superior to the original ScSPM[1] on several typical
databases.
Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.
Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern
Recognition (2014)
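
The contrast between sum-pooling and a pooling that equalizes the similarity between each patch and the pooled representation can be illustrated with a short sketch. The ridge-regularized least-squares formulation, the function names, and the parameter `lam` are assumptions made for this example; the abstract only states that the similarities are equalized and that this is equivalent to re-weighting the per-patch statistics.

```python
import numpy as np


def sum_pool(phi):
    """Sum-pooling: add up the per-patch encodings of shape (n, d)."""
    return np.asarray(phi, dtype=float).sum(axis=0)


def generalized_max_pool(phi, lam=1.0):
    """Pool per-patch encodings so each patch has similarity close to 1
    with the pooled vector.

    Assumed formulation: min_xi ||phi @ xi - 1||^2 + lam * ||xi||^2,
    solved in its dual form. Returns the pooled vector and the
    per-patch weights.
    """
    phi = np.asarray(phi, dtype=float)
    n = phi.shape[0]

    # Patch-to-patch similarities (linear kernel).
    kernel = phi @ phi.T

    # Per-patch weights: alpha = (K + lam * I)^{-1} 1.
    alpha = np.linalg.solve(kernel + lam * np.eye(n), np.ones(n))

    # The pooled vector is a re-weighted sum of the per-patch encodings.
    return phi.T @ alpha, alpha


# Toy data: one descriptor repeated 9 times and one rare descriptor.
frequent = np.tile([1.0, 0.0], (9, 1))
rare = np.array([[0.0, 1.0]])
patches = np.vstack([frequent, rare])

pooled_sum = sum_pool(patches)
pooled_gmp, weights = generalized_max_pool(patches, lam=1e-6)
```

In this toy case the sum-pooled vector is dominated by the frequent descriptor, while the re-weighted pooling gives the frequent and the rare descriptor nearly the same similarity to the pooled vector.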