Compositional Model based Fisher Vector Coding for Image Classification
Derived from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To address this limitation, we break with the convention that a local
feature is drawn from one of a few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as the
linear combination of multiple key components and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.
Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI)
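The contrast between the GMM assumption and the compositional assumption can be written out roughly as follows. This is only a reading of the abstract: the symbols $B$ (matrix of key components), $u$ (latent combination weight), $\Sigma$ (covariance), and $\lambda$ (model parameters) are chosen here for illustration and are not taken from the paper, and the prior $p(u)$ is left unspecified because the abstract does not state it.

```latex
% Conventional FVC: a local feature x comes from one of K Gaussian prototypes.
p_{\mathrm{GMM}}(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k)

% Compositional assumption: the mean is a linear combination of key components,
% weighted by a latent random variable u.
x \mid u \sim \mathcal{N}(B u,\, \Sigma), \qquad u \sim p(u)

% Marginal likelihood of x, and the gradient from which a Fisher vector is derived.
p(x \mid \lambda) = \int \mathcal{N}(x;\, B u, \Sigma)\, p(u)\, \mathrm{d}u,
\qquad
g(x) = \nabla_{\lambda} \log p(x \mid \lambda)
```

Under this reading, the mean $Bu$ can vary continuously with $u$ instead of being restricted to $K$ fixed prototypes, which is the sense in which the abstract describes greater representative power.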
Attribute-Graph: A Graph based approach to Image Ranking
We propose a novel image representation, termed Attribute-Graph, to rank
images by their semantic similarity to a given query image. An Attribute-Graph
is an undirected fully connected graph, incorporating both local and global
image characteristics. The graph nodes characterise objects as well as the
overall scene context using mid-level semantic attributes, while the edges
capture the object topology. We demonstrate the effectiveness of
Attribute-Graphs by applying them to the problem of image ranking. We benchmark
the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets,
which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that
modelling images as Attribute-Graphs results in improved ranking performance
over existing techniques.
Comment: In IEEE International Conference on Computer Vision (ICCV) 201
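As a rough illustration of the data structure described above, the following Python sketch stores an undirected, fully connected graph whose nodes carry attribute scores. It is a minimal sketch, not the authors' implementation: the class name `AttributeGraph`, the `edge_feature` callback, and the example scores are all hypothetical, and the abstract does not say how node attributes or edge topology are actually computed.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class AttributeGraph:
    # Node name -> list of attribute scores (hypothetical representation).
    nodes: dict = field(default_factory=dict)
    # frozenset({a, b}) -> edge feature; a frozenset key makes the edge undirected.
    edges: dict = field(default_factory=dict)

    def add_node(self, name, attributes):
        """Register an object node or the global scene node."""
        self.nodes[name] = list(attributes)

    def connect_all(self, edge_feature):
        """Make the graph fully connected: one undirected edge per node pair."""
        for a, b in combinations(sorted(self.nodes), 2):
            self.edges[frozenset((a, b))] = edge_feature(a, b)


def build_example():
    """Build a three-node graph with made-up attribute scores."""
    graph = AttributeGraph()
    graph.add_node("scene", [0.9, 0.1])   # global context node
    graph.add_node("dog", [0.8, 0.2])     # object node
    graph.add_node("person", [0.3, 0.7])  # object node
    # Placeholder edge feature; a real system would encode object topology here.
    graph.connect_all(lambda a, b: f"{a}-{b}")
    return graph
```

A graph with n nodes built this way has n(n-1)/2 edges, and an edge can be looked up with its endpoints in either order.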
Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition
Historically, researchers in the field have spent a great deal of effort to
create image representations that have scale invariance and retain spatial
location information. This paper proposes to encode equivalent temporal
characteristics in video representations for action recognition. To achieve
temporal scale invariance, we develop a method called temporal scale pyramid
(TSP). To encode temporal information, we present and compare two methods
called temporal extension descriptor (TED) and temporal division pyramid (TDP).
Our purpose is to suggest solutions for matching complex actions that have
large variation in velocity and appearance, a capability missing from most
current action representations. Experimental results on four benchmark
datasets, UCF50, HMDB51, Hollywood2 and Olympic Sports, support our approach,
which significantly outperforms state-of-the-art methods. Most notably, we
achieve 65.0% mean accuracy and 68.2% mean average precision on the challenging
HMDB51 and Hollywood2 datasets, respectively, which constitute absolute
improvements over the state of the art of 7.8% and 3.9%.
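The abstract does not define TSP, TED, or TDP in detail, so the sketch below is not the paper's method. It only illustrates the generic idea of a temporal pyramid by analogy with spatial pyramid matching: per-frame features are average-pooled over 1, 2, 4, ... equal temporal segments and the results are concatenated. The function name `temporal_pyramid_pool`, the choice of average pooling, and the `levels` parameter are assumptions made for this example.

```python
def temporal_pyramid_pool(frame_features, levels=3):
    """Average-pool per-frame features over 2**level temporal segments per level.

    frame_features: non-empty sequence of equal-length feature vectors, one per frame.
    Returns a flat list of length dim * (2**levels - 1).
    """
    frames = [[float(v) for v in f] for f in frame_features]
    if not frames:
        raise ValueError("expected at least one frame")
    dim = len(frames[0])
    n = len(frames)
    pooled = []
    for level in range(levels):
        parts = 2 ** level
        for i in range(parts):
            # Integer arithmetic splits the frames into near-equal segments.
            segment = frames[i * n // parts:(i + 1) * n // parts]
            if segment:
                pooled.extend(sum(col) / len(segment) for col in zip(*segment))
            else:
                # More segments than frames: pad the empty segment with zeros.
                pooled.extend([0.0] * dim)
    return pooled
```

For four one-dimensional frames `[[0.0], [2.0], [4.0], [6.0]]` and `levels=2`, the first entry is the mean over the whole clip and the next two are the means of the first and second halves.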