31,188 research outputs found
Compositional Model based Fisher Vector Coding for Image Classification
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break the convention which
assumes that a local feature is drawn from one of few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as the
linear combination of multiple key components and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI
Beat-Event Detection in Action Movie Franchises
While important advances were recently made towards temporally localizing and
recognizing specific human actions or activities in videos, efficient detection
and classification of long video chunks belonging to semantically defined
categories such as "pursuit" or "romance" remains challenging.We introduce a
new dataset, Action Movie Franchises, consisting of a collection of Hollywood
action movie franchises. We define 11 non-exclusive semantic categories -
called beat-categories - that are broad enough to cover most of the movie
footage. The corresponding beat-events are annotated as groups of video shots,
possibly overlapping.We propose an approach for localizing beat-events based on
classifying shots into beat-categories and learning the temporal constraints
between shots. We show that temporal constraints significantly improve the
classification performance. We set up an evaluation protocol for beat-event
localization as well as for shot classification, depending on whether movies
from the same franchise are present or not in the training data
- …