Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from early schemes, often limited to controlled
environments, to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation
In the traditional object recognition pipeline, descriptors are densely
sampled over an image, pooled into a high dimensional non-linear representation
and then passed to a classifier. In recent years, Fisher Vectors have proven
empirically to be the leading representation for a large variety of
applications. The Fisher Vector is typically taken as the gradients of the
log-likelihood of descriptors, with respect to the parameters of a Gaussian
Mixture Model (GMM). Motivated by the assumption that different distributions
should be applied for different datasets, we present two other Mixture Models
and derive their Expectation-Maximization and Fisher Vector expressions. The
first is a Laplacian Mixture Model (LMM), which is based on the Laplacian
distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian
Mixture Model (HGLMM) which is based on a weighted geometric mean of the
Gaussian and Laplacian distribution. An interesting property of the
Expectation-Maximization algorithm for the latter is that in the maximization
step, each dimension in each component is chosen to be either a Gaussian or a
Laplacian. Finally, by using the new Fisher Vectors derived from HGLMMs, we
achieve state-of-the-art results for both the image annotation and the image
search by a sentence tasks. Comment: new version includes text synthesis by an RNN and experiments with
the COCO benchmark
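
To make the GMM-based Fisher Vector mentioned in the abstract above concrete, here is a minimal sketch in plain Python. It assumes a diagonal-covariance GMM and uses the commonly cited normalized gradients with respect to the component means and standard deviations; it does not reproduce the LMM or HGLMM expressions derived in the paper. The function name `fisher_vector` and its argument layout are hypothetical, chosen only for illustration.

```python
import math


def fisher_vector(X, weights, means, sigmas):
    """Fisher Vector of descriptors X under a diagonal-covariance GMM.

    Illustrative sketch only (assumed formulation, not the paper's code).
    X: list of T descriptors, each a list of D floats.
    weights: K mixture weights; means, sigmas: K lists of D floats
    (sigmas are per-dimension standard deviations).
    Returns a flat list of length 2*K*D: mean gradients, then sigma gradients.
    """
    T, K, D = len(X), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in X:
        # Whitened residuals z[k][d] = (x_d - mu_kd) / sigma_kd.
        z = [[(x[d] - means[k][d]) / sigmas[k][d] for d in range(D)]
             for k in range(K)]
        # Log of (weight * Gaussian density) for each component.
        log_p = [
            math.log(weights[k])
            - sum(0.5 * z[k][d] ** 2 + math.log(sigmas[k][d]) for d in range(D))
            - 0.5 * D * math.log(2 * math.pi)
            for k in range(K)
        ]
        # Posterior responsibilities via a numerically stable softmax.
        m = max(log_p)
        unnorm = [math.exp(lp - m) for lp in log_p]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                g_mu[k][d] += gamma * z[k][d]
                g_sigma[k][d] += gamma * (z[k][d] ** 2 - 1.0)
    # Scale the accumulated gradients by 1/T and the Fisher normalization.
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(K):
        fv += [g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k]]
    return fv
```

Under these assumptions, a single-component model whose mean and standard deviation equal the sample statistics of `X` yields a Fisher Vector of zeros, since the log-likelihood gradient vanishes at the maximum-likelihood fit. Swapping the Gaussian for another component distribution would change both the responsibilities and the gradient terms.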