Harnessing the Deep Net Object Models for Enhancing Human Action Recognition
In this study, the influence of objects is investigated in the scenario of
human action recognition with a large number of classes. We hypothesize that the
objects humans interact with have a strong say in determining the action
being performed. Especially, if the objects are non-moving, such as objects
appearing in the background, features such as spatio-temporal interest points,
dense trajectories may fail to detect them. Hence we propose to detect objects
using pre-trained object detectors in every frame statically. Trained Deep
network models are used as object detectors. Information from different layers
in conjunction with different encoding techniques is extensively studied to
obtain the richest feature vectors. This technique is observed to yield
state-of-the-art performance on HMDB51 and UCF101 datasets.
Comment: 6 pages. arXiv admin note: text overlap with arXiv:1411.4006 by other
author
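As a rough illustration of detecting objects "in every frame statically" and then forming one descriptor per video, the sketch below pools per-frame object scores over time. The abstract does not say which layers or encodings were used, so max and mean pooling are stand-ins chosen for illustration, `pool_frame_scores` is a hypothetical helper, and the score matrix is assumed to come from some pre-trained detector run on each frame.

```python
import numpy as np


def pool_frame_scores(frame_scores: np.ndarray, mode: str = "max") -> np.ndarray:
    """Aggregate per-frame object scores into one L2-normalized video-level vector.

    frame_scores has shape (n_frames, n_object_classes); each row is assumed to
    hold the output of a pre-trained object detector for a single frame.
    """
    if frame_scores.ndim != 2:
        raise ValueError("expected an array of shape (n_frames, n_object_classes)")
    if mode == "max":
        pooled = frame_scores.max(axis=0)   # strongest response per object class
    elif mode == "mean":
        pooled = frame_scores.mean(axis=0)  # average response per object class
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    norm = np.linalg.norm(pooled)
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return pooled / norm if norm > 0 else pooled
```

The resulting vector could then be fed to any standard classifier; because each frame is scored independently, static background objects contribute even when motion-based features miss them.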
Manifold Regularized Slow Feature Analysis for Dynamic Texture Recognition
Dynamic textures exist in various forms, e.g., fire, smoke, and traffic jams,
but recognizing dynamic texture is challenging due to the complex temporal
variations. In this paper, we present a novel approach stemming from slow
feature analysis (SFA) for dynamic texture recognition. SFA extracts slowly
varying features from fast varying signals. Fortunately, SFA is capable of
extracting invariant representations from dynamic textures. However, complex
temporal variations require high-level semantic representations to fully
achieve temporal slowness, and thus it is impractical to learn a high-level
representation from dynamic textures directly by SFA. In order to learn a
robust low-level feature to resolve the complexity of dynamic textures, we
propose manifold regularized SFA (MR-SFA) by exploring the neighbor
relationship of the initial state of each temporal transition and retaining the
locality of their variations. Therefore, the learned features are not only
slowly varying, but also partly predictable. Dynamic texture recognition with
MR-SFA proceeds in the following steps: 1) learning feature extraction
functions as convolution filters by MR-SFA, 2) extracting local features by
convolution and pooling, and 3) employing Fisher vectors to form a video-level
representation for classification. Experimental results on dynamic texture and
dynamic scene recognition datasets validate the effectiveness of the proposed
approach.
Comment: 12 page
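To make "extracts slowly varying features from fast varying signals" concrete, here is a minimal sketch of plain linear SFA. It is not MR-SFA: the abstract does not specify the manifold regularization term, so that term is omitted. The function name `linear_sfa` and the eigenvalue threshold are assumptions made for illustration.

```python
import numpy as np


def linear_sfa(x: np.ndarray, n_components: int) -> np.ndarray:
    """Return a projection matrix whose outputs vary as slowly as possible.

    x has shape (n_timesteps, n_dims). The returned matrix has shape
    (n_dims, n_components); apply it to the mean-centered signal.
    """
    x = x - x.mean(axis=0)
    # Step 1: whiten the signal so every direction has unit variance.
    cov = x.T @ x / (len(x) - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10  # drop near-degenerate directions
    whiten = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ whiten
    # Step 2: approximate the temporal derivative by finite differences.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    # Step 3: directions with the smallest derivative variance are the slowest.
    # eigh returns eigenvalues in ascending order, so take the leading columns.
    _, dvec = np.linalg.eigh(dcov)
    return whiten @ dvec[:, :n_components]
```

Given a linear mixture of a slow and a fast sinusoid, the first component returned by this sketch should recover the slow one up to sign and scale. In the pipeline described above, the learned projections would play the role of convolution filters, followed by pooling and Fisher vector encoding.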