72,371 research outputs found
A group sparsity-driven approach to 3-D action recognition
In this paper, a novel 3-D action recognition method based on sparse representation is presented. Silhouette images from multiple cameras are combined to obtain motion history volumes (MHVs). Cylindrical Fourier transform of MHVs is used as action descriptors. We assume that a test sample has a sparse representation in the space of training samples. We cast the action classification problem as an optimization problem and classify actions using group sparsity based on l1 regularization. We show experimental results using the IXMAS multi-view database and demonstratethe superiority of our method, especially when observations are low resolution, occluded, and noisy and when
the feature dimension is reduced
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on
untrimmed videos using convolutional neural networks. Our algorithm learns from
video-level class labels and predicts temporal intervals of human actions with
no requirement of temporal localization annotations. We design our network to
identify a sparse subset of key segments associated with target actions in a
video using an attention module and fuse the key segments through adaptive
temporal pooling. Our loss function is comprised of two terms that minimize the
video-level action classification error and enforce the sparsity of the segment
selection. At inference time, we extract and score temporal proposals using
temporal class activations and class-agnostic attentions to estimate the time
intervals that correspond to target actions. The proposed algorithm attains
state-of-the-art results on the THUMOS14 dataset and outstanding performance on
ActivityNet1.3 even with its weak supervision.Comment: Accepted to CVPR 201
Memory-Augmented Temporal Dynamic Learning for Action Recognition
Human actions captured in video sequences contain two crucial factors for
action recognition, i.e., visual appearance and motion dynamics. To model these
two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are
adopted in most existing successful methods for recognizing actions. However,
CNN based methods are limited in modeling long-term motion dynamics. RNNs are
able to learn temporal motion dynamics but lack effective ways to tackle
unsteady dynamics in long-duration motion. In this work, we propose a
memory-augmented temporal dynamic learning network, which learns to write the
most evident information into an external memory module and ignore irrelevant
ones. In particular, we present a differential memory controller to make a
discrete decision on whether the external memory module should be updated with
current feature. The discrete memory controller takes in the memory history,
context embedding and current feature as inputs and controls information flow
into the external memory module. Additionally, we train this discrete memory
controller using straight-through estimator. We evaluate this end-to-end system
on benchmark datasets (UCF101 and HMDB51) of human action recognition. The
experimental results show consistent improvements on both datasets over prior
works and our baselines.Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19
- …