17,953 research outputs found
Learning activity progression in LSTMs for activity detection and early detection
In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ma_Learning_Activity_Progression_CVPR_2016_paper.htmlPublished versio
Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition
Many of the state-of-the-art algorithms for gesture recognition are based on
Conditional Random Fields (CRFs). Successful approaches, such as the
Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
values are mapped to the values of the labels. In this paper we propose a novel
methodology to set the latent values according to the gesture complexity. We
use an heuristic that iterates through the samples associated with each label
value, stimating their complexity. We then use it to assign the latent values
to the label values. We evaluate our method on the task of recognizing human
gestures from video streams. The experiments were performed in binary datasets,
generated by grouping different labels. Our results demonstrate that our
approach outperforms the arbitrary one in many cases, increasing the accuracy
by up to 10%.Comment: Conference paper published at 2016 29th SIBGRAPI, Conference on
Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figure
Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks
Infrared (IR) imaging has the potential to enable more robust action
recognition systems compared to visible spectrum cameras due to lower
sensitivity to lighting conditions and appearance variability. While the action
recognition task on videos collected from visible spectrum imaging has received
much attention, action recognition in IR videos is significantly less explored.
Our objective is to exploit imaging data in this modality for the action
recognition task. In this work, we propose a novel two-stream 3D convolutional
neural network (CNN) architecture by introducing the discriminative code layer
and the corresponding discriminative code loss function. The proposed network
processes IR image and the IR-based optical flow field sequences. We pretrain
the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune
it on the Infrared Action Recognition (InfAR) dataset. To our best knowledge,
this is the first application of the 3D CNN to action recognition in the IR
domain. We conduct an elaborate analysis of different fusion schemes (weighted
average, single and double-layer neural nets) applied to different 3D CNN
outputs. Experimental results demonstrate that our approach can achieve
state-of-the-art average precision (AP) performances on the InfAR dataset: (1)
the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our
3D CNN model applied to the optical flow fields achieves the best reported
single stream 75.42% AP
Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the
trajectories of skeleton joints, which provide a very good representation for
describing actions. Considering that recurrent neural networks (RNNs) with Long
Short-Term Memory (LSTM) can learn feature representations and model long-term
temporal dependencies automatically, we propose an end-to-end fully connected
deep LSTM network for skeleton based action recognition. Inspired by the
observation that the co-occurrences of the joints intrinsically characterize
human actions, we take the skeleton as the input at each time slot and
introduce a novel regularization scheme to learn the co-occurrence features of
skeleton joints. To train the deep LSTM network effectively, we propose a new
dropout algorithm which simultaneously operates on the gates, cells, and output
responses of the LSTM neurons. Experimental results on three human action
recognition datasets consistently demonstrate the effectiveness of the proposed
model.Comment: AAAI 2016 conferenc
- …