52 research outputs found
Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the
trajectories of skeleton joints, which provide a very good representation for
describing actions. Considering that recurrent neural networks (RNNs) with Long
Short-Term Memory (LSTM) can learn feature representations and model long-term
temporal dependencies automatically, we propose an end-to-end fully connected
deep LSTM network for skeleton based action recognition. Inspired by the
observation that the co-occurrences of the joints intrinsically characterize
human actions, we take the skeleton as the input at each time slot and
introduce a novel regularization scheme to learn the co-occurrence features of
skeleton joints. To train the deep LSTM network effectively, we propose a new
dropout algorithm which simultaneously operates on the gates, cells, and output
responses of the LSTM neurons. Experimental results on three human action
recognition datasets consistently demonstrate the effectiveness of the proposed
model.Comment: AAAI 2016 conferenc
Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks
Recently, skeleton based action recognition gains more popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are limited to
represent the complexity of motion patterns. Recent methods that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton based action recognition. We explore two different structures for the
temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
according to human body kinematics. We also propose two effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve generalization of our model, we further exploit 3D
transformation based data augmentation techniques including rotation and
scaling transformation to transform the 3D coordinates of skeletons during
training. Experiments on 3D action recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
DIY Human Action Data Set Generation
The recent successes in applying deep learning techniques to solve standard
computer vision problems has aspired researchers to propose new computer vision
problems in different domains. As previously established in the field, training
data itself plays a significant role in the machine learning process,
especially deep learning approaches which are data hungry. In order to solve
each new problem and get a decent performance, a large amount of data needs to
be captured which may in many cases pose logistical difficulties. Therefore,
the ability to generate de novo data or expand an existing data set, however
small, in order to satisfy data requirement of current networks may be
invaluable. Herein, we introduce a novel way to partition an action video clip
into action, subject and context. Each part is manipulated separately and
reassembled with our proposed video generation technique. Furthermore, our
novel human skeleton trajectory generation along with our proposed video
generation technique, enables us to generate unlimited action recognition
training data. These techniques enables us to generate video action clips from
an small set without costly and time-consuming data acquisition. Lastly, we
prove through extensive set of experiments on two small human action
recognition data sets, that this new data generation technique can improve the
performance of current action recognition neural nets
- …