Interpretable 3D Human Action Analysis with Temporal Convolutional Networks
The discriminative power of modern deep learning models for 3D human action
recognition is growing ever more potent. In conjunction with the recent
resurgence of 3D skeleton-based human action representation, both the quality
and the pace of progress have been significant. However, the inner workings
of state-of-the-art learning-based methods in 3D human action recognition
remain largely a black box. In this work, we propose to use a new class of
models known as Temporal Convolutional Neural Networks (TCN) for 3D human
action recognition. Compared to popular LSTM-based Recurrent Neural Network
models, and given interpretable input such as 3D skeletons, the TCN provides
a way to explicitly learn readily interpretable spatio-temporal
representations for 3D human action recognition. We present our strategy for
re-designing the TCN with interpretability in mind and show how these
characteristics of the model are leveraged to construct a powerful 3D
activity recognition method. Through this work, we wish to take a step
towards a spatio-temporal model that is easier to understand, explain, and
interpret. The resulting model, Res-TCN, achieves state-of-the-art results on
the largest 3D human action recognition dataset, NTU-RGBD.

Comment: 8 pages, 5 figures, BNMW CVPR 2017 Submission
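To make the idea concrete, below is a minimal sketch of a residual temporal
convolution block in the Res-TCN style, written in PyTorch. The pre-activation
ordering (BN, ReLU, Conv1d), the kernel size, the channel count, and the input
layout (channels as flattened joint coordinates, length as frames) are
illustrative assumptions for this sketch, not the paper's exact configuration.

```python
# Minimal sketch of a Res-TCN-style residual temporal convolution block.
# Layer sizes and the input layout (channels = flattened 3D joint
# coordinates, length = frames) are illustrative assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class ResTCNBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        # Pre-activation residual branch: BN -> ReLU -> Conv1d, so the skip
        # connection stays an identity path and filter activations over time
        # remain directly inspectable.
        self.bn = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames); each channel holds one coordinate of
        # one skeleton joint across the sequence.
        return x + self.conv(self.relu(self.bn(x)))

# Usage: a batch of 150-frame skeleton sequences, 25 joints * 3 coordinates.
x = torch.randn(4, 75, 150)
block = ResTCNBlock(channels=75)
print(block(x).shape)  # torch.Size([4, 75, 150])
```

Because the residual branch adds directly onto the input, the learned filters
can be read as explicit temporal detectors over joint trajectories, which is
the interpretability property the abstract emphasizes.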
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis has achieved outstanding
performance and demonstrated the effectiveness of 3D representations for
action recognition. However, the existing depth-based and RGB+D-based action
recognition benchmarks have a number of limitations, including the lack of
large-scale training samples, a realistic number of distinct class
categories, diversity in camera views, varied environmental conditions, and a
variety of human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset,
and propose a simple yet effective Action-Part Semantic Relevance-aware
(APSR) framework for this task, which yields promising results for
recognition of the novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]

Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
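The abstract does not detail the APSR framework, but the one-shot protocol it
targets can be illustrated generically: embed the single exemplar of each
novel class and classify test sequences by nearest-neighbor matching in the
embedding space. The sketch below uses a stand-in encoder and hypothetical
tensor shapes; it shows the evaluation protocol under those assumptions, not
the APSR method itself.

```python
# Generic one-shot 3D activity recognition by nearest-neighbor matching in
# an embedding space. The encoder, tensor shapes, and class count are
# hypothetical placeholders; this illustrates the one-shot protocol, not
# the paper's APSR framework.
import torch
import torch.nn.functional as F

def one_shot_classify(encoder, exemplars, test_seqs):
    """exemplars: (num_novel_classes, C, T), one skeleton sequence per class;
    test_seqs: (num_test, C, T). Returns a predicted class index per test
    sequence."""
    with torch.no_grad():
        proto = F.normalize(encoder(exemplars), dim=1)  # (K, D) class anchors
        query = F.normalize(encoder(test_seqs), dim=1)  # (N, D)
    # Cosine similarity of every test embedding to every class anchor.
    sim = query @ proto.t()                             # (N, K)
    return sim.argmax(dim=1)

# Usage with a stand-in encoder: mean-pool over time, then a linear map.
encoder = torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool1d(1), torch.nn.Flatten(), torch.nn.Linear(75, 128)
)
exemplars = torch.randn(20, 75, 150)   # 20 novel classes, one sample each
test_seqs = torch.randn(8, 75, 150)
print(one_shot_classify(encoder, exemplars, test_seqs))  # 8 class indices
```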