Flowing ConvNets for Human Pose Estimation in Videos
The objective of this work is human pose estimation in videos, where multiple
frames are available. We investigate a ConvNet architecture that is able to
benefit from temporal context by combining information across the multiple
frames using optical flow.
To this end we propose a network architecture with the following novelties:
(i) a deeper network than previously investigated for regressing heatmaps; (ii)
spatial fusion layers that learn an implicit spatial model; (iii) optical flow
is used to align heatmap predictions from neighbouring frames; and (iv) a final
parametric pooling layer which learns to combine the aligned heatmaps into a
pooled confidence map.
We show that this architecture outperforms a number of others, including one
that uses optical flow solely at the input layers, one that regresses joint
coordinates directly, and one that predicts heatmaps without spatial fusion.
The new architecture outperforms the state of the art by a large margin on
three video pose estimation datasets, including the very challenging Poses in
the Wild dataset, and outperforms other deep methods that don't use a graphical
model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et
al. in the high precision region).
Comment: ICCV'1
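The two video-specific components of this architecture, aligning neighbouring-frame heatmaps with optical flow and then combining them with a learned parametric pooling layer, can be sketched in a few lines. This is a minimal numpy illustration of the idea only: the nearest-neighbour warp and the fixed per-frame weights are simplifying assumptions, not the paper's implementation (which learns the pooling weights end-to-end).

```python
import numpy as np

def warp_heatmap(heatmap, flow):
    """Warp a joint heatmap from a neighbouring frame into the current
    frame using dense optical flow (nearest-neighbour resampling).
    heatmap: (H, W); flow: (H, W, 2) of per-pixel (dy, dx) offsets."""
    H, W = heatmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Follow the flow to find the source pixel in the neighbouring frame.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return heatmap[src_y, src_x]

def parametric_pool(aligned_heatmaps, weights):
    """Combine flow-aligned heatmaps from n frames into one pooled
    confidence map with per-frame weights (fixed here; learned in the paper)."""
    stack = np.stack(aligned_heatmaps)         # (n, H, W)
    w = np.asarray(weights)[:, None, None]     # (n, 1, 1), broadcast over H, W
    return (w * stack).sum(axis=0)
```

With zero flow the warp is the identity, and equal weights reduce the pooling to a simple average of the aligned heatmaps.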
Investigation of Different Skeleton Features for CNN-based 3D Action Recognition
Deep learning techniques are being used in skeleton based action recognition
tasks and outstanding performance has been reported. Compared with RNN based
methods which tend to overemphasize temporal information, CNN-based approaches
can jointly capture spatio-temporal information from texture color images
encoded from skeleton sequences. There are several skeleton-based features that
have proven effective in RNN-based and handcrafted-feature-based methods.
However, it remains unknown whether they are suitable for CNN-based approaches.
This paper proposes to encode five spatial skeleton features into images with
different encoding methods. In addition, the performance implication of
different joints used for feature extraction is studied. The proposed method
achieved state-of-the-art performance on NTU RGB+D dataset for 3D human action
analysis. An accuracy of 75.32\% was achieved in Large Scale 3D Human Activity
Analysis Challenge in Depth Videos
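The core encoding step, turning a skeleton sequence into a color image a CNN can consume, can be sketched as follows. This is a hedged illustration of the general idea only: the normalisation scheme, the mapping of (x, y, z) to color channels, and the fixed temporal resampling are assumptions, not the paper's specific five encodings.

```python
import numpy as np

def skeleton_to_image(seq, height=32):
    """Encode a skeleton sequence (T, J, 3) as an RGB-like image:
    rows = time steps, columns = joints, channels = x/y/z coordinates
    normalised to [0, 255]. A simplified stand-in for the paper's encodings."""
    T, J, C = seq.shape
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)    # per-channel [0, 1]
    img = (norm * 255).astype(np.uint8)              # (T, J, 3) texture image
    # Nearest-neighbour resample the time axis to a fixed image height.
    idx = np.round(np.linspace(0, T - 1, height)).astype(int)
    return img[idx]                                  # (height, J, 3)
```

Because the output is an ordinary 8-bit image, a pretrained image CNN can be fine-tuned on it directly, which is what makes this family of encodings attractive.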
Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks
This paper addresses the problem of continuous gesture recognition from
sequences of depth maps using convolutional neural networks (ConvNets). The
proposed method first segments individual gestures from a depth sequence based
on quantity of movement (QOM). For each segmented gesture, an Improved Depth
Motion Map (IDMM), which converts the depth sequence into one image, is
constructed and fed to a ConvNet for recognition. The IDMM effectively encodes
both spatial and temporal information and allows the fine-tuning with existing
ConvNet models for classification without introducing millions of parameters to
learn. The proposed method is evaluated on the Large-scale Continuous Gesture
Recognition track of the ChaLearn Looking at People (LAP) challenge 2016. It
achieved a Mean Jaccard Index of 0.2655 and ranked place in this challenge.
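The key step, collapsing a whole depth sequence into one image (the IDMM) that a pretrained ConvNet can classify, can be sketched as accumulated frame-to-frame motion. This is a simplified reading of the abstract, not the paper's exact construction: the absolute-difference accumulation and 8-bit rescaling are illustrative assumptions.

```python
import numpy as np

def improved_depth_motion_map(depth_seq):
    """Collapse a depth-map sequence (T, H, W) into a single 2D image by
    accumulating absolute frame-to-frame differences, so pixels that move
    often or far appear brighter. Simplified sketch of the IDMM idea."""
    diffs = np.abs(np.diff(depth_seq.astype(np.float32), axis=0))  # (T-1, H, W)
    idmm = diffs.sum(axis=0)                                        # (H, W)
    # Rescale to 8-bit so the map can be fed to a pretrained image ConvNet.
    return (255 * idmm / max(float(idmm.max()), 1e-8)).astype(np.uint8)
```

Since the result is a single image per gesture, fine-tuning an existing image-classification ConvNet needs no new temporal parameters, matching the abstract's point about avoiding millions of extra parameters to learn.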
Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons
Current methods for skeleton-based human action recognition usually work with
completely observed skeletons. However, in real scenarios, it is prone to
capture incomplete and noisy skeletons, which will deteriorate the performance
of traditional models. To enhance the robustness of action recognition models
to incomplete skeletons, we propose a multi-stream graph convolutional network
(GCN) for exploring sufficient discriminative features distributed over all
skeleton joints. Here, each stream of the network is responsible only for
learning features from the currently unactivated joints, identified via the
class activation maps (CAM) produced by the preceding streams, so that the
proposed method activates substantially more joints than traditional methods.
The method is therefore termed richly activated GCN (RA-GCN), and the richly
discovered features improve the robustness of the model.
Compared to the state-of-the-art methods, the RA-GCN achieves comparable
performance on the NTU RGB+D dataset. Moreover, on a synthetic occlusion
dataset, the RA-GCN significantly alleviates the performance deterioration.
Comment: Accepted by ICIP 2019, 5 pages, 3 figures, 3 tables
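The masking mechanism that forces each new stream onto different joints can be sketched as follows. This is a hypothetical illustration of the CAM-based activation idea in the abstract: the top-k thresholding scheme and the `keep_ratio` parameter are assumptions, not RA-GCN's actual formulation.

```python
import numpy as np

def next_stream_mask(cam_scores, prev_mask, keep_ratio=0.5):
    """Given per-joint class-activation scores from the preceding stream,
    deactivate its most-activated joints so the next stream must learn
    discriminative features from the remaining ones.
    cam_scores: (J,) float scores; prev_mask: (J,) bool of active joints."""
    # Only joints still active in the previous stream compete for removal.
    scores = np.where(prev_mask, cam_scores, -np.inf)
    k = int(np.ceil(prev_mask.sum() * (1 - keep_ratio)))
    top = np.argsort(scores)[::-1][:k]      # most-activated joints
    new_mask = prev_mask.copy()
    new_mask[top] = False                   # switch them off for the next stream
    return new_mask
```

Applied stream after stream, the union of activated joints grows, which is the sense in which the method is "richly activated" and why occluding a few joints degrades it less than single-stream models.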