12,060 research outputs found

    Flowing ConvNets for Human Pose Estimation in Videos

    Full text link
    The objective of this work is human pose estimation in videos, where multiple frames are available. We investigate a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow. To this end we propose a network architecture with the following novelties: (i) a deeper network than previously investigated for regressing heatmaps; (ii) spatial fusion layers that learn an implicit spatial model; (iii) optical flow is used to align heatmap predictions from neighbouring frames; and (iv) a final parametric pooling layer which learns to combine the aligned heatmaps into a pooled confidence map. We show that this architecture outperforms a number of others, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion. The new architecture outperforms the state of the art by a large margin on three video pose estimation datasets, including the very challenging Poses in the Wild dataset, and outperforms other deep methods that don't use a graphical model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et al. in the high precision region).Comment: ICCV'1

    Investigation of Different Skeleton Features for CNN-based 3D Action Recognition

    Full text link
    Deep learning techniques are being used in skeleton based action recognition tasks and outstanding performance has been reported. Compared with RNN based methods which tend to overemphasize temporal information, CNN-based approaches can jointly capture spatio-temporal information from texture color images encoded from skeleton sequences. There are several skeleton-based features that have proven effective in RNN-based and handcrafted-feature-based methods. However, it remains unknown whether they are suitable for CNN-based approaches. This paper proposes to encode five spatial skeleton features into images with different encoding methods. In addition, the performance implication of different joints used for feature extraction is studied. The proposed method achieved state-of-the-art performance on NTU RGB+D dataset for 3D human action analysis. An accuracy of 75.32\% was achieved in Large Scale 3D Human Activity Analysis Challenge in Depth Videos

    Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

    Full text link
    This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neutral networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows the fine-tuning with existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition of the ChaLearn Looking at People (LAP) challenge 2016. It achieved the performance of 0.2655 (Mean Jaccard Index) and ranked 3rd3^{rd} place in this challenge

    Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons

    Full text link
    Current methods for skeleton-based human action recognition usually work with completely observed skeletons. However, in real scenarios, it is prone to capture incomplete and noisy skeletons, which will deteriorate the performance of traditional models. To enhance the robustness of action recognition models to incomplete skeletons, we propose a multi-stream graph convolutional network (GCN) for exploring sufficient discriminative features distributed over all skeleton joints. Here, each stream of the network is only responsible for learning features from currently unactivated joints, which are distinguished by the class activation maps (CAM) obtained by preceding streams, so that the activated joints of the proposed method are obviously more than traditional methods. Thus, the proposed method is termed richly activated GCN (RA-GCN), where the richly discovered features will improve the robustness of the model. Compared to the state-of-the-art methods, the RA-GCN achieves comparable performance on the NTU RGB+D dataset. Moreover, on a synthetic occlusion dataset, the performance deterioration can be alleviated by the RA-GCN significantly.Comment: Accepted by ICIP 2019, 5 pages, 3 figures, 3 table
    • …
    corecore