28,284 research outputs found
Deep Learning on Lie Groups for Skeleton-based Action Recognition
In recent years, skeleton-based action recognition has become a popular 3D
classification problem. State-of-the-art methods typically first represent each
motion sequence as a high-dimensional trajectory on a Lie group with an
additional dynamic time warping, and then shallowly learn favorable Lie group
features. In this paper we incorporate the Lie group structure into a deep
network architecture to learn more appropriate Lie group features for 3D action
recognition. Within the network structure, we design rotation mapping layers to
transform the input Lie group features into desirable ones, which are aligned
better in the temporal domain. To reduce the high feature dimensionality, the
architecture is equipped with rotation pooling layers for the elements on the
Lie group. Furthermore, we propose a logarithm mapping layer to map the
resulting manifold data into a tangent space that facilitates the application
of regular output layers for the final classification. Evaluations of the
proposed network for standard 3D human action recognition datasets clearly
demonstrate its superiority over existing shallow Lie group feature learning
methods as well as most conventional deep learning methods.Comment: Accepted to CVPR 201
When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data
Human action recognition from skeletal data is a hot research topic and
important in many open domain applications of computer vision, thanks to
recently introduced 3D sensors. In the literature, naive methods simply
transfer off-the-shelf techniques from video to the skeletal representation.
However, the current state-of-the-art is contended between to different
paradigms: kernel-based methods and feature learning with (recurrent) neural
networks. Both approaches show strong performances, yet they exhibit heavy, but
complementary, drawbacks. Motivated by this fact, our work aims at combining
together the best of the two paradigms, by proposing an approach where a
shallow network is fed with a covariance representation. Our intuition is that,
as long as the dynamics is effectively modeled, there is no need for the
classification network to be deep nor recurrent in order to score favorably. We
validate this hypothesis in a broad experimental analysis over 6 publicly
available datasets.Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
Towards a new ITU-T recommendation for subjective methods evaluating gaming QoE
This paper reports on activities in Study Group 12 of the International Telecommunication Union (ITU-T SG12) to define a new Recommendation on subjective evaluation methods for gaming Quality of Experience (QoE). It first resumes the structure and content of the current draft which has been proposed to ITU-T SG12 in September 2014 and then critically discusses potential gaming content and evaluation methods for inclusion into the upcoming Recommendation. The aim is to start a discussion amongst experts on potential evaluation methods and their limitations, before finalizing a Recommendation. Such a recommendation might in the end be applied by non -expert users, hence wrong decisions in the evaluation design could negatively affect gaming QoE throughout the evaluation
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn
This paper presents an image classification based approach for skeleton-based
video action recognition problem. Firstly, A dataset independent
translation-scale invariant image mapping method is proposed, which transformes
the skeleton videos to colour images, named skeleton-images. Secondly, A
multi-scale deep convolutional neural network (CNN) architecture is proposed
which could be built and fine-tuned on the powerful pre-trained CNNs, e.g.,
AlexNet, VGGNet, ResNet etal.. Even though the skeleton-images are very
different from natural images, the fine-tune strategy still works well. At
last, we prove that our method could also work well on 2D skeleton video data.
We achieve the state-of-the-art results on the popular benchmard datasets e.g.
NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. Especially on the largest and challenge
NTU RGB+D, UTD-MHAD, and MSRC-12 dataset, our method outperforms other methods
by a large margion, which proves the efficacy of the proposed method
Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
Recently, Convolutional Neural Networks (ConvNets) have shown promising
performances in many computer vision tasks, especially image-based recognition.
How to effectively use ConvNets for video-based recognition is still an open
problem. In this paper, we propose a compact, effective yet simple method to
encode spatio-temporal information carried in skeleton sequences into
multiple images, referred to as Joint Trajectory Maps (JTM), and ConvNets
are adopted to exploit the discriminative features for real-time human action
recognition. The proposed method has been evaluated on three public benchmarks,
i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal
human action dataset (UTD-MHAD) and achieved the state-of-the-art results
- …