Deep Learning on Lie Groups for Skeleton-based Action Recognition

Author: Huang Zhiwu
Probst Thomas
Van Gool Luc
Wan Chengde
Publication date: 01/01/2017

In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group, with an additional dynamic time warping step, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones that are better aligned in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space, which facilitates the application of regular output layers for the final classification. Evaluations of the proposed network on standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

Comment: Accepted to CVPR 201
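The logarithm mapping mentioned in this abstract can be illustrated for a single 3D rotation. The following is a minimal sketch, assuming the Lie group elements are 3x3 rotation matrices (elements of SO(3)) and that the tangent-space representation is the axis-angle vector; the function names `so3_log` and `rot_z` are hypothetical, and this is not the authors' layer implementation. Rotations with an angle close to pi are not handled here, because the formula becomes numerically unstable there.

```python
import math


def so3_log(R, eps=1e-8):
    """Map a 3x3 rotation matrix (nested lists) to its axis-angle vector.

    Assumption: R is a valid rotation matrix whose angle is not close to pi.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    # Independent entries of the skew-symmetric matrix R - R^T.
    v = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    if theta < eps:
        # Near the identity, log(R) is approximately (R - R^T) / 2.
        return [0.5 * c for c in v]
    scale = theta / (2.0 * math.sin(theta))
    return [scale * c for c in v]


def rot_z(angle):
    """Rotation by `angle` radians about the z axis (helper for the example)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


omega = so3_log(rot_z(0.5))  # approximately [0.0, 0.0, 0.5]
```

The output is an ordinary 3-vector, which is why a regular fully connected or softmax layer can be applied after such a mapping.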

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Institutional Knowledge at Singapore Management University

When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data

Author: Cavazza Jacopo
Morerio Pietro
Murino Vittorio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/08/2017

Human action recognition from skeletal data is a hot research topic and is important in many open-domain applications of computer vision, thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state of the art is contested between two different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches show strong performance, yet they exhibit heavy, but complementary, drawbacks. Motivated by this fact, our work aims at combining the best of the two paradigms by proposing an approach in which a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics are effectively modeled, the classification network needs to be neither deep nor recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis over 6 publicly available datasets.

Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
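A covariance representation of a skeleton sequence can be sketched as follows. This is only an illustration under stated assumptions: each frame is a flat list of joint coordinates, the descriptor is the sample covariance over time, and its upper triangle is used as the input vector for a classifier. The function names are hypothetical, and the matrix logarithm suggested by the paper's title is deliberately left out to keep the example dependency-free.

```python
def covariance_descriptor(frames):
    """Sample covariance (d x d) of a sequence of d-dimensional frames."""
    T = len(frames)
    if T < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    d = len(frames[0])
    mean = [sum(f[i] for f in frames) / T for i in range(d)]
    centered = [[f[i] - mean[i] for i in range(d)] for f in frames]
    # Unbiased estimate: divide by T - 1.
    return [
        [sum(row[i] * row[j] for row in centered) / (T - 1) for j in range(d)]
        for i in range(d)
    ]


def upper_triangle(cov):
    """Flatten the upper triangle (the matrix is symmetric) into a vector."""
    d = len(cov)
    return [cov[i][j] for i in range(d) for j in range(i, d)]


# Toy sequence: 3 frames, 2 coordinates per frame.
frames = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
features = upper_triangle(covariance_descriptor(frames))  # [1.0, 2.0, 4.0]
```

The descriptor has a fixed size regardless of the number of frames, which is one reason a non-recurrent classifier can consume it directly.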

arXiv.org e-Print Archive

Crossref

Towards a new ITU-T recommendation for subjective methods evaluating gaming QoE

Author: Antons Jan-Niklas
Beyer Justus
Eggert Sebastian
Möller Sebastian
Nunez Castellar Elena Patricia
Skorin-Kapov Lea
Sužnjević Mirko
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

This paper reports on activities in Study Group 12 of the International Telecommunication Union (ITU-T SG12) to define a new Recommendation on subjective evaluation methods for gaming Quality of Experience (QoE). It first summarizes the structure and content of the current draft, which was proposed to ITU-T SG12 in September 2014, and then critically discusses potential gaming content and evaluation methods for inclusion in the upcoming Recommendation. The aim is to start a discussion amongst experts on potential evaluation methods and their limitations before finalizing a Recommendation. Such a Recommendation might in the end be applied by non-expert users, hence wrong decisions in the evaluation design could negatively affect gaming QoE throughout the evaluation.

Crossref

Ghent University Academic Bibliography

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

Author: Chen Yucheng
Cheng Xuelian
Dai Yuchao
He Mingyi
Li Bo
Publication date: 12/06/2017

This paper presents an image-classification-based approach to skeleton-based video action recognition. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the largest and most challenging datasets, NTU RGB+D, UTD-MHAD, and MSRC-12, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.
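One way to picture a translation-scale invariant mapping from a skeleton video to a colour image is sketched below. The abstract does not give the exact mapping, so every detail here is an assumption: joints index the image rows, frames index the columns, the (x, y, z) coordinates become the (R, G, B) channels, and normalization uses the per-sequence minimum on each axis together with the largest coordinate range. The function name `skeleton_to_image` is hypothetical.

```python
def skeleton_to_image(sequence):
    """Map sequence[t][j] = (x, y, z) to image[j][t] = (r, g, b) in 0..255."""
    points = [p for frame in sequence for p in frame]
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # One shared span for all axes keeps the aspect ratio of the motion.
    span = max(maxs[a] - mins[a] for a in range(3))
    if span == 0:
        span = 1.0  # degenerate case: every joint at the same position
    n_joints = len(sequence[0])
    image = []
    for j in range(n_joints):
        row = []
        for frame in sequence:
            # Subtracting the minimum removes translation; dividing by the
            # span removes scale.
            pixel = tuple(
                int(round(255 * (frame[j][a] - mins[a]) / span))
                for a in range(3)
            )
            row.append(pixel)
        image.append(row)
    return image


# Toy sequence: 2 frames, 2 joints.
seq = [[(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
       [(2.0, 4.0, 1.0), (4.0, 0.0, 2.0)]]
# The same motion, scaled by 2 and shifted by 10, yields the same image.
moved = [[tuple(2.0 * c + 10.0 for c in p) for p in frame] for frame in seq]
same = skeleton_to_image(seq) == skeleton_to_image(moved)
```

An image produced this way can be resized and passed to a pre-trained CNN for fine-tuning, which is the role the skeleton-images play in the abstract.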

arXiv.org e-Print Archive

Crossref

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Author: Du Y.
Gowayyed M. A.
Hussein M. E.
Krizhevsky A.
Wang P.
Yang X.
Zhu W.
Publication date: 01/01/2016

Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal human action dataset (UTD-MHAD) and achieved the state-of-the-art results.

arXiv.org e-Print Archive

Crossref

Research Online