Skeleton Based Action Recognition using a Stacked Denoising Autoencoder with Constraints of Privileged Information
Recently, the availability of cost-effective depth cameras coupled with
real-time skeleton estimation has renewed interest in skeleton-based human
action recognition. Most of the existing skeletal representation approaches
use either the joint location or the dynamics model. Differing from the
previous studies, we propose a new method called Denoising Autoencoder with
Temporal and Categorical Constraints (DAE_CTC) to study the skeletal
representation from the viewpoint of skeleton reconstruction. Based on the concept of
learning under privileged information, we integrate action categories and
temporal coordinates into a stacked denoising autoencoder in the training
phase to preserve category and temporal features while learning the hidden
representation from a skeleton. Thus, we are able to improve the discriminative
validity of the hidden representation. In order to mitigate the variation
resulting from temporal misalignment, a new method of temporal registration,
called Locally-Warped Sequence Registration (LWSR), is proposed for registering
the sequences of inter- and intra-class actions. We finally represent the
sequences using a Fourier Temporal Pyramid (FTP) representation and perform
classification using a combination of LWSR registration, FTP representation,
and a linear Support Vector Machine (SVM). The experimental results on three
action data sets, namely MSR-Action3D, UTKinect-Action, and Florence3D-Action,
show that our proposal performs better than many existing methods and
comparably to the state of the art.
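
The Fourier Temporal Pyramid step named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation of FTP: the sequence is split recursively into 1, 2, 4, ... temporal segments, a Fourier transform is taken along time in each segment, and the magnitudes of the first few low-frequency coefficients are concatenated. The function name `fourier_temporal_pyramid` and the parameters `levels` and `n_coeffs` are hypothetical choices for illustration, not values taken from the paper, and the input is assumed to be a sequence of per-frame feature vectors (for example, hidden representations of skeletons).

```python
import numpy as np


def fourier_temporal_pyramid(seq, levels=3, n_coeffs=4):
    """Illustrative FTP descriptor for a (T, D) sequence of per-frame features.

    Assumed formulation, not the paper's exact settings: level l splits the
    sequence into 2**l segments, and each segment contributes the magnitudes
    of its first n_coeffs Fourier coefficients per feature dimension.
    """
    seq = np.asarray(seq, dtype=float)
    if seq.ndim != 2:
        raise ValueError("seq must have shape (T, D)")
    n_frames, n_dims = seq.shape
    parts = []
    for level in range(levels):
        n_segments = 2 ** level
        # Integer frame boundaries of the segments at this pyramid level.
        bounds = np.linspace(0, n_frames, n_segments + 1).astype(int)
        for start, stop in zip(bounds[:-1], bounds[1:]):
            # Zero block keeps the descriptor length fixed for short segments.
            block = np.zeros((n_coeffs, n_dims))
            if stop > start:
                mags = np.abs(np.fft.fft(seq[start:stop], axis=0))[:n_coeffs]
                block[: mags.shape[0]] = mags
            parts.append(block.ravel())
    return np.concatenate(parts)


# Usage: 40 frames of 6-dimensional features, 3 levels (1 + 2 + 4 segments).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, 6))
descriptor = fourier_temporal_pyramid(sequence)
print(descriptor.shape)  # (168,)
```

The descriptor has a fixed length of (2^levels - 1) x n_coeffs x D regardless of the number of frames, which is what allows sequences of different durations to be passed to a linear SVM. Keeping only low-frequency magnitudes also discards high-frequency content, which is usually where frame-level noise is concentrated.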