Search CORE

2,750 research outputs found

A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset

Author: Chaaraoui Alexandros André
Flórez-Revuelta Francisco
Padilla-López José Ramón
Publication venue
Publication date: 29/07/2014
Field of study

This paper aims to determine which is the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. A review of all the papers that make reference to MSR Action3D, the most used dataset that includes depth information acquired from a RGB-D device, has been performed. We found that the validation method used by each work differs from the others. So, a direct comparison among works cannot be made. However, almost all the works present their results comparing them without taking into account this issue. Therefore, we present different rankings according to the methodology used for the validation in orden to clarify the existing confusion.Comment: 16 pages and 7 table

arXiv.org e-Print Archive

Repositorio Institucional de la Universidad de Alicante

Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

Author: Chen Chen
Liu Hong
Liu Mengyuan
Publication venue
Publication date: 07/12/2017
Field of study

3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging since previous literature fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method that is designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe local appearances of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distributions of shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape cues are combined to form a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and robust to background clutter, partial occlusions and pepper noise

arXiv.org e-Print Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)