4 research outputs found

Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

Author: Gao Zhimin
Li Wanqing
Ogunbona Philip
Tang Chang
Wang Pichao
Publication date: 01/01/2018

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Based on the proposed representations, a ConvNet-based method is developed for action recognition. The image-based representations make it possible to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely, the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used. Comment: arXiv admin note: text overlap with arXiv:1701.01814, arXiv:1608.0633

arXiv.org e-Print Archive

Crossref

Research Online

Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition

Author: Gao Zhimin
Hou Yonghong
Li Wanqing
Wang Pichao
Wang Shuang
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2018

This paper presents an effective yet simple video representation for RGB-D based action recognition. It proposes to represent a depth map sequence as three pairs of structured dynamic images at the body, part and joint levels respectively, through hierarchical bidirectional rank pooling. Unlike previous works that applied one Convolutional Neural Network (ConvNet) to each part/joint separately, one pair of structured dynamic images is constructed from depth maps at each granularity level and serves as the input of a ConvNet. The structured dynamic image not only preserves the spatial-temporal information but also enhances the structure information across both body parts/joints and different temporal scales. In addition, it requires little computation and memory to construct. This new representation, referred to as Spatially and Temporally Structured Dynamic Depth Images (STSDDI), aggregates motion and structure information in a depth sequence from global to fine-grained levels, and makes it possible to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without training the models afresh. The proposed representation is evaluated on six benchmark datasets, namely, MSRAction3D, G3D, MSRDailyActivity3D, SYSU 3D HOI, UTD-MHAD and M2I, and achieves state-of-the-art results on all six datasets.

Research Online

Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Crossref

Detección de acciones humanas a partir de información de profundidad mediante redes neuronales convolucionales

Author: López Diz Sergio de
Publication date: 01/01/2019

The main objective of this work is the implementation of a human action detection system for security and video surveillance, using the depth information provided by RGB-D sensors. The system is based on 3D convolutional neural networks (3D-CNN), which automatically extract features and classify actions from the spatial and temporal information in depth sequences. The proposal has been exhaustively evaluated, obtaining an experimental accuracy of 94% in action detection. If you have problems, suggestions or comments on the document, please forward them to Sergio de López Diz. Grado en Ingeniería Electrónica de Comunicacione

e_Buah - Biblioteca Digital de la Universidad de Alcalá