Multimodal Multipart Learning for Action Recognition in Depth Videos

Authors: Ng Tian-Tsong, Shahroudy Amir, Wang Gang, Yang Qingxiong
Publication date: 31/07/2015

The articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handling this complexity is to divide it into the kinetics of body parts and analyze actions based on these partial descriptors. We propose a joint sparse regression based learning method which utilizes structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth- and skeleton-based features. The proper structure of multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, which regularizes the structured features of each part and applies sparsity between them, in favor of group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method: it outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.
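The abstract does not spell out the hierarchical mixed norm, so the following is only a minimal sketch of the general idea behind group-sparse penalties, assuming a plain group-lasso style norm (the sum of per-group L2 norms). The function name `group_mixed_norm`, the weight values, and the three-part grouping are hypothetical and are not taken from the paper.

```python
import math


def group_mixed_norm(weights, groups):
    """Sum of the L2 norms of each group's weights (a group-lasso style penalty).

    This is a generic illustration, not the paper's hierarchical mixed norm.
    """
    total = 0.0
    for indices in groups:
        total += math.sqrt(sum(weights[i] ** 2 for i in indices))
    return total


# Hypothetical layout: six feature weights split across three body parts.
weights = [3.0, 4.0, 0.0, 0.0, 1.0, 0.0]
groups = [[0, 1], [2, 3], [4, 5]]

penalty = group_mixed_norm(weights, groups)

# A part counts as "selected" when at least one of its weights is non-zero.
active_parts = [
    part for part, indices in enumerate(groups)
    if any(weights[i] != 0.0 for i in indices)
]
```

Penalizing the sum of group norms tends to drive whole groups to zero, which is why such norms are associated with selecting a sparse set of parts rather than individual features.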

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Progressive Teacher-student Learning for Early Action Prediction

Authors: Hu Jian-Fang, Lai Jianhuang, Wang Xionghui, Zhang Jianguo, Zheng Wei-Shi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/01/2020

Crossref

University of Dundee Online Publications

Gait Recognition from Motion Capture Data

Authors: Balazia Michal, Sojka Petr
Publication date: 24/08/2017

Gait recognition from motion capture data, as a pattern classification discipline, can be improved by the use of machine learning. This paper contributes to the state of the art with a statistical approach for extracting robust gait features directly from raw data by a modification of Linear Discriminant Analysis with Maximum Margin Criterion. Experiments on the CMU MoCap database show that the suggested method outperforms thirteen relevant methods based on geometric features, as well as a method that learns the features by a combination of Principal Component Analysis and Linear Discriminant Analysis. The methods are evaluated in terms of the distribution of biometric templates in their respective feature spaces, expressed in a number of class separability coefficients and classification metrics. Results also indicate a high portability of learned features; that is, we can learn which aspects of walking people generally differ in and extract those as general gait features. Recognizing people without needing group-specific features is convenient, as particular people might not always provide annotated learning data. As a contribution to reproducible research, our evaluation framework and database have been made publicly available. This research makes motion capture technology directly applicable for human recognition.

Comment: Preprint. Full paper accepted at the ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), special issue on Representation, Analysis and Recognition of 3D Humans. 18 pages. arXiv admin note: substantial text overlap with arXiv:1701.00995, arXiv:1609.04392, arXiv:1609.0693
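The abstract above names the Maximum Margin Criterion (MMC) but does not describe the authors' modification. As a rough sketch only, the standard MMC projection maximizes tr(Wᵀ(S_b − S_w)W), which is solved by the leading eigenvectors of S_b − S_w. The code below assumes that textbook formulation with unnormalized scatter matrices; the function name `mmc_projection` and the toy data are hypothetical and are not the paper's method or dataset.

```python
import numpy as np


def mmc_projection(X, y, n_components):
    """Top eigenvectors of S_b - S_w (textbook Maximum Margin Criterion).

    X: (n_samples, n_features) array; y: (n_samples,) class labels.
    Illustrative only; not the modification described in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))  # between-class scatter
    S_w = np.zeros((d, d))  # within-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    # S_b - S_w is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Toy data: two classes separated along the first axis only.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
y = np.array([0, 0, 1, 1])
W = mmc_projection(X, y, n_components=1)
projected = X @ W  # one-dimensional features per sample
```

Unlike classical LDA, this formulation needs no inverse of S_w, so it remains usable when the within-class scatter matrix is singular.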

arXiv.org e-Print Archive

Two-Stream RNN/CNN for Action Recognition in 3D Videos

Authors: Ali Haider, van der Smagt Patrick, Zhao Rui
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/10/2018

The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular those related to 3D video, the methodologies for processing the data are still the subject of research. We demonstrate superior results with a system that combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well suited to distinguishing actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM, which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.

Comment: Published in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv.org e-Print Archive

Crossref