
    Human action recognition using fusion of depth and inertial sensors

    In this paper, we present a human action recognition system that fuses depth and inertial sensor measurements. Robust, subject-invariant depth and inertial signal features are used to train independent neural networks, and decision-level fusion is then applied within a probabilistic framework in the form of a Logarithmic Opinion Pool. The system is evaluated on the UTD Multimodal Human Action Dataset (UTD-MHAD), where it achieves 95% accuracy under 8-fold cross-validation; this is not only higher than the accuracy of each sensor used separately, but also exceeds the best previously reported accuracy on this dataset by 3.5%.
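    The fusion step described in the abstract can be illustrated with a short sketch. This is a minimal example of a Logarithmic Opinion Pool, assuming each network outputs a per-class probability vector; the probability values and the equal weights below are illustrative placeholders, not figures from the paper.

        # Minimal sketch of decision-level fusion with a Logarithmic Opinion Pool (LOP).
        # The probability vectors and weights are illustrative placeholders.
        import numpy as np

        def logarithmic_opinion_pool(probs, weights):
            """Fuse per-class probability vectors via a weighted geometric mean."""
            probs = np.asarray(probs, dtype=float)        # shape: (n_experts, n_classes)
            weights = np.asarray(weights, dtype=float)
            log_fused = weights @ np.log(probs + 1e-12)   # weighted sum of log-probabilities
            fused = np.exp(log_fused)
            return fused / fused.sum()                    # renormalize to a distribution

        # Example: softmax outputs of the depth and inertial networks for 4 action classes.
        p_depth    = [0.70, 0.15, 0.10, 0.05]
        p_inertial = [0.40, 0.35, 0.15, 0.10]
        fused = logarithmic_opinion_pool([p_depth, p_inertial], weights=[0.5, 0.5])
        predicted_class = int(np.argmax(fused))

    Unlike a simple average, the logarithmic pool multiplies the experts' opinions, so a class that either network considers very unlikely is strongly suppressed in the fused score.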

    A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System

    This paper presents a novel multimodal human activity recognition system based on a two-stream, decision-level fusion of vision and inertial sensor data. In the first stream, raw RGB frames are passed to a part-affinity-field-based pose estimation network to detect the user's keypoints. These keypoints are pre-processed and fed, in a sliding-window fashion, to a specially designed convolutional neural network for spatial feature extraction, followed by regularized LSTMs that compute the temporal features. The outputs of the LSTM networks are then passed to fully connected layers for classification. In the second stream, data from the inertial sensors are pre-processed and fed to regularized LSTMs for feature extraction, followed by fully connected layers for classification. Finally, the softmax scores of the two streams are combined by decision-level fusion, which yields the final prediction. Extensive experiments are conducted on four standard multimodal benchmark datasets (UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD). The proposed system achieves accuracies of 96.9%, 97.6%, 98.7%, and 95.9% on the UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD datasets, respectively. These results are far superior to the current state-of-the-art methods.
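    The two-stream pipeline described above can be outlined as follows. This is a minimal, hypothetical PyTorch sketch assuming pre-processed keypoint and inertial windows; the layer sizes, window length, keypoint count, and equal-weight score fusion are assumptions for illustration, not the paper's exact configuration.

        # Hypothetical two-stream sketch: a vision stream over pose keypoints and an
        # inertial stream over IMU windows, fused at the softmax-score level.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        NUM_CLASSES = 27      # e.g. UTD-MHAD defines 27 actions
        WINDOW = 30           # sliding-window length (assumed)
        NUM_KEYPOINTS = 18    # 2D keypoints from the pose-estimation network (assumed)
        IMU_CHANNELS = 6      # 3-axis accelerometer + 3-axis gyroscope

        class VisionStream(nn.Module):
            """CNN over keypoint windows, then an LSTM and a classifier head."""
            def __init__(self):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv1d(NUM_KEYPOINTS * 2, 64, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.Conv1d(64, 64, kernel_size=3, padding=1),
                    nn.ReLU(),
                )
                self.lstm = nn.LSTM(64, 128, batch_first=True)
                self.fc = nn.Linear(128, NUM_CLASSES)

            def forward(self, x):                 # x: (batch, WINDOW, NUM_KEYPOINTS * 2)
                h = self.conv(x.transpose(1, 2))  # -> (batch, 64, WINDOW)
                h, _ = self.lstm(h.transpose(1, 2))
                return self.fc(h[:, -1])          # logits from the last time step

        class InertialStream(nn.Module):
            """LSTM over raw inertial windows, then a classifier head."""
            def __init__(self):
                super().__init__()
                self.lstm = nn.LSTM(IMU_CHANNELS, 128, batch_first=True)
                self.fc = nn.Linear(128, NUM_CLASSES)

            def forward(self, x):                 # x: (batch, WINDOW, IMU_CHANNELS)
                h, _ = self.lstm(x)
                return self.fc(h[:, -1])

        def fuse_scores(logits_vision, logits_inertial, w_vision=0.5):
            """Decision-level fusion: weighted average of the two softmax score vectors."""
            scores = (w_vision * F.softmax(logits_vision, dim=-1)
                      + (1.0 - w_vision) * F.softmax(logits_inertial, dim=-1))
            return scores.argmax(dim=-1)

        # Example with random tensors standing in for one pre-processed sample.
        vision, inertial = VisionStream(), InertialStream()
        kp_window = torch.randn(1, WINDOW, NUM_KEYPOINTS * 2)
        imu_window = torch.randn(1, WINDOW, IMU_CHANNELS)
        prediction = fuse_scores(vision(kp_window), inertial(imu_window))

    Fusing at the score level keeps the two streams independent, so either stream can be retrained or replaced without modifying the other.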