772 research outputs found

Recognition of human activity and the state of an assembly task using vision and inertial sensor fusion methods

Author: Male James
Martinez Hernandez Uriel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/06/2021

Reliable human machine interfaces are key to accomplishing the goals of Industry 4.0. This work proposes the late fusion of a visual recognition classifier and a human action recognition (HAR) classifier. Vision is used to recognise the number of screws assembled into a mock part, while HAR from body-worn Inertial Measurement Units (IMUs) classifies the actions performed to assemble the part. Convolutional Neural Network (CNN) methods are used in both modes of classification before various late fusion methods are analysed for prediction of a final state estimate. The fusion methods investigated are mean, weighted average, Support Vector Machine (SVM), Bayesian, Artificial Neural Network (ANN) and Long Short Term Memory (LSTM). The results show that the LSTM fusion method performs best, with an accuracy of 93% compared to 81% for IMU and 77% for visual sensing. Development of sensor fusion methods such as these is key to reliable Human Machine Interaction (HMI
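The two simplest late fusion rules named in this abstract, mean and weighted average, can be illustrated with a minimal sketch. It assumes each classifier outputs a probability vector over the same set of assembly states. The function names, the example probabilities and the weight value are hypothetical and are not taken from the paper, which does not specify its implementation here.

```python
from typing import List, Sequence


def mean_fusion(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Average two class-probability vectors element by element."""
    if len(p_a) != len(p_b):
        raise ValueError("both classifiers must score the same classes")
    return [(a + b) / 2.0 for a, b in zip(p_a, p_b)]


def weighted_fusion(
    p_a: Sequence[float], p_b: Sequence[float], w_a: float
) -> List[float]:
    """Convex combination: weight w_a on the first classifier, 1 - w_a on the second."""
    if len(p_a) != len(p_b):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= w_a <= 1.0:
        raise ValueError("w_a must lie in [0, 1]")
    return [w_a * a + (1.0 - w_a) * b for a, b in zip(p_a, p_b)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical outputs for three assembly states (illustrative values only).
p_vision = [0.5, 0.3, 0.2]  # assumed output of the vision classifier
p_imu = [0.1, 0.6, 0.3]     # assumed output of the IMU-based HAR classifier

fused_mean = mean_fusion(p_vision, p_imu)
fused_weighted = weighted_fusion(p_vision, p_imu, w_a=0.9)  # assumed weight

print(predict(fused_mean), predict(fused_weighted))
```

With these made-up inputs the two rules disagree, which shows why the choice of fusion weights matters. The learned fusion methods listed in the abstract (SVM, ANN, LSTM) would replace the fixed rule with a model trained on the classifier outputs.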

OPUS

Non-contact Multimodal Indoor Human Monitoring Systems: A Survey

Author: Casado Constantino Álvarez
Cañellas Manuel Lage
Jayagopi Dinesh Babu
López Miguel Bordallo
Mukherjee Anirban
Nguyen Le Ngu
Silvén Olli
Susarla Praneeth
Wu Xiaoting
Publication date: 11/12/2023

Indoor human monitoring systems leverage a wide range of sensors, including cameras, radio devices, and inertial measurement units, to collect extensive data from users and the environment. These sensors contribute diverse data modalities, such as video feeds from cameras, received signal strength indicators and channel state information from WiFi devices, and three-axis acceleration data from inertial measurement units. In this context, we present a comprehensive survey of multimodal approaches for indoor human monitoring systems, with a specific focus on their relevance in elderly care. Our survey primarily highlights non-contact technologies, particularly cameras and radio devices, as key components in the development of indoor human monitoring systems. Throughout this article, we explore well-established techniques for extracting features from multimodal data sources. Our exploration extends to methodologies for fusing these features and harnessing multiple modalities to improve the accuracy and robustness of machine learning models. Furthermore, we conduct comparative analysis across different data modalities in diverse human monitoring tasks and undertake a comprehensive examination of existing multimodal datasets. This extensive survey not only highlights the significance of indoor human monitoring systems but also affirms their versatile applications. In particular, we emphasize their critical role in enhancing the quality of elderly care, offering valuable insights into the development of non-contact monitoring solutions applicable to the needs of aging populations. Comment: 19 pages, 5 figure

arXiv.org e-Print Archive

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Author: Duan Ling-Yu
Kot Alex C.
Liu Jun
Perez Mauricio
Shahroudy Amir
Wang Gang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/06/2019

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp] Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI

arXiv.org e-Print Archive

Chalmers Research

Action Recognition using Deep Convolutional Neural Networks and Compressed Spatio-Temporal Pose Encodings

Author: McNally William
McPhee John
Wong Alexander
Publication venue: University of Waterloo (Waterloo, Ontario, Canada)
Publication date: 24/12/2018

Convolutional neural networks have recently shown proficiency at recognizing actions in RGB video. Existing models are generally very deep, requiring large amounts of data to train effectively. Moreover, they rely mainly on global appearance and could potentially underperform in single-environment applications, such as a sports event. To overcome these limitations, we propose to shortcut spatial learning by leveraging the activations within a human pose estimation network. The proposed framework integrates a human pose estimation network with a convolutional classifier via compressed encodings of pose activations. When evaluated on UTD-MHAD, a 27-class multimodal dataset, the pose-based RGB action recognition model achieves a classification accuracy of 98.4% in a subject-specific experiment and outperforms a baseline method that fuses depth and inertial sensor data

Waterloo Library Journal Publishing Service (University of Waterloo, Canada)