Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention
Deep neural networks, including recurrent networks, have been successfully
applied to human activity recognition. Unfortunately, the final representation
learned by recurrent networks might encode some noise (irrelevant signal
components, unimportant sensor modalities, etc.). Besides, it is difficult to
interpret the recurrent networks to gain insight into the models' behavior. To
address these issues, we propose two attention models for human activity
recognition: temporal attention and sensor attention. These two mechanisms
adaptively focus on important signals and sensor modalities. To further improve
the understandability and mean F1 score, we add continuity constraints,
considering that continuous sensor signals are more robust than discrete ones.
We evaluate the approaches on three datasets and obtain state-of-the-art
results. Furthermore, qualitative analysis shows that the attention learned by
the models agrees well with human intuition.
Comment: 8 pages. Published in The International Symposium on Wearable Computers (ISWC) 201
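The temporal attention mechanism the abstract above describes can be sketched in a few lines: each recurrent hidden state is scored against a query vector, the scores are normalized with a softmax, and the final representation is the weighted sum. This is a minimal illustrative sketch, not the authors' implementation; the function names and the dot-product scoring are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(hidden_states, query):
    """Weight each time step's hidden state by its dot-product
    relevance to a query vector, then take the weighted sum.
    hidden_states: list of T vectors; query: one vector."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights
```

Sensor attention works the same way, except the weights are computed per sensor modality rather than per time step; the paper's continuity constraint would additionally penalize weights that change abruptly between adjacent steps.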
Lightweight human activity recognition for ambient assisted living
© 2023, IARIA. Ambient assisted living (AAL) systems aim to improve the safety, comfort, and quality of life of their users, with specific attention given to prolonging personal independence during the later stages of life. Human activity recognition (HAR) plays a crucial role in enabling AAL systems to recognise and understand human actions. Multi-view human activity recognition (MV-HAR) techniques are particularly useful for AAL systems, as they can use information from multiple sensors to capture different perspectives of human activities and can help to improve the robustness and accuracy of activity recognition. In this work, we propose a lightweight activity recognition pipeline that utilizes skeleton data from multiple perspectives to combine the advantages of both approaches and thereby enhance an assistive robot's perception of human activity. The pipeline includes data sampling, input data type and representation, and classification methods. Our method modifies a classic LeNet classification model (M-LeNet) and uses a Vision Transformer (ViT) for the classification task. Experimental evaluation on a multi-perspective dataset of human activities in the home (RH-HAR-SK) compares the performance of these two models and indicates that combining camera views can improve recognition accuracy. Furthermore, our pipeline provides a more efficient and scalable solution in the AAL context, where bandwidth and computing resources are often limited.
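The multi-view combination step in a pipeline like the one above is often realized as late fusion: each camera view produces a class-score vector, and the vectors are averaged before taking the arg-max. The sketch below shows that idea only; the function names are hypothetical and this is not the paper's actual fusion scheme.

```python
def fuse_views(per_view_scores):
    """Average class-score vectors from each camera view (late fusion).
    per_view_scores: list of V lists, each with C class scores."""
    n_views = len(per_view_scores)
    n_classes = len(per_view_scores[0])
    return [sum(view[c] for view in per_view_scores) / n_views
            for c in range(n_classes)]

def predict_activity(per_view_scores, labels):
    """Pick the label with the highest fused score."""
    fused = fuse_views(per_view_scores)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]
```

Averaging is the simplest choice; weighted fusion (e.g. by per-view confidence) is a common refinement when some views are frequently occluded.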
Online Geometric Human Interaction Segmentation and Recognition
The goal of this work is the temporal localization and recognition of binary people interactions in video. Human-human interaction detection is one of the core problems in video analysis. It has many applications, such as video surveillance, video search and retrieval, human-computer interaction, and behavior analysis for safety and security. Despite the sizeable literature in the area of activity and action modeling and recognition, the vast majority of approaches assume that the beginning and the end of the video portion containing the action or activity of interest are known. In other words, while significant effort has been placed on recognition, the spatial and temporal localization of activities, i.e. the detection problem, has received considerably less attention. This is even more true when detection must be performed online, as opposed to offline. The latter condition is imposed by almost the totality of the state-of-the-art, which makes it intrinsically unsuited for real-time processing. In this thesis, the problem of event localization and recognition is addressed in an online fashion. The main assumption is that an interaction, or an activity, is modeled by a temporal sequence. One of the main challenges is the development of a modeling framework able to capture the complex variability of activities, described by high-dimensional features. This is addressed by the combination of linear models with kernel methods. In particular, the parity space theory for detection, based on Euclidean geometry, is augmented to work with kernels, through the use of geometric operators in Hilbert space. While this approach is general, here it is applied to the detection of human interactions. It is tested on a publicly available dataset and on a large and challenging, newly collected dataset.
Extensive testing of the approach indicates that it sets a new state-of-the-art under several performance measures, and that it holds the promise to become an effective building block for the analysis of human behavior from video in real time.
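The core idea of moving a geometric detection test into a kernel feature space can be illustrated with a much simpler relative of the abstract's parity-space method: score each incoming frame descriptor by its squared distance to the mean of a reference set in an RBF kernel's feature space, and flag frames whose score exceeds a threshold. This is a hedged sketch of the general kernel trick, not the thesis's actual parity-space construction; the kernel choice, function names, and threshold are all assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel between two equal-length vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def novelty_score(x, reference, gamma=1.0):
    """Squared distance from x to the mean of the reference set,
    computed entirely through kernel evaluations (the kernel trick):
    ||phi(x) - mu||^2 = k(x,x) - 2*mean_i k(x,r_i) + mean_ij k(r_i,r_j)."""
    n = len(reference)
    k_xx = rbf(x, x, gamma)
    k_xm = sum(rbf(x, r, gamma) for r in reference) / n
    k_mm = sum(rbf(a, b, gamma) for a in reference for b in reference) / n ** 2
    return k_xx - 2 * k_xm + k_mm

def detect_online(stream, reference, threshold, gamma=1.0):
    """Scan a stream frame by frame and return the indices whose
    novelty score exceeds the threshold (online detection)."""
    return [i for i, x in enumerate(stream)
            if novelty_score(x, reference, gamma) > threshold]
```

The online character comes from the per-frame test: each descriptor is scored as it arrives, with no need to see the end of the sequence, which is the property the thesis argues most offline methods lack.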