818 research outputs found
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
The vast proliferation of sensor devices and Internet of Things enables the
applications of sensor-based activity recognition. However, there exist
substantial challenges that could influence the performance of the recognition
system in practical scenarios. Recently, as deep learning has demonstrated its
effectiveness in many areas, plenty of deep methods have been investigated to
address the challenges in activity recognition. In this study, we present a
survey of the state-of-the-art deep learning methods for sensor-based human
activity recognition. We first introduce the multi-modality of the sensory data
and provide information for public datasets that can be used for evaluation in
different challenge tasks. We then propose a new taxonomy to structure the deep
methods by challenges. Challenges and challenge-related deep methods are
summarized and analyzed to form an overview of the current research progress.
At the end of this work, we discuss the open issues and provide some insights
for future directions
Hierarchical Attention Network for Action Segmentation
The temporal segmentation of events is an essential task and a precursor for
the automatic recognition of human actions in the video. Several attempts have
been made to capture frame-level salient aspects through attention but they
lack the capacity to effectively map the temporal relationships in between the
frames as they only capture a limited span of temporal dependencies. To this
end we propose a complete end-to-end supervised learning approach that can
better learn relationships between actions over time, thus improving the
overall segmentation performance. The proposed hierarchical recurrent attention
framework analyses the input video at multiple temporal scales, to form
embeddings at frame level and segment level, and perform fine-grained action
segmentation. This generates a simple, lightweight, yet extremely effective
architecture for segmenting continuous video streams and has multiple
application domains. We evaluate our system on multiple challenging public
benchmark datasets, including MERL Shopping, 50 salads, and Georgia Tech
Egocentric datasets, and achieves state-of-the-art performance. The evaluated
datasets encompass numerous video capture settings which are inclusive of
static overhead camera views and dynamic, ego-centric head-mounted camera
views, demonstrating the direct applicability of the proposed framework in a
variety of settings.Comment: Published in Pattern Recognition Letter
Multi-sensor human action recognition with particular application to tennis event-based indexing
The ability to automatically classify human actions and activities using vi- sual sensors or by analysing body worn sensor data has been an active re- search area for many years. Only recently with advancements in both fields and the ubiquitous nature of low cost sensors in our everyday lives has auto- matic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of cap- turing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State of the art visual action recognition approaches are also investigated. Firstly we apply these action recognition detectors to basic human actions in a non-sporting con- text. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to au- tomatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordi- nate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency
User-adaptive models for activity and emotion recognition using deep transfer learning and data augmentation
Kan bare brukes i forskningssammenheng, ikke kommersielt. Les mer her: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-termsBuilding predictive models for human-interactive systems is a challenging task. Every individual has unique characteristics and behaviors. A generic human–machine system will not perform equally well for each user given the between-user differences. Alternatively, a system built specifically for each particular user will perform closer to the optimum. However, such a system would require more training data for every specific user, thus hindering its applicability for real-world scenarios. Collecting training data can be time consuming and expensive. For example, in clinical applications it can take weeks or months until enough data is collected to start training machine learning models. End users expect to start receiving quality feedback from a given system as soon as possible without having to rely on time consuming calibration and training procedures. In this work, we build and test user-adaptive models (UAM) which are predictive models that adapt to each users’ characteristics and behaviors with reduced training data. Our UAM are trained using deep transfer learning and data augmentation and were tested on two public datasets. The first one is an activity recognition dataset from accelerometer data. The second one is an emotion recognition dataset from speech recordings. Our results show that the UAM have a significant increase in recognition performance with reduced training data with respect to a general model. Furthermore, we show that individual characteristics such as gender can influence the models’ performance.acceptedVersio
Context Awareness for Navigation Applications
This thesis examines the topic of context awareness for navigation applications and asks the question, “What are the benefits and constraints of introducing context awareness in navigation?” Context awareness can be defined as a computer’s ability to understand the situation or context in which it is operating. In particular, we are interested in how context awareness can be used to understand the navigation needs of people using mobile computers, such as smartphones, but context awareness can also benefit other types of navigation users, such as maritime navigators. There are countless other potential applications of context awareness, but this thesis focuses on applications related to navigation. For example, if a smartphone-based navigation system can understand when a user is walking, driving a car, or riding a train, then it can adapt its navigation algorithms to improve positioning performance.
We argue that the primary set of tools available for generating context awareness is
machine learning. Machine learning is, in fact, a collection of many different algorithms and techniques for developing “computer systems that automatically improve their performance through experience” [1]. This thesis examines systematically the ability of existing algorithms from machine learning to endow computing systems with context awareness. Specifically, we apply machine learning techniques to tackle three different tasks related to context awareness and having applications in the field of navigation:
(1) to recognize the activity of a smartphone user in an indoor office environment,
(2) to recognize the mode of motion that a smartphone user is undergoing outdoors, and
(3) to determine the optimal path of a ship traveling through ice-covered waters. The
diversity of these tasks was chosen intentionally to demonstrate the breadth of problems encompassed by the topic of context awareness.
During the course of studying context awareness, we adopted two conceptual “frameworks,” which we find useful for the purpose of solidifying the abstract concepts of context and context awareness. The first such framework is based strongly on the writings of a rhetorician from Hellenistic Greece, Hermagoras of Temnos, who defined seven elements of “circumstance”. We adopt these seven elements to describe contextual information. The second framework, which we dub the “context pyramid” describes the processing of raw sensor data into contextual information in terms of six different levels. At the top of the pyramid is “rich context”, where the information is expressed in prose, and the goal for the computer is to mimic the way that a human would describe a situation.
We are still a long way off from computers being able to match a human’s ability to
understand and describe context, but this thesis improves the state-of-the-art in context awareness for navigation applications. For some particular tasks, machine learning has succeeded in outperforming humans, and in the future there are likely to be tasks in navigation where computers outperform humans. One example might be the route optimization task described above. This is an example of a task where many different types of information must be fused in non-obvious ways, and it may be that computer algorithms can find better routes through ice-covered waters than even well-trained human navigators. This thesis provides only preliminary evidence of this possibility, and future work is needed to further develop the techniques outlined here. The same can be said of the other two navigation-related tasks examined in this thesis
Timestamp-supervised Wearable-based Activity Segmentation and Recognition with Contrastive Learning and Order-Preserving Optimal Transport
Human activity recognition (HAR) with wearables is one of the serviceable
technologies in ubiquitous and mobile computing applications. The
sliding-window scheme is widely adopted while suffering from the multi-class
windows problem. As a result, there is a growing focus on joint segmentation
and recognition with deep-learning methods, aiming at simultaneously dealing
with HAR and time-series segmentation issues. However, obtaining the full
activity annotations of wearable data sequences is resource-intensive or
time-consuming, while unsupervised methods yield poor performance. To address
these challenges, we propose a novel method for joint activity segmentation and
recognition with timestamp supervision, in which only a single annotated sample
is needed in each activity segment. However, the limited information of sparse
annotations exacerbates the gap between recognition and segmentation tasks,
leading to sub-optimal model performance. Therefore, the prototypes are
estimated by class-activation maps to form a sample-to-prototype contrast
module for well-structured embeddings. Moreover, with the optimal transport
theory, our approach generates the sample-level pseudo-labels that take
advantage of unlabeled data between timestamp annotations for further
performance improvement. Comprehensive experiments on four public HAR datasets
demonstrate that our model trained with timestamp supervision is superior to
the state-of-the-art weakly-supervised methods and achieves comparable
performance to the fully-supervised approaches.Comment: Under Review (submitted to IEEE TMC
- …