530 research outputs found
Spatial and Temporal Modeling for Human Activity Recognition from Multimodal Sequential Data
Human Activity Recognition (HAR) has been an intense research area for more than a decade. Different sensors, ranging from 2D and 3D cameras to accelerometers, gyroscopes, and magnetometers, have been employed to generate multimodal signals to detect various human activities. With the advancement of sensing technology and the popularity of mobile devices, depth cameras and wearable devices, such as Microsoft Kinect and smart wristbands, open a unprecedented opportunity to solve the challenging HAR problem by learning expressive representations from the multimodal signals recording huge amounts of daily activities which comprise a rich set of categories. Although competitive performance has been reported, existing methods focus on the statistical or spatial representation of the human activity sequence; while the internal temporal dynamics of the human activity sequence are not sufficiently exploited. As a result, they often face the challenge of recognizing visually similar activities composed of dynamic patterns in different temporal order. In addition, many model-driven methods based on sophisticated features and carefully-designed classifiers are computationally demanding and unable to scale to a large dataset. In this dissertation, we propose to address these challenges from three different perspectives; namely, 3D spatial relationship modeling, dynamic temporal quantization, and temporal order encoding. We propose a novel octree-based algorithm for computing the 3D spatial relationships between objects from a 3D point cloud captured by a Kinect sensor. A set of 26 3D spatial directions are defined to describe the spatial relationship of an object with respect to a reference object. These 3D directions are implemented as a set of spatial operators, such as AboveSouthEast and BelowNorthWest, of an event query language to query human activities in an indoor environment; for example, A person walks in the hallway from north to south. The performance is quantitatively evaluated in a public RGBD object dataset and qualitatively investigated in a live video computing platform. In order to address the challenge of temporal modeling in human action recognition, we introduce the dynamic temporal quantization, a clustering-like algorithm to quantize human action sequences of varied lengths into fixed-size quantized vectors. A two-step optimization algorithm is proposed to jointly optimize the quantization of the original sequence. In the aggregation step, frames falling into the sample segment are aggregated by max-polling and produce the quantized representation of the segment. During the assignment step, frame-segment assignment is updated according to dynamic time warping, while the temporal order of the entire sequence is preserved. The proposed technique is evaluated on three public 3D human action datasets and achieves state-of-the-art performance. Finally, we propose a novel temporal order encoding approach that models the temporal dynamics of the sequential data for human activity recognition. The algorithm encodes the temporal order of the latent patterns extracted by the subspace projection and generates a highly compact First-Take-All (FTA) feature vector representing the entire sequential data. An optimization algorithm is further introduced to learn the optimized projections in order to increase the discriminative power of the FTA feature. The compactness of the FTA feature makes it extremely efficient for human activity recognition with nearest neighbor search based on Hamming distance. Experimental results on two public human activity datasets demonstrate the advantages of the FTA feature over state-of-the-art methods in both accuracy and efficiency
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
The vast proliferation of sensor devices and Internet of Things enables the
applications of sensor-based activity recognition. However, there exist
substantial challenges that could influence the performance of the recognition
system in practical scenarios. Recently, as deep learning has demonstrated its
effectiveness in many areas, plenty of deep methods have been investigated to
address the challenges in activity recognition. In this study, we present a
survey of the state-of-the-art deep learning methods for sensor-based human
activity recognition. We first introduce the multi-modality of the sensory data
and provide information for public datasets that can be used for evaluation in
different challenge tasks. We then propose a new taxonomy to structure the deep
methods by challenges. Challenges and challenge-related deep methods are
summarized and analyzed to form an overview of the current research progress.
At the end of this work, we discuss the open issues and provide some insights
for future directions
Machine Learning for Microcontroller-Class Hardware -- A Review
The advancements in machine learning opened a new opportunity to bring
intelligence to the low-end Internet-of-Things nodes such as microcontrollers.
Conventional machine learning deployment has high memory and compute footprint
hindering their direct deployment on ultra resource-constrained
microcontrollers. This paper highlights the unique requirements of enabling
onboard machine learning for microcontroller class devices. Researchers use a
specialized model development workflow for resource-limited applications to
ensure the compute and latency budget is within the device limits while still
maintaining the desired performance. We characterize a closed-loop widely
applicable workflow of machine learning model development for microcontroller
class devices and show that several classes of applications adopt a specific
instance of it. We present both qualitative and numerical insights into
different stages of model development by showcasing several use cases. Finally,
we identify the open research challenges and unsolved questions demanding
careful considerations moving forward.Comment: Accepted for publication at IEEE Sensors Journa
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets is a useful resource in guiding insightful selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-\'{a}-vis limitations of the available datasets and evaluation protocols
are also highlighted; resulting in a number of recommendations for collection
of new datasets and use of evaluation protocols
Browse-to-search
This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they usually have the requirement of looking for related information from other web sites. Therefore users need to switch between the web page being browsed and other websites that provide search results. The proposed application enables users to naturally search products of interest when they browse a web page, and make their even causal purchase intent easily satisfied. The interactive shopping experience is characterized by: 1) in session - it allows users to specify the purchase intent in the browsing session, instead of leaving the current page and navigating to other websites; 2) in context - -the browsed web page provides implicit context information which helps infer user purchase preferences; 3) in focus - users easily specify their search interest using gesture on touch devices and do not need to formulate queries in search box; 4) natural-gesture inputs and visual-based search provides users a natural shopping experience. The system is evaluated against a data set consisting of several millions commercial product images. © 2012 Authors
Highly-Optimized Radar-Based Gesture Recognition System with Depthwise Expansion Module
The increasing integration of technology in our daily lives demands the development of
more convenient human–computer interaction (HCI) methods. Most of the current hand-based HCI
strategies exhibit various limitations, e.g., sensibility to variable lighting conditions and limitations
on the operating environment. Further, the deployment of such systems is often not performed
in resource-constrained contexts. Inspired by the MobileNetV1 deep learning network, this paper
presents a novel hand gesture recognition system based on frequency-modulated continuous wave
(FMCW) radar, exhibiting a higher recognition accuracy in comparison to the state-of-the-art systems.
First of all, the paper introduces a method to simplify radar preprocessing while preserving the main
information of the performed gestures. Then, a deep neural classifier with the novel Depthwise
Expansion Module based on the depthwise separable convolutions is presented. The introduced
classifier is optimized and deployed on the Coral Edge TPU board. The system defines and adopts
eight different hand gestures performed by five users, offering a classification accuracy of 98.13%
while operating in a low-power and resource-constrained environment.Electronic Components and Systems for European
Leadership Joint Undertaking under grant agreement No. 826655 (Tempo).European Union’s Horizon 2020 research and innovation programme and
Belgium, France, Germany, Switzerland, and the NetherlandsLodz University of Technology
Entity Recognition via Multimodal Sensor Fusion with Smart Phones
This thesis serves as an exploration that takes the sensors within a cell phone beyond the current state of recognition activities. Current state of the art sensor recognition processes tend to focus on recognizing user activity. Utilizing the same sensors available for user activity classification, this thesis validates the ability to gather data about entities separate from the user carrying the smart phone. With the ability to sense entities, the ability to recognize and classify a multitude of items, situations, and phenomena opens a new realm of possibilities for how devices perceive and react to their environment
- …