Human activity recognition from inertial sensor time-series using batch normalized deep LSTM recurrent networks
In recent years, machine learning methods for human activity recognition have proven very effective. These methods classify discriminative features generated from raw input sequences acquired from body-worn inertial sensors. However, this requires an explicit feature extraction stage from the raw data, and although human movements are encoded in a sequence of successive samples in time, most state-of-the-art machine learning methods do not exploit the temporal correlations between input data samples. In this paper we present a Long Short-Term Memory (LSTM) deep recurrent neural network for the classification of six daily life activities from accelerometer and gyroscope data. Results show that our LSTM can process featureless raw input signals and achieves 92% average accuracy in a multi-class scenario. Further, we show that this accuracy can be achieved with almost four times fewer training epochs by using a batch normalization approach.
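The batch normalization step mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the generic normalization (zero mean and unit variance per feature across a mini-batch, followed by a scale and shift). The function name `batch_normalize`, the scalar `gamma`/`beta` parameters, and the `eps` value are assumptions made for illustration; the abstract does not say where in the LSTM the normalization is applied.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) across the mini-batch (rows).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature mean over the batch dimension.
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    # Per-feature (biased) variance over the batch dimension.
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]
    # Standardize, then apply the scale (gamma) and shift (beta).
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(num_features)
        ]
        for row in batch
    ]

# Example: three samples with two features on very different scales.
normalized = batch_normalize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After normalization both columns have approximately zero mean and unit variance, regardless of their original scale.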
Two-Stream RNN/CNN for Action Recognition in 3D Videos
The recognition of actions from video sequences has many applications in
health monitoring, assisted living, surveillance, and smart homes. Despite
advances in sensing, in particular related to 3D video, the methodologies to
process the data are still subject to research. We demonstrate superior results
by a system which combines recurrent neural networks with convolutional neural
networks in a voting approach. The gated-recurrent-unit-based neural networks
are particularly well-suited to distinguish actions based on long-term
information from optical tracking data; the 3D-CNNs focus more on detailed,
recent information from video data. The resulting features are merged in an SVM
which then classifies the movement. In this architecture, our method improves
recognition rates of state-of-the-art methods by 14% on standard data sets. Comment: Published in 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS)
Design and implementation of a convolutional neural network on an edge computing smartphone for human activity recognition
Edge computing aims to integrate computing into everyday settings, enabling the system to be context-aware and private to the user. With the increasing success and popularity of deep learning methods, there is an increased demand to leverage these techniques in mobile and wearable computing scenarios. In this paper, we present an assessment of a deep human activity recognition system's memory and execution time requirements when implemented on mid-range smartphone-class hardware, and of the memory implications for embedded hardware. This paper presents the design of a convolutional neural network (CNN) in the context of a human activity recognition scenario. Here, the CNN layers automate feature learning, and we examine the influence of hyper-parameters such as the number of filters and filter size on the performance of the CNN. The proposed CNN showed increased robustness and a better capability of detecting activities with temporal dependence compared to models using statistical machine learning techniques. The model obtained an accuracy of 96.4% in a five-class static and dynamic activity recognition scenario. We calculated the memory consumption and execution time the proposed model requires on a mid-range smartphone. Post-training per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision produces classification accuracy within 2% of floating-point networks for the dense, convolutional neural network architecture. Almost all the size and execution time reduction in the optimized model was achieved through weight quantization. We achieved a more than fourfold reduction in model size when optimized to 8-bit, which ensured a feasible model capable of fast on-device inference.
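The per-channel weight quantization described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a symmetric scheme with one scale per output channel, and the names `quantize_per_channel` and `dequantize_per_channel` are hypothetical. Storing each weight as an 8-bit integer instead of a 32-bit float accounts for roughly a fourfold reduction in weight storage.

```python
def quantize_per_channel(weights, num_bits=8):
    """Symmetric quantization with one scale per channel (row of weights).

    Hypothetical helper; the symmetric scheme is an assumption.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    quantized, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        # Map the largest magnitude in this channel to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized.append(
            [max(-qmax, min(qmax, round(w / scale))) for w in channel]
        )
        scales.append(scale)
    return quantized, scales

def dequantize_per_channel(quantized, scales):
    """Recover approximate float weights from integers and per-channel scales."""
    return [[q * s for q in channel] for channel, s in zip(quantized, scales)]

# Example: two channels whose weight ranges differ by two orders of magnitude.
weights = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
q_weights, q_scales = quantize_per_channel(weights)
restored = dequantize_per_channel(q_weights, q_scales)
```

Using a separate scale per channel keeps the rounding error of the small-valued second channel proportional to its own range rather than to the range of the first channel.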
Single Input Single Head CNN-GRU-LSTM Architecture for Recognition of Human Activities
Due to its applications for the betterment of human life, human activity recognition has attracted growing research interest in the recent past. Anticipating the intention behind a motion and recognizing behaviour are active research applications within human activity recognition. Gyroscope, accelerometer, and magnetometer sensors are heavily used to obtain time-series data for every timestep. The selection of temporal features is required for the successful recognition of human motion primitives. Most past approaches used various data pre-processing and feature extraction techniques, which require sufficient domain knowledge. These approaches depend heavily on the quality of handcrafted features, are time-consuming, and do not generalize. In this paper, a single-head deep neural network-based approach that combines a convolutional neural network, a Gated Recurrent Unit, and Long Short-Term Memory is proposed. The raw data from wearable sensors are used with minimal pre-processing and without any feature extraction method. Accuracies of 93.48% and 98.51% are obtained on the UCI-HAR and WISDM datasets, respectively. This single-head deep neural network-based model shows higher classification performance than other deep neural network architectures.
A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System
This paper presents a novel multimodal human activity recognition system. It
uses a two-stream decision level fusion of vision and inertial sensors. In the
first stream, raw RGB frames are passed to a part affinity field-based pose
estimation network to detect the keypoints of the user. These keypoints are
then pre-processed and inputted in a sliding window fashion to a specially
designed convolutional neural network for the spatial feature extraction
followed by regularized LSTMs to calculate the temporal features. The outputs
of LSTM networks are then inputted to fully connected layers for
classification. In the second stream, data obtained from inertial sensors are
pre-processed and inputted to regularized LSTMs for the feature extraction
followed by fully connected layers for the classification. At this stage, the
SoftMax scores of two streams are then fused using the decision level fusion
which gives the final prediction. Extensive experiments are conducted to
evaluate the performance. Four multimodal standard benchmark datasets (UP-Fall
detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used for experimentations.
The accuracies obtained by the proposed system are 96.9 %, 97.6 %, 98.7 %, and
95.9 % respectively on the UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and
C-MHAD datasets. These results are far superior to the current
state-of-the-art methods
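The decision-level fusion of the two streams' SoftMax scores can be sketched as below. The abstract does not state the fusion rule, so the weighted average used here is an assumption, as are the names `softmax`, `fuse_decisions`, and `vision_weight`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_decisions(vision_scores, inertial_scores, vision_weight=0.5):
    """Fuse per-class scores from two streams by weighted averaging.

    Assumed fusion rule for illustration; returns (fused_scores, class_index).
    """
    if len(vision_scores) != len(inertial_scores):
        raise ValueError("both streams must score the same set of classes")
    fused = [
        vision_weight * v + (1.0 - vision_weight) * i
        for v, i in zip(vision_scores, inertial_scores)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted

# Example: the vision stream favours class 0, the inertial stream class 1.
vision = softmax([2.0, 1.0, 0.1])
inertial = softmax([0.1, 3.0, 0.2])
fused_scores, label = fuse_decisions(vision, inertial)
```

Because each stream outputs a probability distribution and the weights sum to one, the fused scores also sum to one, so the final prediction is simply the class with the highest fused score.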
Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition
Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
- …