Human activity recognition based on time series analysis using U-Net
Traditional human activity recognition (HAR) based on time series adopts the
sliding-window analysis method. This method faces the multi-class window
problem, in which sampling points of different classes within a window are
mistakenly given a single class label. In this paper, a HAR algorithm based on
U-Net is proposed to perform activity labeling and prediction at each sampling
point. The activity data of the triaxial accelerometer is mapped into a
multi-channel image with a single pixel column, which is input into the U-Net
for training and recognition. Our proposal can thus perform pixel-level gesture
recognition. The method does not need manual feature extraction and
can effectively identify short-term behaviors in long-term activity sequences.
We collected the Sanitation dataset and tested the proposed scheme with four
open datasets. The experimental results show that compared with Support Vector
Machine (SVM), k-Nearest Neighbor (kNN), Decision Tree (DT), Quadratic
Discriminant Analysis (QDA), Convolutional Neural Network (CNN) and Fully
Convolutional Networks (FCN) methods, our proposal has the highest accuracy and
F1-score in each dataset, and has stable performance and high robustness. At
the same time, after the U-Net has finished training, our proposal can achieve
fast enough recognition speed. Comment: 21 page
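The multi-class window problem described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the majority-vote labeling rule, and the label sequence are assumptions made for illustration. The last helper shows one plausible reading of the "single pixel column, multi-channel" mapping, with N rows, one column, and three channels.

```python
from collections import Counter


def window_majority_labels(labels, window, step):
    """Give each window the most frequent label among its samples.

    Majority voting is an assumed labeling rule; the abstract does not
    say how window labels are chosen.
    """
    out = []
    for start in range(0, len(labels) - window + 1, step):
        segment = labels[start:start + window]
        out.append((start, Counter(segment).most_common(1)[0][0]))
    return out


def mislabeled_samples(labels, window, step):
    """Count samples whose true label differs from their window's label."""
    wrong = 0
    for start, majority in window_majority_labels(labels, window, step):
        segment = labels[start:start + window]
        wrong += sum(1 for lab in segment if lab != majority)
    return wrong


def to_single_column_image(ax, ay, az):
    """Arrange triaxial samples as a nested list of shape (N, 1, 3).

    N rows (time), one pixel column, three channels (x, y, z).
    """
    return [[[x, y, z]] for x, y, z in zip(ax, ay, az)]


# Hypothetical sequence: a short "sit" burst inside a long "walk" run.
labels = ["walk"] * 7 + ["sit"] * 2 + ["walk"] * 3

# Two non-overlapping windows of 6 samples; the second one swallows the burst.
print(window_majority_labels(labels, window=6, step=6))
print(mislabeled_samples(labels, window=6, step=6))
```

With per-sample labeling, which is what the abstract proposes, the two "sit" samples would keep their own label instead of being absorbed into the surrounding "walk" window.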
Compact Deep Neural Networks for Computationally Efficient Gesture Classification From Electromyography Signals
Machine learning classifiers using surface electromyography are important for
human-machine interfacing and device control. Conventional classifiers such as
support vector machines (SVMs) use manually extracted features based on e.g.
wavelets. These features tend to be fixed and not person-specific, which is a
key limitation due to high person-to-person variability of myography signals.
Deep neural networks, by contrast, can automatically extract person-specific
features, an important advantage. However, deep neural networks typically have
the drawback of large numbers of parameters, requiring large training data sets
and powerful hardware not suited to embedded systems. This paper solves these
problems by introducing a compact deep neural network architecture that is much
smaller than existing counterparts. The performance of the compact deep net is
benchmarked against an SVM and compared to other contemporary architectures
across 10 human subjects, comparing Myo and Delsys Trigno electrode sets. The
accuracy of the compact deep net was found to be 84.2 +/- 6% versus 70.5 +/- 7%
for the SVM on the Myo, and 80.3 +/- 7% versus 67.8 +/- 9% for the Delsys
system, demonstrating the superior effectiveness of the proposed compact
network, which had just 5,889 parameters, orders of magnitude fewer than some
contemporary alternatives in this domain, while maintaining better performance. Comment: IEEE BioRob 201
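To make the idea of a parameter budget concrete, the sketch below counts trainable parameters for 1D convolution and dense layers using the standard weights-plus-biases formulas. The layer list is entirely hypothetical; it is not the architecture from the paper and does not reproduce the 5,889 figure.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def conv1d_params(in_channels, out_channels, kernel_size):
    """One kernel per (input channel, output channel) pair, plus biases."""
    return in_channels * out_channels * kernel_size + out_channels


def count_params(layers):
    """Sum the parameters of a list of ("dense" | "conv1d", *shape) tuples."""
    total = 0
    for kind, *shape in layers:
        if kind == "dense":
            total += dense_params(*shape)
        elif kind == "conv1d":
            total += conv1d_params(*shape)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return total


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
HYPOTHETICAL_NET = [
    ("conv1d", 8, 16, 3),  # 8 * 16 * 3 + 16 = 400
    ("dense", 16, 32),     # 16 * 32 + 32  = 544
    ("dense", 32, 7),      # 32 * 7 + 7    = 231
]

print(count_params(HYPOTHETICAL_NET))  # 1175
```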
Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks
Facial expression recognition in videos is an active area of research in
computer vision. However, fake facial expressions are difficult even for
humans to recognize. On the other hand, facial micro-expressions
generally represent the actual emotion of a person, as they are spontaneous
reactions expressed through the human face. Despite a few attempts at
recognizing micro-expressions, the problem is still far from solved, as shown
by the poor accuracy of the state-of-the-art methods. A few CNN-based
approaches in the literature recognize micro-facial expressions from still
images, whereas a
spontaneous micro-expression video contains multiple frames that have to be
processed together to encode both spatial and temporal information. This paper
proposes two 3D-CNN methods, MicroExpSTCNN and MicroExpFuseNet, for spontaneous
facial micro-expression recognition by exploiting the spatiotemporal
information in a CNN framework. The MicroExpSTCNN considers the full spatial
information, whereas the MicroExpFuseNet is based on the 3D-CNN feature fusion
of the eyes and mouth regions. The experiments are performed over the CAS(ME)^2 and
SMIC micro-expression databases. The proposed MicroExpSTCNN model outperforms
the state-of-the-art methods. Comment: Accepted in 2019 International Joint Conference on Neural Networks
(IJCNN
Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition
In this paper, we present a novel approach to automatic 3D Facial Expression
Recognition (FER) based on deep representation of facial 3D geometric and 2D
photometric attributes. A 3D face is first represented by its geometric and
photometric attributes, including the geometry map, normal maps, normalized
curvature map and texture map. These maps are then fed into a pre-trained deep
convolutional neural network to generate the deep representation. Then the
facial expression prediction is simply achieved by training linear SVMs over the
deep representation for different maps and fusing these SVM scores. The
visualizations show that the deep representation provides a complete and highly
discriminative coding scheme for 3D faces. Comprehensive experiments on the
BU-3DFE database demonstrate that the proposed deep representation can
outperform the widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG,
Gabor) and the state-of-the-art approaches under the same experimental protocols.
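The score-fusion step mentioned in this abstract can be sketched as follows. The abstract does not state the fusion rule, so summing the per-class scores across maps is an assumption, and the map names, class names, and score values are made up for illustration.

```python
def fuse_scores(scores_per_map):
    """Sum per-class scores across all attribute maps (assumed fusion rule)."""
    classes = next(iter(scores_per_map.values())).keys()
    return {
        cls: sum(scores[cls] for scores in scores_per_map.values())
        for cls in classes
    }


def predict(scores_per_map):
    """Return the class with the highest fused score."""
    fused = fuse_scores(scores_per_map)
    return max(fused, key=fused.get)


# Hypothetical per-map SVM decision scores for two expression classes.
scores = {
    "geometry": {"happy": 0.2, "sad": 0.5},
    "normal": {"happy": 0.6, "sad": 0.1},
    "texture": {"happy": 0.4, "sad": 0.3},
}

print(predict(scores))  # happy
```

In this toy case the geometry map alone would pick "sad", while the fused scores pick "happy", which shows how evidence from several maps can be combined.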
Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN
When we say a person is texting, can you tell whether the person is walking or
sitting? Emphatically, no. In order to solve this incomplete representation
problem, this paper presents a sub-action descriptor for detailed action
detection. The sub-action descriptor consists of three levels: the posture, the
locomotion, and the gesture level. The three levels give three sub-action
categories for one action to address the representation problem. The proposed
action detection model simultaneously localizes and recognizes the actions of
multiple individuals in video surveillance using appearance-based temporal
features with multi-CNN. The proposed approach achieved a mean average
precision (mAP) of 76.6% at the frame-based and 83.5% at the video-based
measurement on the new large-scale ICVL video surveillance dataset that the
authors introduce and make available to the community with this paper.
Extensive experiments on the benchmark KTH dataset demonstrate that the
proposed approach achieved better performance, which in turn boosts the action
recognition performance over the state-of-the-art. The action detection model
can run at around 25 fps on the ICVL and more than 80 fps on the KTH dataset,
which is suitable for real-time surveillance applications. Comment: 29 pages, 16 figure
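The three-level sub-action descriptor can be pictured as a small record type. This is a sketch of the idea only: the class name, field values, and category names are hypothetical and are not taken from the paper or the ICVL dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubActionDescriptor:
    """One action described at three levels, as outlined in the abstract."""

    posture: str      # e.g. "standing" or "sitting" (hypothetical values)
    locomotion: str   # e.g. "walking" or "stationary" (hypothetical values)
    gesture: str      # e.g. "texting" (hypothetical value)

    def describe(self):
        return f"{self.posture}, {self.locomotion}, {self.gesture}"


# The label "texting" alone cannot separate these two observations.
a = SubActionDescriptor("standing", "walking", "texting")
b = SubActionDescriptor("sitting", "stationary", "texting")

print(a.describe())
print(b.describe())
print(a.gesture == b.gesture, a == b)  # True False
```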
GestARLite: An On-Device Pointing Finger Based Gestural Interface for Smartphones and Video See-Through Head-Mounts
Hand gestures form an intuitive means of interaction in Mixed Reality (MR)
applications. However, accurate gesture recognition can be achieved only
through state-of-the-art deep learning models or with the use of expensive
sensors. Despite the robustness of these deep learning models, they are
generally computationally expensive and obtaining real-time performance
on-device is still a challenge. To this end, we propose a novel lightweight
hand gesture recognition framework that works in First Person View for wearable
devices. The models are trained on a GPU machine and ported to an Android
smartphone for use with frugal wearable devices such as the Google
Cardboard and VR Box. The proposed hand gesture recognition framework is driven
by a cascade of state-of-the-art deep learning models: MobileNetV2 for hand
localisation, our custom fingertip regression architecture followed by a
Bi-LSTM model for gesture classification. We extensively evaluate the framework
on our EgoGestAR dataset. The overall framework works in real-time on mobile
devices and achieves a classification accuracy of 80% on the EgoGestAR video
dataset with an average latency of only 0.12 s. Comment: The AAAI 2019 Workshop on Plan, Activity, and Intent Recognition.
arXiv admin note: substantial text overlap with arXiv:1904.0612
A Real-time Hand Gesture Recognition and Human-Computer Interaction System
In this project, we design a real-time human-computer interaction system
based on hand gestures. The whole system consists of three components: hand
detection, gesture recognition and human-computer interaction (HCI) based on
recognition. It realizes robust control of mouse and keyboard events with
higher gesture recognition accuracy. Specifically, we use a
convolutional neural network (CNN) to recognize gestures, which makes it
possible to identify relatively complex gestures using only one cheap
monocular camera. We introduce the Kalman filter to estimate the hand position,
based on which the mouse cursor control is realized in a stable and smooth way.
During the HCI stage, we develop a simple strategy to avoid the false
recognition caused by noise (mostly transient, false gestures) and thus
improve the reliability of interaction. The developed system is highly
extendable and can be used in human-robot or other human-machine interaction
scenarios with more complex command formats rather than just mouse and keyboard
events.
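How a Kalman filter can steady a cursor driven by noisy hand positions is sketched below. The abstract does not give the filter's state model, so this assumes the simplest case: an independent scalar filter per coordinate with a random-walk model. The class name, the variance values, and the sample track are all hypothetical.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (an assumption)."""

    def __init__(self, process_var, measurement_var, initial=0.0, initial_var=1.0):
        self.q = process_var        # how much the true value may drift per step
        self.r = measurement_var    # how noisy each measurement is
        self.x = initial            # current state estimate
        self.p = initial_var        # variance of the current estimate

    def update(self, z):
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= 1.0 - gain
        return self.x


def smooth_track(points, process_var=1e-2, measurement_var=1.0):
    """Filter x and y independently; returns the smoothed (x, y) positions."""
    fx = ScalarKalman(process_var, measurement_var, initial=points[0][0])
    fy = ScalarKalman(process_var, measurement_var, initial=points[0][1])
    return [(fx.update(x), fy.update(y)) for x, y in points]


# Hypothetical jittery hand positions around (100, 50), in pixels.
raw = [(100, 50), (104, 47), (97, 52), (101, 49), (95, 53), (103, 50)]
smoothed = smooth_track(raw)

for (rx, ry), (sx, sy) in zip(raw, smoothed):
    print(f"raw=({rx}, {ry})  smoothed=({sx:.2f}, {sy:.2f})")
```

A larger `measurement_var` relative to `process_var` gives a smoother but laggier cursor; a tracker that also models velocity would follow fast hand motion more closely.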
BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs
Recurrent neural networks (RNNs) have shown promising results in audio and
speech processing applications due to their strong capabilities in modelling
sequential data. In many applications, RNNs tend to outperform conventional
models based on GMM/UBMs and i-vectors. The increasing popularity of IoT devices
makes a strong case for implementing RNN-based inference for applications such
as acoustics-based authentication, voice commands, and edge analytics for smart
homes. Nonetheless, the feasibility and performance of RNN-based inference on
resource-constrained IoT devices remain largely unexplored. In this paper, we
investigate the feasibility of using RNNs for an end-to-end authentication
system based on breathing acoustics. We evaluate the performance of RNN models
on three types of devices (smartphone, smartwatch, and Raspberry Pi) and show
that, unlike CNN models, RNN models can be easily ported onto
resource-constrained devices without a significant loss in accuracy.
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
Cross-Country Skiing Gears Classification using Deep Learning
Human Activity Recognition has witnessed significant progress in the last
decade. Although a great deal of work in this field goes into recognizing normal
human activities, few studies have focused on identifying motion in sports.
Recognizing human movements in different sports has a high impact on
understanding athletes' different styles of play and on improving their
performance. As deep learning models have proved to give good results in many
classification problems, this paper utilizes deep learning to classify
cross-country skiing movements, known as gears, collected using a 3D
accelerometer. It also provides a comparison between different deep
learning models, such as convolutional and recurrent neural networks, versus a
standard multi-layer perceptron. Results show that deep learning is more
effective and has the highest classification accuracy. Comment: 15 pages, 8 figures, 1 tabl