2,891 research outputs found

    Human activity recognition based on time series analysis using U-Net

    Full text link
    Traditional human activity recognition (HAR) based on time series adopts the sliding-window analysis method. This method faces the multi-class window problem, in which sampling points of different classes within one window are mistakenly labeled as a single class. In this paper, a HAR algorithm based on U-Net is proposed to perform activity labeling and prediction at each sampling point. The triaxial accelerometer data are mapped into a single-pixel-column, multi-channel image, which is fed into the U-Net network for training and recognition. Our proposal thus performs pixel-level gesture recognition. The method needs no manual feature extraction and can effectively identify short-term behaviors in long-term activity sequences. We collected the Sanitation dataset and tested the proposed scheme on four open datasets. The experimental results show that, compared with Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Decision Tree (DT), Quadratic Discriminant Analysis (QDA), Convolutional Neural Network (CNN), and Fully Convolutional Network (FCN) methods, our proposal has the highest accuracy and F1-score on each dataset, with stable performance and high robustness. Moreover, once the U-Net has finished training, our proposal achieves recognition speeds fast enough for practical use. Comment: 21 pages
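
    A minimal sketch of the core idea - per-sample labeling of a triaxial accelerometer stream with a U-Net-style encoder-decoder - is shown below. The channel widths, window length, and number of classes are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical 1D U-Net for per-sample activity labeling; all sizes assumed.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 1D convolutions, as in a standard U-Net stage.
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv1d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(),
    )

class UNet1D(nn.Module):
    def __init__(self, in_channels=3, num_classes=5):
        super().__init__()
        self.enc1 = conv_block(in_channels, 16)
        self.enc2 = conv_block(16, 32)
        self.bottleneck = conv_block(32, 64)
        self.pool = nn.MaxPool1d(2)
        self.up2 = nn.ConvTranspose1d(64, 32, kernel_size=2, stride=2)
        self.dec2 = conv_block(64, 32)   # 32 skip channels + 32 upsampled
        self.up1 = nn.ConvTranspose1d(32, 16, kernel_size=2, stride=2)
        self.dec1 = conv_block(32, 16)   # 16 skip channels + 16 upsampled
        self.head = nn.Conv1d(16, num_classes, kernel_size=1)

    def forward(self, x):                # x: (batch, 3, T), T divisible by 4
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)             # (batch, classes, T): one label per point

logits = UNet1D()(torch.randn(2, 3, 128))
print(logits.shape)  # torch.Size([2, 5, 128])
```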

    Compact Deep Neural Networks for Computationally Efficient Gesture Classification From Electromyography Signals

    Full text link
    Machine learning classifiers using surface electromyography are important for human-machine interfacing and device control. Conventional classifiers such as support vector machines (SVMs) use manually extracted features based on, e.g., wavelets. These features tend to be fixed and non-person-specific, a key limitation given the high person-to-person variability of myography signals. Deep neural networks, by contrast, can automatically extract person-specific features - an important advantage. However, deep neural networks typically have the drawback of large numbers of parameters, requiring large training datasets and powerful hardware not suited to embedded systems. This paper addresses these problems by introducing a compact deep neural network architecture that is much smaller than existing counterparts. The performance of the compact deep net is benchmarked against an SVM and compared to other contemporary architectures across 10 human subjects, comparing Myo and Delsys Trigno electrode sets. The accuracy of the compact deep net was 84.2 +/- 6% versus 70.5 +/- 7% for the SVM on the Myo, and 80.3 +/- 7% versus 67.8 +/- 9% for the Delsys system, demonstrating the superior effectiveness of the proposed compact network, which has just 5,889 parameters - orders of magnitude fewer than some contemporary alternatives in this domain - while maintaining better performance. Comment: IEEE BioRob 201
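
    The abstract's emphasis on a small parameter budget can be made concrete with a sketch like the one below. The layer layout and channel counts are assumptions rather than the authors' architecture; the point is the pattern (narrow convolutions plus global pooling instead of a large dense layer) and the parameter-count check.

```python
# Illustrative parameter-frugal CNN for sEMG windows; layout is assumed.
import torch
import torch.nn as nn

class CompactEMGNet(nn.Module):
    def __init__(self, electrodes=8, num_gestures=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(electrodes, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 24, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global pooling avoids a big dense layer
            nn.Flatten(),
            nn.Linear(24, num_gestures),
        )

    def forward(self, x):  # x: (batch, electrodes, samples)
        return self.net(x)

model = CompactEMGNet()
print(sum(p.numel() for p in model.parameters()))  # a few thousand parameters
print(model(torch.randn(1, 8, 200)).shape)         # torch.Size([1, 6])
```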

    Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks

    Full text link
    Facial expression recognition in videos is an active area of research in computer vision. However, fake facial expressions are difficult to recognize even for humans. Facial micro-expressions, on the other hand, generally represent the actual emotion of a person, as they are spontaneous reactions expressed through the human face. Despite a few attempts at recognizing micro-expressions, the problem is far from solved, as reflected in the poor accuracy of state-of-the-art methods. A few CNN-based approaches in the literature recognize micro-expressions from still images, whereas a spontaneous micro-expression video contains multiple frames that have to be processed together to encode both spatial and temporal information. This paper proposes two 3D-CNN methods, MicroExpSTCNN and MicroExpFuseNet, for spontaneous facial micro-expression recognition by exploiting spatiotemporal information in a CNN framework. MicroExpSTCNN considers the full spatial information, whereas MicroExpFuseNet is based on 3D-CNN feature fusion of the eye and mouth regions. Experiments are performed on the CAS(ME)^2 and SMIC micro-expression databases. The proposed MicroExpSTCNN model outperforms the state-of-the-art methods. Comment: Accepted in the 2019 International Joint Conference on Neural Networks (IJCNN)
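
    A minimal 3D-CNN sketch in the spirit of MicroExpSTCNN is given below: 3D convolutions mix spatial and temporal information across a clip of frames. The kernel sizes, clip length, and class count are illustrative assumptions, not the paper's settings.

```python
# Toy 3D-CNN over a grayscale micro-expression clip; sizes are assumed.
import torch
import torch.nn as nn

class MicroExp3DCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                 # halves time, height, and width
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):                 # clip: (batch, 1, frames, H, W)
        return self.classifier(self.features(clip).flatten(1))

# A batch of two 16-frame clips at 64x64 resolution.
logits = MicroExp3DCNN()(torch.randn(2, 1, 16, 64, 64))
print(logits.shape)                          # torch.Size([2, 3])
```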

    Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

    Full text link
    In this paper, we present a novel approach to automatic 3D Facial Expression Recognition (FER) based on deep representation of facial 3D geometric and 2D photometric attributes. A 3D face is first represented by its geometric and photometric attributes, including the geometry map, normal maps, normalized curvature map, and texture map. These maps are then fed into a pre-trained deep convolutional neural network to generate the deep representation. Facial expression prediction is then simply achieved by training linear SVMs over the deep representation for the different maps and fusing the SVM scores. The visualizations show that the deep representation provides a complete and highly discriminative coding scheme for 3D faces. Comprehensive experiments on the BU-3DFE database demonstrate that the proposed deep representation outperforms widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG, Gabor) and state-of-the-art approaches under the same experimental protocols.
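
    The pipeline shape - deep features per attribute map, one linear SVM per map, score-level fusion - can be sketched as follows. The ResNet-18 backbone, toy data, and fusion by score summation are assumptions for illustration, not the paper's exact choices.

```python
# Hedged sketch: pretrained CNN features per facial attribute map, one linear
# SVM per map, fused by summing decision scores. Backbone and data are stand-ins.
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

backbone = models.resnet18(weights="IMAGENET1K_V1")  # downloads ImageNet weights
backbone.fc = torch.nn.Identity()                    # keep 512-d penultimate features
backbone.eval()

def deep_features(maps):             # maps: (N, 3, 224, 224) for one attribute
    with torch.no_grad():
        return backbone(maps).numpy()

# Toy stand-ins: 4 attribute maps x 24 samples, 6 expression classes.
labels = np.arange(24) % 6
scores = []
for _ in range(4):                   # geometry, normal, curvature, texture maps
    feats = deep_features(torch.randn(24, 3, 224, 224))
    svm = LinearSVC().fit(feats, labels)
    scores.append(svm.decision_function(feats))

fused = np.sum(scores, axis=0)       # score-level fusion across the maps
print(fused.argmax(axis=1)[:5])      # fused expression predictions
```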

    Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN

    Full text link
    When we say a person is texting, can you tell whether the person is walking or sitting? Emphatically, no. To solve this incomplete-representation problem, this paper presents a sub-action descriptor for detailed action detection. The sub-action descriptor consists of three levels: the posture level, the locomotion level, and the gesture level; together they give three sub-action categories for one action, addressing the representation problem. The proposed action detection model simultaneously localizes and recognizes the actions of multiple individuals in video surveillance using appearance-based temporal features with multi-CNN. The proposed approach achieved a mean average precision (mAP) of 76.6% under the frame-based measurement and 83.5% under the video-based measurement on the new large-scale ICVL video surveillance dataset, which the authors introduce and make available to the community with this paper. Extensive experiments on the benchmark KTH dataset demonstrate that the proposed approach outperforms the state-of-the-art in action recognition. The action detection model runs at around 25 fps on the ICVL dataset and at more than 80 fps on the KTH dataset, which is suitable for real-time surveillance applications. Comment: 29 pages, 16 figures
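
    The three-level descriptor itself is easy to picture as a small composite label, as in the sketch below; the vocabularies are illustrative assumptions, not the ICVL label set.

```python
# Hypothetical encoding of the three-level sub-action descriptor; the
# vocabularies below are assumptions, not the paper's label set.
from dataclasses import dataclass

POSTURES = ("standing", "sitting", "lying")
LOCOMOTIONS = ("stationary", "walking", "running")
GESTURES = ("none", "texting", "phoning", "pointing")

@dataclass(frozen=True)
class SubAction:
    posture: str      # body configuration
    locomotion: str   # whole-body movement
    gesture: str      # hand / upper-body activity

    def __post_init__(self):
        assert self.posture in POSTURES
        assert self.locomotion in LOCOMOTIONS
        assert self.gesture in GESTURES

# "Texting" no longer hides whether the person is sitting or walking:
a = SubAction("sitting", "stationary", "texting")
b = SubAction("standing", "walking", "texting")
print(a, b, sep="\n")
```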

    GestARLite: An On-Device Pointing Finger Based Gestural Interface for Smartphones and Video See-Through Head-Mounts

    Full text link
    Hand gestures form an intuitive means of interaction in Mixed Reality (MR) applications. However, accurate gesture recognition can be achieved only through state-of-the-art deep learning models or with the use of expensive sensors. Despite the robustness of these deep learning models, they are generally computationally expensive, and obtaining real-time performance on-device remains a challenge. To this end, we propose a novel lightweight hand gesture recognition framework that works in First Person View for wearable devices. The models are trained on a GPU machine and ported to an Android smartphone for use with frugal wearable devices such as the Google Cardboard and VR Box. The proposed hand gesture recognition framework is driven by a cascade of state-of-the-art deep learning models: MobileNetV2 for hand localisation, our custom fingertip regression architecture, and a Bi-LSTM model for gesture classification. We extensively evaluate the framework on our EgoGestAR dataset. The overall framework works in real time on mobile devices and achieves a classification accuracy of 80% on the EgoGestAR video dataset with an average latency of only 0.12 s. Comment: The AAAI 2019 Workshop on Plan, Activity, and Intent Recognition. arXiv admin note: substantial text overlap with arXiv:1904.0612
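
    The cascade's data flow - hand crops from a detector, a fingertip regressor, then a Bi-LSTM over the fingertip trajectory - might look like the sketch below. The module internals are stand-ins, not the paper's networks.

```python
# Data-flow sketch of a GestARLite-style cascade; module internals are stand-ins.
import torch
import torch.nn as nn

class FingertipRegressor(nn.Module):
    # Placeholder for the paper's custom fingertip regression architecture.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
        )

    def forward(self, crops):             # crops: (frames, 3, 64, 64)
        return self.net(crops)            # (frames, 2) fingertip coordinates

class GestureBiLSTM(nn.Module):
    def __init__(self, num_gestures=10):
        super().__init__()
        self.lstm = nn.LSTM(2, 32, batch_first=True, bidirectional=True)
        self.head = nn.Linear(64, num_gestures)

    def forward(self, points):            # points: (batch, frames, 2)
        out, _ = self.lstm(points)
        return self.head(out[:, -1])      # classify from the final step

regressor, classifier = FingertipRegressor(), GestureBiLSTM()
crops = torch.randn(30, 3, 64, 64)        # 30 hand crops from a detector stage
trajectory = regressor(crops).unsqueeze(0)       # (1, 30, 2)
print(classifier(trajectory).shape)              # torch.Size([1, 10])
```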

    A Real-time Hand Gesture Recognition and Human-Computer Interaction System

    Full text link
    In this project, we design a real-time human-computer interaction system based on hand gestures. The whole system consists of three components: hand detection, gesture recognition, and recognition-based human-computer interaction (HCI); it realizes robust control of mouse and keyboard events with high gesture-recognition accuracy. Specifically, we use a convolutional neural network (CNN) to recognize gestures, making it possible to identify relatively complex gestures using only one cheap monocular camera. We introduce a Kalman filter to estimate the hand position, on the basis of which mouse cursor control is realized in a stable and smooth way. During the HCI stage, we develop a simple strategy to avoid false recognition caused by noise - mostly transient, false gestures - and thus improve the reliability of interaction. The developed system is highly extendable and can be used in human-robot or other human-machine interaction scenarios with more complex command formats than just mouse and keyboard events.
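
    The cursor-smoothing stage can be sketched as a constant-velocity Kalman filter over the 2D hand position, as below; the frame rate and noise covariances are illustrative guesses, not the project's tuned values.

```python
# Constant-velocity Kalman filter over (x, y) hand detections; parameters assumed.
import numpy as np

dt = 1 / 30.0                                # assumed camera frame period
F = np.array([[1, 0, dt, 0],                 # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],                  # we only observe (x, y)
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-3                         # process noise (assumed)
R = np.eye(2) * 5.0                          # measurement noise (assumed)
x, P = np.zeros(4), np.eye(4)                # initial state and covariance

def kalman_step(z):
    """Fuse one noisy hand detection z = (x, y); return the smoothed position."""
    global x, P
    x = F @ x                                # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)                  # update with the measurement
    P = (np.eye(4) - K @ H) @ P
    return x[:2]

for z in [(100, 100), (104, 98), (180, 170), (108, 101)]:  # one outlier jump
    print(kalman_step(np.array(z, dtype=float)))            # damped cursor path
```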

    BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs

    Full text link
    Recurrent neural networks (RNNs) have shown promising results in audio and speech processing applications due to their strong capabilities in modelling sequential data. In many applications, RNNs tend to outperform conventional models based on GMMs/UBMs and i-vectors. The increasing popularity of IoT devices makes a strong case for implementing RNN-based inference for applications such as acoustics-based authentication, voice commands, and edge analytics for smart homes. Nonetheless, the feasibility and performance of RNN-based inference on resource-constrained IoT devices remain largely unexplored. In this paper, we investigate the feasibility of using RNNs for an end-to-end authentication system based on breathing acoustics. We evaluate the performance of RNN models on three types of devices - a smartphone, a smartwatch, and a Raspberry Pi - and show that, unlike CNN models, RNN models can be easily ported onto resource-constrained devices without a significant loss in accuracy.
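
    A minimal sketch of what such an RNN-based authenticator might look like: a GRU consumes per-frame acoustic features (e.g. MFCC-like vectors) and a linear head scores genuine-user versus impostor. The feature and hidden sizes are assumptions, not the paper's model.

```python
# Toy RNN authenticator over breathing-acoustics feature frames; sizes assumed.
import torch
import torch.nn as nn

class BreathRNN(nn.Module):
    def __init__(self, n_features=13, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)      # logit: genuine user vs impostor

    def forward(self, frames):                # frames: (batch, T, n_features)
        _, h = self.gru(frames)
        return self.head(h[-1])               # score from the final hidden state

model = BreathRNN()
scores = model(torch.randn(4, 100, 13))       # 4 clips of 100 feature frames
print(torch.sigmoid(scores).squeeze(1))       # per-clip acceptance probabilities
```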

    Modeling of Facial Aging and Kinship: A Survey

    Full text link
    Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data. In this paper, we review recent advances in the modeling of facial aging and kinship. In particular, we provide an up-to-date, complete list of available annotated datasets and an in-depth analysis of the geometric, hand-crafted, and learned facial representations used for facial aging and kinship characterization. Moreover, evaluation protocols and metrics are reviewed, and notable experimental results for each surveyed task are analyzed. This survey allows us to identify challenges and discuss future research directions for the development of robust facial models in real-world conditions.

    Cross-Country Skiing Gears Classification using Deep Learning

    Full text link
    Human activity recognition has witnessed significant progress in the last decade. Although a great deal of work in this field goes into recognizing everyday human activities, few studies have focused on identifying motion in sports. Recognizing human movements in different sports has a high impact on understanding athletes' different playing styles and on improving their performance. As deep learning models have proved effective in many classification problems, this paper utilizes deep learning to classify cross-country skiing movements, known as gears, collected using a 3D accelerometer. It also provides a comparison between different deep learning models, such as convolutional and recurrent neural networks, and a standard multi-layer perceptron. Results show that deep learning is more effective and achieves the highest classification accuracy. Comment: 15 pages, 8 figures, 1 table
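
    The comparison setup can be sketched by running the same windowed 3-axis accelerometer input through the three model families, as below; the window length, layer sizes, and number of gear classes are assumptions.

```python
# Same accelerometer window through an MLP, a 1D CNN, and an LSTM; sizes assumed.
import torch
import torch.nn as nn

T, num_gears = 128, 4                        # assumed window length and classes

models = {
    "mlp": nn.Sequential(
        nn.Flatten(), nn.Linear(3 * T, 64), nn.ReLU(), nn.Linear(64, num_gears)),
    "cnn": nn.Sequential(
        nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, num_gears)),
}

class GearLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(3, 32, batch_first=True)
        self.head = nn.Linear(32, num_gears)

    def forward(self, x):                    # x: (batch, 3, T), like the others
        out, _ = self.lstm(x.transpose(1, 2))
        return self.head(out[:, -1])

models["lstm"] = GearLSTM()

window = torch.randn(8, 3, T)                # a batch of accelerometer windows
for name, m in models.items():
    print(name, m(window).shape)             # each yields (8, num_gears) logits
```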