    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets, well known in the state of the art, showing remarkable results compared to current literature methods.
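
    A minimal sketch of the idea behind the second module (a two-branch stacked LSTM over 2D skeleton sequences) is given below, assuming one branch reads raw joint positions and the other reads frame-to-frame motion; the layer sizes, the fusion by concatenation, and the class count are illustrative assumptions rather than the thesis configuration.

        # Illustrative sketch, not the thesis code: a two-branch stacked LSTM over
        # 2D skeleton sequences; sizes and the fusion strategy are assumptions.
        import torch
        import torch.nn as nn

        class TwoBranchSkeletonLSTM(nn.Module):
            def __init__(self, num_joints=18, hidden=128, num_classes=10):
                super().__init__()
                in_dim = num_joints * 2                      # (x, y) per joint
                self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
                self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
                self.classifier = nn.Linear(2 * hidden, num_classes)

            def forward(self, poses):                        # poses: (batch, time, joints * 2)
                motion = poses[:, 1:] - poses[:, :-1]        # frame-to-frame differences
                _, (h_pose, _) = self.pose_branch(poses)
                _, (h_motion, _) = self.motion_branch(motion)
                feat = torch.cat([h_pose[-1], h_motion[-1]], dim=1)   # fuse the two branches
                return self.classifier(feat)

        # Example: 4 sequences of 30 frames, 18 joints with (x, y) coordinates each.
        model = TwoBranchSkeletonLSTM()
        print(model(torch.randn(4, 30, 36)).shape)           # torch.Size([4, 10])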

    A Transferable Adaptive Domain Adversarial Neural Network for Virtual Reality Augmented EMG-Based Gesture Recognition

    Within the field of electromyography-based (EMG) gesture recognition, disparities exist between the offline accuracy reported in the literature and the real-time usability of a classifier. This gap mainly stems from two factors: 1) the absence of a controller, making the data collected dissimilar to actual control; 2) the difficulty of including the four main dynamic factors (gesture intensity, limb position, electrode shift, and transient changes in the signal), as including their permutations drastically increases the amount of data to be recorded. Conversely, online datasets are limited to the exact EMG-based controller used to record them, necessitating the recording of a new dataset for each control method or variant to be tested. Consequently, this paper proposes a new type of dataset to serve as an intermediate between offline and online datasets, by recording the data using a real-time experimental protocol. The protocol, performed in virtual reality, includes the four main dynamic factors and uses an EMG-independent controller to guide movements. This EMG-independent feedback ensures that the user is in the loop during recording, while enabling the resulting dynamic dataset to be used as an EMG-based benchmark. The dataset comprises 20 able-bodied participants completing three to four sessions over a period of 14 to 21 days. The ability of the dynamic dataset to serve as a benchmark is leveraged to evaluate the impact of different recalibration techniques for long-term (across-day) gesture recognition, including a novel algorithm named TADANN. TADANN consistently and significantly (p < 0.05) outperforms fine-tuning as the recalibration technique.
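
    TADANN itself is not reproduced here, but domain-adversarial recalibration of this kind builds on the gradient-reversal layer of DANN; the sketch below shows that mechanism on EMG feature vectors, where the feature dimensionality, layer sizes, and the use of the recording day as the domain label are illustrative assumptions.

        # Sketch of the gradient-reversal trick used by domain-adversarial networks
        # (DANN); TADANN's actual architecture and training schedule are not
        # reproduced here, and all sizes are illustrative assumptions.
        import torch
        import torch.nn as nn

        class GradientReversal(torch.autograd.Function):
            @staticmethod
            def forward(ctx, x, lam):
                ctx.lam = lam
                return x.clone()

            @staticmethod
            def backward(ctx, grad_output):
                # Gradient is flipped in sign and scaled before reaching the features,
                # so the feature extractor learns domain-invariant representations.
                return -ctx.lam * grad_output, None

        class DomainAdversarialEMG(nn.Module):
            def __init__(self, in_features=64, num_gestures=7, num_domains=2):
                super().__init__()
                self.feature = nn.Sequential(nn.Linear(in_features, 128), nn.ReLU())
                self.gesture_head = nn.Linear(128, num_gestures)
                self.domain_head = nn.Linear(128, num_domains)   # e.g. recording day

            def forward(self, x, lam=1.0):
                feat = self.feature(x)
                gesture_logits = self.gesture_head(feat)
                domain_logits = self.domain_head(GradientReversal.apply(feat, lam))
                return gesture_logits, domain_logits

        model = DomainAdversarialEMG()
        g, d = model(torch.randn(8, 64))
        print(g.shape, d.shape)   # torch.Size([8, 7]) torch.Size([8, 2])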

    SFINGE 3D: A novel benchmark for online detection and recognition of heterogeneous hand gestures from 3D fingers' trajectories

    In recent years, gesture recognition has become an increasingly interesting topic for both research and industry. While interaction with a device through a gestural interface is a promising idea in several applications, especially in the industrial field, some of the issues related to the task are still considered a challenge. In the scientific literature, a relevant amount of work has recently been presented on the problem of detecting and classifying gestures from 3D hand joint trajectories, which can be captured by cheap devices installed on head-mounted displays and desktop computers. The methods proposed so far can achieve very good results on benchmarks requiring the offline supervised classification of segmented gestures of a particular kind, but are not usually tested on the more realistic task of finding gesture executions within a continuous hand tracking session. In this paper, we present a novel benchmark, SFINGE 3D, aimed at evaluating online gesture detection and recognition. The dataset is composed of a dictionary of 13 segmented gestures, used as a training set, and 72 trajectories, each containing 3-5 of the 13 gestures performed in continuous tracking and padded with random hand movements acting as noise. The presented dataset, captured with a head-mounted Leap Motion device, is particularly suitable for evaluating gesture detection methods in a realistic use-case scenario, as it allows the analysis of online detection performance on heterogeneous gestures, characterized by static hand pose, global hand motion, and finger articulation. We exploited SFINGE 3D to compare two different approaches to online detection and classification: one based on visual rendering and Convolutional Neural Networks, and the other based on geometry-based handcrafted features and dissimilarity-based classifiers. We discuss the results, analysing the strengths and weaknesses of the methods and deriving useful hints for their improvement.
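
    Neither of the two compared approaches is reproduced here; the sketch below only illustrates, under assumed window length, stride, and confidence threshold, how online detection differs from offline classification of pre-segmented gestures: a fixed-length window slides over the continuous tracking stream and consecutive firings of the same label are merged.

        # Generic sliding-window sketch for online gesture detection on a continuous
        # tracking stream; window, stride, and threshold are illustrative assumptions.
        import numpy as np

        def detect_online(trajectory, classify, window=30, stride=5, threshold=0.8):
            """trajectory: (frames, features) array of 3D finger-joint coordinates.
            classify: callable returning (label, confidence) for one window.
            Returns a list of (start_frame, end_frame, label) detections."""
            detections = []
            for start in range(0, len(trajectory) - window + 1, stride):
                label, conf = classify(trajectory[start:start + window])
                if conf >= threshold and label != "noise":
                    # Merge with the previous detection if the same gesture continues.
                    if detections and detections[-1][2] == label and start <= detections[-1][1]:
                        detections[-1] = (detections[-1][0], start + window, label)
                    else:
                        detections.append((start, start + window, label))
            return detections

        # Toy usage with a dummy classifier that never fires on the noise padding.
        dummy = lambda w: ("noise", 1.0)
        print(detect_online(np.zeros((200, 60)), dummy))   # []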

    Skeleton-based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

    This paper presents an image-classification-based approach to the skeleton-based video action recognition problem. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms the skeleton videos into colour images, named skeleton images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though the skeleton images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on the popular benchmark datasets NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.
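
    The paper's exact mapping is not reproduced here; the sketch below only conveys the general idea of turning a skeleton sequence into a colour image by removing translation and global scale and packing the normalised (x, y, z) coordinates into the three colour channels, with joints as rows and frames as columns.

        # Rough sketch of a translation- and scale-normalised skeleton-to-image mapping
        # (not the paper's exact method); axis layout and normalisation are assumptions.
        import numpy as np

        def skeleton_to_image(seq):
            """seq: (frames, joints, 3) array of joint coordinates.
            Returns a (joints, frames, 3) uint8 'skeleton image'."""
            seq = seq - seq.mean(axis=(0, 1), keepdims=True)        # remove translation
            scale = np.abs(seq).max()
            if scale > 0:
                seq = seq / scale                                   # remove global scale
            img = ((seq + 1.0) / 2.0 * 255.0).astype(np.uint8)      # map [-1, 1] -> [0, 255]
            return img.transpose(1, 0, 2)                           # joints x frames x RGB

        img = skeleton_to_image(np.random.randn(64, 25, 3))
        print(img.shape, img.dtype)   # (25, 64, 3) uint8

    The resulting image could then be resized and fed to a pre-trained CNN for fine-tuning, in the spirit of the strategy the abstract describes.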

    A Transformer-Based Network for Dynamic Hand Gesture Recognition

    Transformer-based neural networks, built on a self-attention mechanism, achieve state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that using a single active depth sensor, specifically depth maps and the surface normals estimated from them, achieves state-of-the-art results, outperforming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available from common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.
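
    The abstract does not spell out the paper's normal-estimation procedure; the sketch below shows a common finite-difference way of deriving per-pixel surface normals from a depth map, with camera intrinsics simplified to unit steps in pixel space.

        # Sketch of surface-normal estimation from a depth map via finite differences
        # (not necessarily the paper's procedure); intrinsics handling is simplified.
        import numpy as np

        def depth_to_normals(depth):
            """depth: (H, W) array. Returns (H, W, 3) unit surface normals."""
            dz_dy, dz_dx = np.gradient(depth)                 # partial derivatives of depth
            normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
            norm = np.linalg.norm(normals, axis=2, keepdims=True)
            return normals / np.clip(norm, 1e-8, None)        # normalise to unit length

        normals = depth_to_normals(np.random.rand(120, 160).astype(np.float32))
        print(normals.shape)   # (120, 160, 3)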

    Indian Sign Language Recognition through Hybrid ConvNet-LSTM Networks

    Dynamic hand gesture recognition is a challenging task in Human-Computer Interaction (HCI) and Computer Vision. The potential application areas of gesture recognition include sign language translation, video gaming, video surveillance, robotics, and gesture-controlled home appliances. In the proposed research, gesture recognition is applied to recognize sign language words from real-time videos. Classifying actions from video sequences requires both spatial and temporal features. The proposed system handles the former with a Convolutional Neural Network (CNN), which is the core of several computer vision solutions, and the latter with a Recurrent Neural Network (RNN), which is more efficient at handling sequences of movements. Thus, a real-time Indian Sign Language (ISL) recognition system is developed using the hybrid CNN-RNN architecture. The system is trained with the proposed CasTalk-ISL dataset. The ultimate purpose of the presented research is to deploy a real-time sign language translator that breaks down the barriers in communication between hearing-impaired and hearing people. The developed system achieves 95.99% top-1 accuracy and 99.46% top-3 accuracy on the test dataset. The obtained results outperform existing approaches using various deep models on different datasets.
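
    The CasTalk-ISL system itself is not reproduced here; the sketch below only illustrates the general hybrid CNN-RNN pattern the abstract describes, with a small per-frame CNN feeding an LSTM; the layer sizes, frame resolution, and class count are arbitrary assumptions.

        # Illustrative hybrid CNN-RNN video classifier: a small CNN extracts per-frame
        # spatial features and an LSTM models their temporal order; sizes are assumptions.
        import torch
        import torch.nn as nn

        class ConvNetLSTM(nn.Module):
            def __init__(self, num_classes=50, feat_dim=128, hidden=256):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
                )
                self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.fc = nn.Linear(hidden, num_classes)

            def forward(self, video):                     # video: (batch, time, 3, H, W)
                b, t = video.shape[:2]
                frames = video.flatten(0, 1)              # merge batch and time for the CNN
                feats = self.cnn(frames).view(b, t, -1)   # per-frame spatial features
                _, (h, _) = self.lstm(feats)              # temporal modelling
                return self.fc(h[-1])

        model = ConvNetLSTM()
        print(model(torch.randn(2, 16, 3, 64, 64)).shape)   # torch.Size([2, 50])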

    Dynamic Hand Gesture Recognition Using Ultrasonic Sonar Sensors and Deep Learning

    The space of hand gesture recognition using radar and sonar is dominated mostly by radar applications. In addition, the machine learning algorithms used by these systems are typically based on convolutional neural networks (CNNs), with some applications exploring the use of long short-term memory (LSTM) networks. The goal of this study was to design and build a sonar system that can classify hand gestures using a machine learning approach, and to compare convolutional neural networks with LSTM networks as a means of classifying hand gestures using sonar. A Doppler sonar system was designed and built to sense hand gestures. It is a multi-static system containing one transmitter and three receivers, and it measures the Doppler frequency shifts caused by dynamic hand gestures. Since the system uses three receivers, three different Doppler frequency channels are measured. Three additional differential frequency channels are formed by computing the differences between the frequencies measured by each pair of receivers. These six channels are used as inputs to the deep learning models. Two different deep learning algorithms were used to classify the hand gestures: a Doppler biLSTM network [1] and a CNN [2]. Six basic hand gestures, two along each of the x-, y-, and z-axes, and two rotational hand gestures were recorded with both the left and right hands at different distances. Ten-fold cross-validation was used to evaluate the networks' performance and classification accuracy. The LSTM was able to classify the six basic gestures with an accuracy of at least 96%, but with the addition of the two rotational gestures the accuracy drops to 47%. This result is acceptable since the basic gestures are more commonly used than the rotational gestures. The CNN was able to classify all the gestures with an accuracy of at least 98%. Additionally, the LSTM network can classify separate left- and right-hand gestures with an accuracy of 80%, and the CNN with an accuracy of 83%. The study shows why the CNN is the most widely used algorithm for hand gesture recognition, as it can consistently classify gestures of varying complexity. It also shows that the LSTM network can classify hand gestures with a high degree of accuracy. More experimentation, however, is needed to increase the complexity of recognisable gestures.
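
    As a rough illustration (not the system built in the study), the sketch below forms the six input channels from three receiver Doppler-frequency streams and classifies them with a bidirectional LSTM; the sequence length, hidden size, and number of gesture classes are assumptions.

        # Illustrative sketch: six sonar input channels (three raw Doppler streams plus
        # three pairwise differences) fed to a bidirectional LSTM; sizes are assumptions.
        import torch
        import torch.nn as nn

        def make_channels(f1, f2, f3):
            """f1..f3: (time,) Doppler-frequency streams from the three receivers.
            Returns a (time, 6) tensor: three raw channels plus three differential ones."""
            return torch.stack([f1, f2, f3, f1 - f2, f2 - f3, f1 - f3], dim=1)

        class SonarBiLSTM(nn.Module):
            def __init__(self, hidden=64, num_classes=8):
                super().__init__()
                self.lstm = nn.LSTM(6, hidden, batch_first=True, bidirectional=True)
                self.fc = nn.Linear(2 * hidden, num_classes)

            def forward(self, x):                         # x: (batch, time, 6)
                _, (h, _) = self.lstm(x)
                feat = torch.cat([h[-2], h[-1]], dim=1)   # forward and backward final states
                return self.fc(feat)

        x = make_channels(*torch.randn(3, 100))           # one recording of 100 time steps
        model = SonarBiLSTM()
        print(model(x.unsqueeze(0)).shape)                # torch.Size([1, 8])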