    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current methods in the literature.

    Indian Sign Language Recognition through Hybrid ConvNet-LSTM Networks

    Dynamic hand gesture recognition is a challenging task in Human-Computer Interaction (HCI) and Computer Vision. The potential application areas of gesture recognition include sign language translation, video gaming, video surveillance, robotics, and gesture-controlled home appliances. In the proposed research, gesture recognition is applied to recognize sign language words from real-time videos. Classifying actions from video sequences requires both spatial and temporal features. The proposed system handles the former with a Convolutional Neural Network (CNN), which is the core of several computer vision solutions, and the latter with a Recurrent Neural Network (RNN), which is more efficient at handling sequences of movements. Thus, a real-time Indian Sign Language (ISL) recognition system is developed using the hybrid CNN-RNN architecture. The system is trained with the proposed CasTalk-ISL dataset. The ultimate purpose of the presented research is to deploy a real-time sign language translator that breaks down the barriers to communication between hearing-impaired people and hearing people. The developed system achieves 95.99% top-1 accuracy and 99.46% top-3 accuracy on the test dataset. The obtained results outperform the existing approaches using various deep models on different datasets.
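
    The top-1 and top-3 figures quoted above are top-k accuracies: a prediction counts as correct when the true class is among the k highest-scoring classes. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `top_k_accuracy`, the scores, and the labels are all hypothetical and made up for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical per-class scores for 4 video clips over 4 sign classes (made-up values).
scores = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is ranked 1st: top-1 hit
    [0.20, 0.50, 0.25, 0.05],  # true class 2 is ranked 2nd: top-3 hit only
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked 4th: miss
    [0.05, 0.05, 0.10, 0.80],  # true class 3 is ranked 1st: top-1 hit
]
labels = [0, 2, 3, 3]

top1 = top_k_accuracy(scores, labels, k=1)  # 2 of 4 clips
top3 = top_k_accuracy(scores, labels, k=3)  # 3 of 4 clips
print(top1, top3)
```

    Top-3 accuracy can never be lower than top-1 accuracy on the same predictions, which is consistent with the gap between the two reported figures.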

    SFINGE 3D: A novel benchmark for online detection and recognition of heterogeneous hand gestures from 3D fingers' trajectories

    In recent years gesture recognition has become an increasingly interesting topic for both research and industry. While interaction with a device through a gestural interface is a promising idea in several applications, especially in the industrial field, some of the issues related to the task are still considered a challenge. In the scientific literature, a considerable amount of work has recently been presented on the problem of detecting and classifying gestures from 3D hand joint trajectories that can be captured by cheap devices installed on head-mounted displays and desktop computers. The methods proposed so far can achieve very good results on benchmarks requiring the offline supervised classification of segmented gestures of a particular kind, but are not usually tested on the more realistic task of finding gesture executions within a continuous hand tracking session. In this paper, we present a novel benchmark, SFINGE 3D, aimed at evaluating online gesture detection and recognition. The dataset is composed of a dictionary of 13 segmented gestures used as a training set and 72 trajectories, each containing 3-5 of the 13 gestures, performed in continuous tracking and padded with random hand movements acting as noise. The presented dataset, captured with a head-mounted Leap Motion device, is particularly suitable for evaluating gesture detection methods in a realistic use-case scenario, as it allows the analysis of online detection performance on heterogeneous gestures, characterized by static hand pose, global hand motions, and finger articulation. We exploited SFINGE 3D to compare two different approaches for online detection and classification, one based on visual rendering and Convolutional Neural Networks and the other based on geometry-based handcrafted features and dissimilarity-based classifiers. We discuss the results, analyzing strengths and weaknesses of the methods, and deriving useful hints for their improvement. (C) 2020 Elsevier Ltd. All rights reserved.

    Controlling Media Player with Hands: A Transformer Approach and a Quality of Experience Assessment

    In this article, we propose a Hand Gesture Recognition (HGR) system based on a novel deep transformer (DT) neural network for media player control. The extracted hand skeleton features are processed by a separate transformer for each finger in isolation, to better identify the finger characteristics that drive the subsequent classification. The achieved HGR accuracy (0.853) outperforms state-of-the-art HGR approaches when tested on the popular NVIDIA dataset. Moreover, we conducted a subjective assessment involving 30 people to evaluate the Quality of Experience (QoE) provided by the proposed DT-HGR for controlling a media player application compared with two traditional input devices, i.e., mouse and keyboard. The assessment participants were asked to evaluate objective (accuracy) and subjective (physical fatigue, usability, pragmatic quality, and hedonic quality) measurements. We found that (i) the accuracy of DT-HGR is very high (91.67%), only slightly lower than that of traditional alternative interaction modalities; and that (ii) the perceived quality for DT-HGR in terms of satisfaction, comfort, and interactivity is very high, with an average Mean Opinion Score (MOS) value as high as 4.4, whereas the alternative approaches did not reach 3.8, which encourages a more pervasive adoption of natural gesture interaction.
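
    A Mean Opinion Score such as the one reported above is the arithmetic mean of the ratings given by individual participants. The sketch below illustrates only that calculation. It assumes the common five-point scale (1 = bad, 5 = excellent), which the abstract does not state, and the function name `mean_opinion_score` and the ratings are hypothetical, not data from the study.

```python
from statistics import mean


def mean_opinion_score(ratings, low=1, high=5):
    """Average the per-participant ratings after checking they lie on the assumed scale."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
    return mean(ratings)


# Hypothetical ratings from five participants (made-up values, not from the study).
ratings = [5, 4, 3, 4, 4]
mos = mean_opinion_score(ratings)  # (5 + 4 + 3 + 4 + 4) / 5
print(mos)
```

    Because the MOS is a plain average, it is usually reported together with the number of participants, as the abstract does.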

    On the role of gestures in human-robot interaction

    This thesis investigates the gestural interaction problem and in particular the usage of gestures for human-robot interaction. The lack of a clear definition of the problem statement and of a common terminology has resulted in a fragmented field of research where building upon prior work is rare. The aim of the research presented in this thesis, therefore, is to lay the foundation to help the community build a more homogeneous research field. The main contributions of this thesis are twofold: (i) a taxonomy to define gestures; and (ii) an engineering-oriented definition of the gestural interaction problem. These contributions result in a schema that represents the existing literature in a more organic way, helping future researchers to identify existing technologies and applications, also thanks to an extensive literature review. Furthermore, the defined problem has been studied in two of its specializations: (i) direct control and (ii) teaching of a robotic manipulator. This led to the development of technological solutions for gesture sensing, detection and classification, which can possibly be applied to other contexts.

    Deep Recurrent Networks for Gesture Recognition and Synthesis

    It is hard to overstate the importance of gesture-based interfaces in many applications nowadays. The adoption of such interfaces stems from the opportunities they create for incorporating natural and fluid user interactions. This highlights the importance of having gesture recognizers that are not only accurate but also easy to adopt. The ever-growing popularity of machine learning has prompted many application developers to integrate automatic methods of recognition into their products. On the one hand, deep learning often tops the list of the most powerful and robust recognizers. These methods have been consistently shown to outperform all other machine learning methods in a variety of tasks. On the other hand, deep networks can be overwhelming to use for a majority of developers, requiring a lot of tuning and tweaking to work as expected. Additionally, these networks are infamous for their requirement for large amounts of training data, further hampering their adoption in scenarios where labeled data is limited. In this dissertation, we aim to bridge the gap between the power of deep learning methods and their adoption into gesture recognition workflows. To this end, we introduce two deep network models for recognition. These models are similar in spirit, but target different application domains: one is designed for segmented gesture recognition, while the other is suitable for continuous data, tackling segmentation and recognition problems simultaneously. The distinguishing characteristic of these networks is their simplicity, small number of free parameters, and their use of common building blocks that come standard with any modern deep learning framework, making them easy to implement, train and adopt. Through evaluations, we show that our proposed models achieve state-of-the-art results in various recognition tasks and application domains spanning different input devices and interaction modalities. 
    We demonstrate that the infamy of deep networks due to their demand for powerful hardware as well as large amounts of data is an unfair assessment. On the contrary, we show that in the absence of such data, our proposed models can be quickly trained while achieving competitive recognition accuracy. Next, we explore the problem of synthetic gesture generation: a measure often taken to address the shortage of labeled data. We extend our proposed recognition models and demonstrate that the same models can be used in a Generative Adversarial Network (GAN) architecture for synthetic gesture generation. Specifically, we show that our original recognizer can be used as the discriminator in such frameworks, while a slightly modified version of it can act as the gesture generator. We then formulate a novel loss function for our gesture generator, which entirely replaces the need for a discriminator network in our generative model, thereby significantly reducing the complexity of our framework. Through evaluations, we show that our model is able to improve the recognition accuracy of multiple recognizers across a variety of datasets. Through user studies, we additionally show that human evaluators frequently mistake our synthetic samples for real ones, indicating that our synthetic samples are visually realistic. Additional resources for this dissertation (such as demo videos and public source code) are available at https://www.maghoumi.com/dissertatio

    Human behavior understanding for worker-centered intelligent manufacturing

    “In a worker-centered intelligent manufacturing system, sensing and understanding of the worker’s behavior are the primary tasks, which are essential for automatic performance evaluation & optimization, intelligent training & assistance, and human-robot collaboration. In this study, a worker-centered training & assistant system is proposed for intelligent manufacturing, which is featured with self-awareness and active-guidance. To understand the hand behavior, a method is proposed for complex hand gesture recognition using Convolutional Neural Networks (CNN) with multiview augmentation and inference fusion, from depth images captured by Microsoft Kinect. To sense and understand the worker in a more comprehensive way, a multi-modal approach is proposed for worker activity recognition using Inertial Measurement Unit (IMU) signals obtained from a Myo armband and videos from a visual camera. To automatically learn the importance of different sensors, a novel attention-based approach is proposed to human activity recognition using multiple IMU sensors worn at different body locations. To deploy the developed algorithms to the factory floor, a real-time assembly operation recognition system is proposed with fog computing and transfer learning. The proposed worker-centered training & assistant system has been validated and demonstrated the feasibility and great potential for applying to the manufacturing industry for frontline workers. 
Our developed approaches have been evaluated: 1) the multi-view approach outperforms the state-of-the-arts on two public benchmark datasets, 2) the multi-modal approach achieves an accuracy of 97% on a worker activity dataset including 6 activities and achieves the best performance on a public dataset, 3) the attention-based method outperforms the state-of-the-art methods on five publicly available datasets, and 4) the developed transfer learning model achieves a real-time recognition accuracy of 95% on a dataset including 10 worker operations”--Abstract, page iv