    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both the Computer Vision (CV) and Artificial Neural Network (ANN) fields has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which are often used by people to communicate information in a non-verbal way; these gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are central to many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) based method for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
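
    As a rough illustration of the second module's architecture, the sketch below pairs two stacked LSTM branches over 2D skeleton sequences; the branch inputs (raw joint coordinates and frame-to-frame displacements), layer sizes, and class count are illustrative assumptions, not the thesis specification.

    # Hypothetical sketch of a two-branch stacked LSTM classifier for 2D-skeleton
    # action sequences (branch inputs and sizes are assumptions, not the thesis spec).
    import torch
    import torch.nn as nn

    class TwoBranchLSTM(nn.Module):
        def __init__(self, num_joints=18, hidden=128, num_classes=10):
            super().__init__()
            feat = num_joints * 2                      # (x, y) per joint
            self.pose_branch = nn.LSTM(feat, hidden, num_layers=2, batch_first=True)
            self.motion_branch = nn.LSTM(feat, hidden, num_layers=2, batch_first=True)
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, skel):                       # skel: (batch, time, joints*2)
            motion = skel[:, 1:] - skel[:, :-1]        # frame-to-frame displacement
            _, (h_pose, _) = self.pose_branch(skel)
            _, (h_mot, _) = self.motion_branch(motion)
            fused = torch.cat([h_pose[-1], h_mot[-1]], dim=1)  # last-layer hidden states
            return self.classifier(fused)

    logits = TwoBranchLSTM()(torch.randn(4, 30, 36))   # 4 clips, 30 frames, 18 joints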

    Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties

    Model-based approaches to 3D hand tracking have been shown to perform well in a wide range of scenarios. However, they require initialisation and cannot recover easily from tracking failures that occur due to fast hand motions. Data-driven approaches, on the other hand, can quickly deliver a solution, but the results often suffer from lower accuracy or missing anatomical validity compared to those obtained from model-based approaches. In this work we propose a hybrid approach for hand pose estimation from a single depth image. First, a learned regressor is employed to deliver multiple initial hypotheses for the 3D position of each hand joint. Subsequently, the kinematic parameters of a 3D hand model are found by deliberately exploiting the inherent uncertainty of the inferred joint proposals. This way, the method provides anatomically valid and accurate solutions without requiring manual initialisation or suffering from track losses. Quantitative results on several standard datasets demonstrate that the proposed method outperforms state-of-the-art representatives of the model-based, data-driven and hybrid paradigms. Comment: BMVC 2015 (oral); see also http://lrs.icg.tugraz.at/research/hybridhape
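
    The idea of fitting a hand model to uncertain joint proposals can be sketched minimally as below; the placeholder forward_kinematics, the confidence-weighted cost, and all sizes are hypothetical stand-ins for the paper's actual hand model and optimisation.

    # Illustrative sketch (not the paper's implementation): fit model parameters by
    # weighting each joint's candidate 3D positions with the regressor's confidence.
    import numpy as np
    from scipy.optimize import minimize

    def forward_kinematics(theta, num_joints=21):
        # Placeholder: a real hand model would map kinematic parameters (e.g. joint
        # angles) to 3D joint positions; here parameters are the positions themselves.
        return theta.reshape(num_joints, 3)

    def weighted_fit(hypotheses, confidences, num_joints=21):
        # hypotheses: (joints, K, 3) candidate positions; confidences: (joints, K)
        def cost(theta):
            joints = forward_kinematics(theta, num_joints)                # (joints, 3)
            d = np.linalg.norm(joints[:, None, :] - hypotheses, axis=2)   # (joints, K)
            return np.sum(confidences * d ** 2)                           # soft assignment
        # start from the confidence-weighted mean of each joint's hypotheses
        theta0 = (np.sum(hypotheses * confidences[..., None], axis=1)
                  / np.sum(confidences, axis=1, keepdims=True)).ravel()
        return minimize(cost, theta0, method="L-BFGS-B").x

    hyp = np.random.rand(21, 5, 3)            # 5 hypotheses per joint
    conf = np.random.rand(21, 5)
    params = weighted_fit(hyp, conf)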

    Low Cost Open Source Modal Virtual Environment Interfaces Using Full Body Motion Tracking and Hand Gesture Recognition

    Virtual environments provide insightful and meaningful ways to explore data sets through immersive experiences. One of the ways immersion is achieved is through natural interaction methods instead of only a keyboard and mouse. Intuitive tracking systems for natural interfaces suitable for such environments are often expensive. Recently, however, devices such as gesture-tracking gloves and skeletal tracking systems have emerged on the consumer market. This project integrates gestural interfaces into an open source virtual reality toolkit using consumer-grade input devices and generates a set of tools to enable multimodal gestural interface creation. The AnthroTronix AcceleGlove is used to augment body tracking data from a Microsoft Kinect with fine-grained hand gesture data. The tools are shown to be useful through the implementation of a sample gestural interface. The project concludes by suggesting studies targeting gestural interfaces using such devices, as well as other areas for further research.
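
    The kind of multimodal binding such a toolkit enables can be sketched roughly as below: a coarse body-pose label from the skeleton tracker is paired with a fine-grained glove gesture to trigger an interface command. All event names, bindings, and functions here are hypothetical illustrations, not the project's API.

    # Hypothetical sketch of a multimodal dispatcher that pairs coarse body-tracking
    # events (e.g. from a Kinect skeleton) with fine-grained glove gestures
    # (e.g. from an AcceleGlove) to trigger virtual-environment actions.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class MultimodalEvent:
        body_pose: str      # e.g. "arm_raised", from the skeleton tracker
        hand_gesture: str   # e.g. "fist", from the glove classifier

    BINDINGS = {
        MultimodalEvent("arm_raised", "fist"): "grab_object",
        MultimodalEvent("arm_raised", "open_palm"): "release_object",
        MultimodalEvent("arm_forward", "point"): "teleport",
    }

    def dispatch(body_pose: str, hand_gesture: str) -> Optional[str]:
        """Map the current (body, hand) state to an interface command, if any."""
        return BINDINGS.get(MultimodalEvent(body_pose, hand_gesture))

    print(dispatch("arm_raised", "fist"))   # -> "grab_object"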

    Designing 2D Interfaces For 3D Gesture Retrieval Utilizing Deep Learning

    Gesture retrieval can be defined as the process of retrieving the correct meaning of a hand movement from a pre-assembled gesture dataset. The purpose of the research discussed here is to design and implement a gesture interface system that facilitates retrieval for an American Sign Language gesture set using a mobile device. The principal challenge discussed here is the normalization of 2D gestures generated from the mobile device interface and 3D gestures captured from video samples into a common data structure that can be utilized by deep learning networks. This thesis covers the convolutional neural networks and autoencoders used to transform 2D gestures into the correct form before they are classified by a convolutional neural network. The architecture and implementation of the front-end and back-end systems, and each of their respective responsibilities, are discussed. Lastly, this thesis covers the results of the experiment, breaks down the final classification accuracy of 83%, and discusses how this work could be further improved by using depth-based videos for the 3D data.
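
    A rough sketch of the autoencoder-plus-CNN stage described above is given below; the layer sizes, the 64x64 input resolution, and the class count are assumptions made for illustration rather than the thesis architecture.

    # Minimal sketch (architecture sizes are assumptions): an autoencoder maps a
    # rasterised 2D touch gesture to a common representation, and a small CNN
    # classifies the result.
    import torch
    import torch.nn as nn

    class GestureAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encode = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                                        nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
            self.decode = nn.Sequential(nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),
                                        nn.ConvTranspose2d(8, 1, 2, stride=2), nn.Sigmoid())

        def forward(self, x):                     # x: (batch, 1, 64, 64) gesture image
            return self.decode(self.encode(x))

    class GestureCNN(nn.Module):
        def __init__(self, num_classes=26):       # e.g. one class per ASL letter
            super().__init__()
            self.features = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                          nn.MaxPool2d(2),
                                          nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                          nn.MaxPool2d(2))
            self.head = nn.Linear(32 * 16 * 16, num_classes)

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    x = torch.rand(2, 1, 64, 64)                  # two normalised gesture images
    logits = GestureCNN()(GestureAutoencoder()(x))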

    PARLOMA – A Novel Human-Robot Interaction System for Deaf-blind Remote Communication

    Deaf-blindness forces people to live in isolation. To date, no technological solution exists that enables two (or more) Deaf-blind persons to communicate remotely with each other in tactile Sign Language (t-SL). When resorting to t-SL, Deaf-blind persons can communicate only with persons physically present in the same place, because they are required to reciprocally explore their hands to exchange messages. We present a preliminary version of PARLOMA, a novel system that enables remote communication between Deaf-blind persons. It is composed of a low-cost depth sensor as the only input device, paired with a robotic hand as the output device. Essentially, any user can perform handshapes in front of the depth sensor. The system is able to recognize a set of handshapes that are sent over the web and reproduced by an anthropomorphic robotic hand. PARLOMA can work as a “telephone” for Deaf-blind people, and can therefore dramatically improve the quality of life of Deaf-blind persons. PARLOMA has been designed in close collaboration with the main Italian Deaf-blind associations, in order to include end-users in the design phase.
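
    The sender-to-receiver pipeline described above can be sketched, very roughly, as a label-passing protocol: the recognised handshape is serialised, transmitted, and mapped onto joint targets at the robotic hand. The message format, handshape labels, and joint angles below are hypothetical and not PARLOMA's actual protocol.

    # Illustrative pipeline sketch (message format and joint values are assumptions):
    # the sender classifies a handshape from the depth sensor and serialises it;
    # the receiver maps the label onto target angles for the robotic hand.
    import json
    import time

    # Hypothetical mapping from recognised handshape label to finger joint targets
    # (degrees) for an anthropomorphic robotic hand.
    HANDSHAPE_TO_JOINTS = {
        "flat_hand": {"thumb": 10, "index": 0, "middle": 0, "ring": 0, "pinky": 0},
        "fist":      {"thumb": 80, "index": 90, "middle": 90, "ring": 90, "pinky": 90},
    }

    def encode_message(handshape: str) -> str:
        """Sender side: package the recognised handshape for transmission."""
        return json.dumps({"handshape": handshape, "timestamp": time.time()})

    def actuate(message: str) -> dict:
        """Receiver side: decode the message and return joint targets to command."""
        label = json.loads(message)["handshape"]
        return HANDSHAPE_TO_JOINTS.get(label, HANDSHAPE_TO_JOINTS["flat_hand"])

    print(actuate(encode_message("fist")))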

    Human behavior understanding for worker-centered intelligent manufacturing

    “In a worker-centered intelligent manufacturing system, sensing and understanding of the worker’s behavior are the primary tasks, which are essential for automatic performance evaluation & optimization, intelligent training & assistance, and human-robot collaboration. In this study, a worker-centered training & assistant system is proposed for intelligent manufacturing, featuring self-awareness and active guidance. To understand hand behavior, a method is proposed for complex hand gesture recognition using Convolutional Neural Networks (CNN) with multi-view augmentation and inference fusion, from depth images captured by a Microsoft Kinect. To sense and understand the worker in a more comprehensive way, a multi-modal approach is proposed for worker activity recognition using Inertial Measurement Unit (IMU) signals obtained from a Myo armband and videos from a visual camera. To automatically learn the importance of different sensors, a novel attention-based approach is proposed for human activity recognition using multiple IMU sensors worn at different body locations. To deploy the developed algorithms to the factory floor, a real-time assembly operation recognition system is proposed with fog computing and transfer learning. The proposed worker-centered training & assistant system has been validated and has demonstrated its feasibility and great potential for application in the manufacturing industry for frontline workers. Our developed approaches have been evaluated: 1) the multi-view approach outperforms the state of the art on two public benchmark datasets, 2) the multi-modal approach achieves an accuracy of 97% on a worker activity dataset including 6 activities and achieves the best performance on a public dataset, 3) the attention-based method outperforms state-of-the-art methods on five publicly available datasets, and 4) the developed transfer learning model achieves a real-time recognition accuracy of 95% on a dataset including 10 worker operations”--Abstract, page iv
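
    The attention mechanism over body-worn IMU sensors mentioned above can be illustrated with the rough sketch below: each sensor stream is encoded, a learned score weights the per-sensor embeddings, and the fused vector is classified. Sensor count, encoder choice, and layer sizes are assumptions, not the dissertation's architecture.

    # Hypothetical sketch of attention over multiple body-worn IMU sensors: each
    # sensor stream is encoded, attention scores weight the sensor embeddings, and
    # the fused vector is classified (sizes and encoder choice are assumptions).
    import torch
    import torch.nn as nn

    class IMUAttentionNet(nn.Module):
        def __init__(self, num_sensors=5, channels=6, hidden=64, num_classes=10):
            super().__init__()
            self.encoder = nn.GRU(channels, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)
            self.classifier = nn.Linear(hidden, num_classes)

        def forward(self, x):                       # x: (batch, sensors, time, channels)
            b, s, t, c = x.shape
            _, h = self.encoder(x.reshape(b * s, t, c))
            h = h[-1].reshape(b, s, -1)             # one embedding per sensor
            w = torch.softmax(self.attn(h), dim=1)  # learned per-sensor importance
            fused = (w * h).sum(dim=1)
            return self.classifier(fused), w.squeeze(-1)

    logits, weights = IMUAttentionNet()(torch.randn(2, 5, 100, 6))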