
    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, meanwhile, plays a key role in action recognition and affective computing: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions from their poses and movements. Both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: the first proposes a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; the last provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them well suited to analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared with current methods in the literature.
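    To make the second module concrete, below is a minimal sketch of a two-branch stacked LSTM over 2D skeleton sequences, in the spirit of the thesis. The motion-difference input for the second branch, the layer sizes, and the class count are illustrative assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class TwoBranchLSTM(nn.Module):
    """Illustrative two-branch stacked LSTM over 2D skeleton sequences.

    One branch reads raw joint coordinates, the other their frame-to-frame
    differences (motion); everything beyond "two-branch stacked LSTM-RNNs"
    is an assumption for this sketch.
    """
    def __init__(self, num_joints=18, hidden=128, num_classes=20):
        super().__init__()
        in_dim = num_joints * 2                       # (x, y) per joint
        self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, skel):                          # skel: (batch, frames, joints*2)
        motion = skel[:, 1:] - skel[:, :-1]           # frame-to-frame differences
        _, (h_pose, _) = self.pose_branch(skel)
        _, (h_motion, _) = self.motion_branch(motion)
        feat = torch.cat([h_pose[-1], h_motion[-1]], dim=1)  # top-layer final states
        return self.classifier(feat)

logits = TwoBranchLSTM()(torch.randn(4, 30, 36))      # 4 clips of 30 frames
```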

    A Survey of Applications and Human Motion Recognition with Microsoft Kinect

    Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands, without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the use of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, and 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies they use, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers investigating better methods for human motion recognition and for lower-level computer vision tasks such as segmentation, object detection and human pose estimation.

    Multimodal database of emotional speech, video and gestures

    People express emotions through different modalities. Integrating verbal and non-verbal communication channels creates a system in which the message is easier to understand, and expanding the focus to several expression forms can facilitate research on emotion recognition as well as on human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings made in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data are labeled with six basic emotion categories, according to Ekman's taxonomy. To check the quality of the performances, all recordings were evaluated by experts and volunteers. The database is available to the academic community and may be useful in studies on audio-visual emotion recognition.

    Heterogeneous hand gesture recognition using 3D dynamic skeletal data

    Hand gestures are the most natural and intuitive non-verbal communication medium for interacting with a computer, and related research has recently attracted growing interest. Additionally, the identifiable features of the hand pose provided by current inexpensive commercial depth cameras can be exploited in various gesture recognition systems, especially for Human-Computer Interaction. In this paper, we focus on 3D dynamic gesture recognition systems using hand pose information. Specifically, we use the natural structure of the hand topology (referred to below as hand skeletal data) to extract effective hand kinematic descriptors from the gesture sequence. Descriptors are then encoded in a statistical and temporal representation using, respectively, a Fisher kernel and a multi-level temporal pyramid. A linear SVM classifier is applied directly to the feature vector computed over the whole pre-segmented gesture to perform the recognition. Furthermore, for early recognition from a continuous stream, we introduce a prior gesture detection phase, achieved with a binary classifier, before the final gesture recognition. The proposed approach is evaluated on three hand gesture datasets containing 10, 14 and 25 gestures, respectively, each with specific challenging tasks. We also conduct an experiment to assess the influence of depth-based hand pose estimation on our approach. Experimental results demonstrate the potential of the proposed solution for hand gesture recognition, including low-latency recognition. Comparative results with state-of-the-art methods are reported.
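    As an illustration of the final stages of such a pipeline, the sketch below pools per-frame hand-kinematic descriptors over a multi-level temporal pyramid and trains a linear SVM on the resulting vector. Plain mean pooling stands in for the paper's Fisher-vector encoding, and the descriptor dimensions, class count and synthetic data are invented for the example.

```python
import numpy as np
from sklearn.svm import LinearSVC

def temporal_pyramid(desc, levels=3):
    """Stack pooled statistics over a multi-level temporal pyramid.

    desc: (frames, dim) per-frame hand-kinematic descriptors. Mean pooling
    replaces the paper's Fisher-vector encoding in this simplified sketch.
    """
    feats = []
    for level in range(levels):
        for segment in np.array_split(desc, 2 ** level, axis=0):
            feats.append(segment.mean(axis=0))
    return np.concatenate(feats)              # 1 + 2 + 4 = 7 segments stacked

# Hypothetical pre-segmented gestures: variable-length descriptor sequences.
rng = np.random.default_rng(0)
X = np.stack([temporal_pyramid(rng.normal(size=(40 + 5 * i, 16)))
              for i in range(20)])
y = rng.integers(0, 4, size=20)               # 4 gesture classes
clf = LinearSVC(max_iter=5000).fit(X, y)      # linear SVM on whole gestures
print(clf.predict(X[:3]))
```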

    A random forest approach to segmenting and classifying gestures

    This thesis investigates a gesture segmentation and recognition scheme that employs a random forest classification model. A complete gesture recognition system should localize and classify each gesture from a given gesture vocabulary within a continuous video stream. Thus, the system must determine the start and end points of each gesture in time, as well as accurately recognize its class label. We propose a unified approach that performs the tasks of temporal segmentation and classification simultaneously. Our method trains a random forest classification model to recognize gestures from a given vocabulary, as presented in a training dataset of video plus 3D body joint locations, as well as out-of-vocabulary (non-gesture) instances. Given an input video stream, our trained model is applied to candidate gestures using sliding windows at multiple temporal scales. The class label with the highest classifier confidence is selected, and its corresponding scale is used to determine the segmentation boundaries in time. We evaluated our formulation in segmenting and recognizing gestures on two benchmark datasets: the NATOPS dataset of 9,600 gesture instances from a vocabulary of 24 aircraft handling signals, and the CHALEARN dataset of 7,754 gesture instances from a vocabulary of 20 Italian communication gestures. The performance of our method compares favorably with state-of-the-art methods that employ Hidden Markov Models or Hidden Conditional Random Fields on the NATOPS dataset. We conclude with a discussion of the advantages of our model.
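    A minimal sketch of the multi-scale sliding-window idea follows: a random forest trained on gesture and non-gesture examples is applied to candidate windows at several temporal scales, and the most confident (class, scale) pair is kept at each start point. The window features, scales and synthetic data are assumptions for illustration, not the thesis's actual design.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_feature(frames):
    """Illustrative fixed-length feature: resample a window of per-frame
    joint vectors to 10 time steps and flatten (not the thesis's features)."""
    idx = np.linspace(0, len(frames) - 1, 10).astype(int)
    return frames[idx].ravel()

rng = np.random.default_rng(1)
# Training set: pre-segmented gestures plus a non-gesture class (label 0).
X = np.stack([window_feature(rng.normal(size=(int(rng.integers(20, 60)), 12)))
              for _ in range(100)])
y = rng.integers(0, 5, size=100)              # 0 = non-gesture, 1..4 = gestures
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Slide windows at several temporal scales over a continuous stream; at each
# start point keep the (class, scale) pair with the highest confidence.
stream = rng.normal(size=(300, 12))
for start in range(0, 240, 20):
    proba, scale = max(
        ((forest.predict_proba(window_feature(stream[start:start + w])[None])[0], w)
         for w in (20, 40, 60)),
        key=lambda p: p[0].max())
    if proba.argmax() != 0:                   # class index 0 is non-gesture here
        print(f"t={start}: class {proba.argmax()} spanning {scale} frames")
```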

    Valoración de Movimiento del Miembro Superior Mediante Kinect: Aplicaciones

    This paper presents a software application that detects, tracks, and plots the movements of a person's upper limb using the Kinect® sensor. The software processes the data obtained from the Kinect® sensor and uses them to plot movement trajectories in the three planes of space, providing a complete visualization of the movement performed by the user. The angular excursions of each joint during upper-limb movement are also computed and plotted. In addition, a graphical interface was developed so that each user's data and plots are recorded and can be reviewed in a user-friendly way. The aim is for the software to be used in upper-limb rehabilitation sessions, as well as in the evaluation of tasks involving upper-limb movements of industrial workers. The paper presents trajectory plots for several tasks, together with an evaluation of the accuracy of the computed angular values with respect to real values.
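    The angular-excursion computation reduces to the angle between the two body segments meeting at a joint. A minimal sketch, assuming hypothetical Kinect 3D joint coordinates:

```python
import numpy as np

def joint_angle(proximal, joint, distal):
    """Angle (degrees) at `joint` between the segments to its neighbours,
    e.g. elbow flexion from Kinect shoulder/elbow/wrist 3D positions."""
    u = np.asarray(proximal) - np.asarray(joint)
    v = np.asarray(distal) - np.asarray(joint)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical Kinect skeleton coordinates in metres.
shoulder, elbow, wrist = (0.0, 0.4, 2.0), (0.0, 0.1, 2.0), (0.25, 0.1, 2.0)
print(f"elbow flexion: {joint_angle(shoulder, elbow, wrist):.1f} deg")  # 90.0
```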

    Deep Learning-Based Action Recognition

    The classification of human action or behavior patterns is very important for analyzing situations in the field and maintaining social safety. This book focuses on recent research findings on recognizing human action patterns. Technologies for recognizing human action patterns include the processing of human behavior data for learning, the representation of image feature values, the extraction of spatiotemporal information from images, human posture recognition, and gesture recognition. Research on these technologies has recently been conducted using general deep learning network modeling from artificial intelligence, and excellent research results are included in this edition.

    Personalized face and gesture analysis using hierarchical neural networks

    The video-based computational analyses of human face and gesture signals encompass a myriad of challenging research problems involving computer vision, machine learning and human-computer interaction. In this thesis, we focus on the following challenges: a) the classification of hand and body gestures along with the temporal localization of their occurrence in a continuous stream, b) the recognition of facial expressivity levels in people with Parkinson's disease using multimodal feature representations, c) the prediction of student learning outcomes in intelligent tutoring systems using affect signals, and d) the personalization of machine learning models, which can adapt to subject- and group-specific nuances in facial and gestural behavior. Specifically, we first conduct a quantitative comparison of two approaches to the problem of segmenting and classifying gestures on two benchmark gesture datasets: a method that simultaneously segments and classifies gestures versus a cascaded method that performs the tasks sequentially. Second, we introduce a framework that computationally predicts an accurate score for facial expressivity and validate it on a dataset of interview videos of people with Parkinson's disease. Third, based on a unique dataset of videos of students interacting with MathSpring, an intelligent tutoring system, collected by our collaborative research team, we build models to predict learning outcomes from their facial affect signals. Finally, we propose a novel solution to a relatively unexplored area in automatic face and gesture analysis research: the personalization of models to individuals and groups. We develop hierarchical Bayesian neural networks to overcome the challenges posed by group- or subject-specific variations in face and gesture signals. We successfully validate our formulation on the problems of personalized subject-specific gesture classification, context-specific facial expressivity recognition and student-specific learning outcome prediction. We demonstrate the flexibility of our hierarchical framework by validating the utility of both fully connected and recurrent neural architectures.
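    The personalization idea can be sketched as a classifier head whose per-subject weights are a shared group weight plus a subject-specific offset shrunk toward zero by a Gaussian prior. This point-estimate PyTorch sketch only gestures at the thesis's fully Bayesian hierarchical treatment; all dimensions and names are invented.

```python
import torch
import torch.nn as nn

class PersonalizedHead(nn.Module):
    """Hierarchical sketch: each subject's classifier weights are a shared
    (group-level) weight plus a subject-specific offset that a Gaussian
    prior shrinks toward zero; a point-estimate stand-in for the fully
    Bayesian treatment described in the thesis."""
    def __init__(self, feat_dim=64, num_classes=5, num_subjects=10):
        super().__init__()
        self.shared = nn.Parameter(torch.zeros(num_classes, feat_dim))
        self.offset = nn.Parameter(torch.zeros(num_subjects, num_classes, feat_dim))

    def forward(self, feats, subject_ids):
        w = self.shared + self.offset[subject_ids]        # per-subject weights
        return torch.einsum('bd,bcd->bc', feats, w)

    def prior_penalty(self, tau=1.0):
        # Gaussian prior on offsets: keeps each subject near the group model.
        return (self.offset ** 2).sum() / (2 * tau ** 2)

head = PersonalizedHead()
logits = head(torch.randn(8, 64), torch.randint(0, 10, (8,)))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 5, (8,))) \
       + head.prior_penalty()
```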