
    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, meanwhile, plays a key role in action recognition and affective computing: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements. Both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets, well known in the state of the art, showing remarkable results compared to current literature methods.
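    A minimal sketch of what the second module's two-branch stacked LSTM over 2D skeleton sequences might look like, written here in PyTorch. All names, dimensions, and the choice of branch inputs (joint positions versus frame-to-frame motion) are illustrative assumptions, not the thesis's actual architecture.

        import torch
        import torch.nn as nn

        class TwoBranchLSTM(nn.Module):
            """Two stacked-LSTM branches over a 2D-skeleton sequence (assumed design)."""
            def __init__(self, num_joints=18, hidden=128, num_classes=10):
                super().__init__()
                in_dim = num_joints * 2                  # (x, y) per joint
                self.pos_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
                self.mot_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
                self.classifier = nn.Linear(2 * hidden, num_classes)

            def forward(self, skel):                     # skel: (batch, frames, num_joints*2)
                motion = skel[:, 1:] - skel[:, :-1]      # frame-to-frame displacement
                _, (h_pos, _) = self.pos_branch(skel)
                _, (h_mot, _) = self.mot_branch(motion)
                feat = torch.cat([h_pos[-1], h_mot[-1]], dim=1)  # top-layer hidden states
                return self.classifier(feat)

        logits = TwoBranchLSTM()(torch.randn(4, 30, 36))  # 4 clips of 30 frames, 18 joints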

    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. Our results also show that the framework can be applied to homogeneous sensor modalities, but can fuse multimodal sensors as well to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
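    A hedged sketch, in PyTorch, of the convolutional + LSTM recipe the abstract describes: 1D convolutions extract local features from the raw multimodal sensor channels, and recurrent layers model their temporal dynamics. Layer counts, kernel sizes, and channel numbers here are assumptions for illustration only.

        import torch
        import torch.nn as nn

        class ConvLSTMHar(nn.Module):
            def __init__(self, n_sensor_channels=113, n_classes=18):
                super().__init__()
                self.conv = nn.Sequential(               # convolutions along the time axis
                    nn.Conv1d(n_sensor_channels, 64, kernel_size=5), nn.ReLU(),
                    nn.Conv1d(64, 64, kernel_size=5), nn.ReLU(),
                )
                self.lstm = nn.LSTM(64, 128, num_layers=2, batch_first=True)
                self.out = nn.Linear(128, n_classes)

            def forward(self, x):                        # x: (batch, time, channels)
                x = self.conv(x.transpose(1, 2))         # Conv1d wants (batch, channels, time)
                x, _ = self.lstm(x.transpose(1, 2))      # back to (batch, time', features)
                return self.out(x[:, -1])                # classify from the last time step

        y = ConvLSTMHar()(torch.randn(8, 24, 113))       # 8 sliding windows of 24 samples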

    Portuguese sign language recognition via computer vision and depth sensor

    Sign languages are used worldwide by a multitude of individuals. They are mostly used by deaf communities and their teachers, or people associated with them by ties of friendship or family. Signers are a minority of citizens, often segregated, and over the years not much attention has been given to this form of communication, even by the scientific community. In fact, in Computer Science there is some, but limited, research and development in this area. In the particular case of Portuguese Sign Language (PSL) that fact is more evident and, to our knowledge, there is not yet an efficient system to perform the automatic recognition of PSL signs. With the advent and wide spread of devices such as depth sensors, there are new possibilities to address this problem. In this thesis, we have specified, developed, tested and preliminarily evaluated solutions that we believe will bring valuable contributions to the problem of Automatic Gesture Recognition applied to sign languages, such as Portuguese Sign Language. In the context of this work, Computer Vision techniques were adapted to the case of depth sensors. A proper gesture taxonomy for this problem was proposed, and techniques for feature extraction, representation, storing and classification were presented. Two novel algorithms to solve the problem of real-time recognition of isolated static poses were specified, developed, tested and evaluated. Two other algorithms for the recognition of isolated dynamic gestures (one of them novel) were also specified, developed, tested and evaluated. The analysed results compare well with the literature.

    Time-Efficient Hybrid Approach for Facial Expression Recognition

    Facial expression recognition is an emerging research area for improving human-computer interaction. This research plays a significant role in the fields of social communication, commercial enterprise, law enforcement, and other computer interactions. In this paper, we propose a time-efficient hybrid design for facial expression recognition, combining image pre-processing steps and different Convolutional Neural Network (CNN) structures, providing better accuracy and greatly improved training time. We predict seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise and neutral. The model performs well on challenging cases where the expressed emotion could be one of several with quite similar facial characteristics, such as anger, disgust, and sadness. The experiments to test the model were conducted across multiple databases and different facial orientations; to the best of our knowledge, the model provides an accuracy of about 89.58% on the KDEF dataset, 100% on the JAFFE dataset and 71.975% on the combined (KDEF + JAFFE + SFEW) dataset across these different scenarios. Performance evaluation was done with cross-validation techniques to avoid bias towards a specific set of images from a database.
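    For illustration, a generic pre-processing + CNN pipeline of the kind the abstract describes, predicting the seven listed emotions in PyTorch. The input size, layer shapes and pre-processing are generic assumptions, not the paper's exact hybrid design.

        import torch
        import torch.nn as nn

        EMOTIONS = ["sadness", "happiness", "disgust", "anger", "fear", "surprise", "neutral"]

        cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48x48 -> 24x24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24x24 -> 12x12
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, len(EMOTIONS)),
        )

        face = torch.rand(1, 1, 48, 48)   # assumed pre-processing: grayscale, resized, normalized
        probs = cnn(face).softmax(dim=1)
        print(EMOTIONS[probs.argmax().item()])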

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows categorizing the newest approaches seamlessly with traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables.

    Hand gesture recognition system based in computer vision and machine learning

    "Lecture notes in computational vision and biomechanics series, ISSN 2212-9391, vol. 19"Hand gesture recognition is a natural way of human computer interaction and an area of very active research in computer vision and machine learning. This is an area with many different possible applications, giving users a simpler and more natural way to communicate with robots/systems interfaces, without the need for extra devices. So, the primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is to create systems, which can identify specific human gestures and use them to convey information or controlling devices. For that, vision-based hand gesture interfaces require fast and extremely robust hand detection, and gesture recognition in real time. This paper presents a solution, generic enough, with the help of machine learning algorithms, allowing its application in a wide range of human-computer interfaces, for real-time gesture recognition. Experiments carried out showed that the system was able to achieve an accuracy of 99.4% in terms of hand posture recognition and an average accuracy of 93.72% in terms of dynamic gesture recognition. To validate the proposed framework, two applications were implemented. The first one is a real-time system able to help a robotic soccer referee judge a game in real time. The prototype combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The second one is a real-time system able to interpret the Portuguese Sign Language. Sign languages are not standard and universal and the grammars differ from country to country. Although the implemented prototype was only trained to recognize the vowels, it is easily extended to recognize the rest of the alphabet, being a solid foundation for the development of any vision-based sign language recognition user interface system.(undefined

    A Low-Complexity WiFi-Based Activity Recognition Scheme Using Categorization

    Master's thesis -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, February 2022. Advisor: 전화숙. As smart homes and augmented reality (AR) become popular, convenient human-computer interaction (HCI) methods are also attracting attention. Among them, many researchers have paid attention to gesture recognition, which is simple and intuitive for humans. Camera-based and sensor-based gesture recognition have been very successful, but have limitations, including privacy issues and inconvenience. On the other hand, WiFi-based gesture recognition using channel state information (CSI) does not have these limitations. However, since the WiFi signal is noisy, deep learning (DL) models have commonly been utilized to improve gesture recognition performance. DL models require large training data and large memory, and their high computational complexity results in long latencies that disrupt real-time systems. To solve this problem, support vector machines (SVMs), which require less computation and memory than powerful DL models, can be utilized. However, an SVM shows poor performance when there are many target classes. In this paper, we propose a categorization method that divides ten gestures into four categories. Since only two or three target gestures belong to each category, a traditional machine learning model like the SVM can achieve high accuracy while requiring less computation and memory than the DL models. The key to the categorization method is finding gesture units, called gesture segments: each gesture has a characteristic number of segments, so gestures can be categorized by that number. For example, a continuous gesture such as push-and-pull has two segments, the push and the pull, and a short pause occurs between segments when a person finishes one segment and starts the next. We observed that these short pauses can be found by analyzing fluctuations of the CSI amplitude, which varies more strongly while a person is moving. Based on this, the proposed method uses the amplitude variation to find short pauses and split gesture segments; after categorization, the SVM corresponding to the category uses the CSI data to determine which gesture occurred. According to the experimental results, the SVM alone achieves an accuracy of about 58%, but combined with categorization this improves up to 90%; the categorization step itself reaches 98.5% accuracy. The proposed system also requires far less memory and latency than DL-based baselines, so it can be deployed even on constrained hardware such as APs and IoT devices. Furthermore, the gesture recognition performance of DL models can also be improved by combining them with the proposed categorization method if the hardware has sufficient memory and computational capacity.
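    A hedged sketch of the categorization idea: treat windows where the CSI amplitude variance drops below a threshold as short pauses, count the resulting movement segments, and route the sample to the SVM trained for that segment-count category. Window size, threshold, and the feature pipeline are assumptions, not the thesis's exact parameters.

        import numpy as np

        def count_segments(amplitude: np.ndarray, win: int = 50, thresh: float = 0.05) -> int:
            """amplitude: 1D CSI amplitude series for one gesture instance."""
            var = np.array([amplitude[i:i + win].var()
                            for i in range(0, len(amplitude) - win, win)])
            if var.size == 0:
                return 0
            moving = var > thresh                        # True where the person is moving
            # Each run of consecutive "moving" windows is one gesture segment:
            # count rising edges, plus one if the series starts in motion.
            return int(np.sum(moving[1:] & ~moving[:-1])) + int(moving[0])

        def classify(amplitude: np.ndarray, features: np.ndarray, svms_by_category: dict):
            category = count_segments(amplitude)         # 1, 2, 3, ... segments
            return svms_by_category[category].predict(features[None, :])[0]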

    STUDY OF HAND GESTURE RECOGNITION AND CLASSIFICATION

    The aim is to recognize different hand gestures and achieve efficient classification in order to understand static and dynamic hand movements used for communication. Static and dynamic hand movements are first captured using gesture recognition devices, including the Kinect device, hand movement sensors, connecting electrodes, and accelerometers. These gestures are processed using hand gesture recognition algorithms such as multivariate fuzzy decision trees, hidden Markov models (HMM), the dynamic time warping framework, latent regression forests, support vector machines, and surface electromyogram. Hand movements made with both single and double hands are captured by gesture capture devices under proper illumination conditions. The captured gestures are processed for occlusions and close finger interactions to identify the right gesture, classify it, and ignore intermittent gestures. Real-time hand gesture recognition needs robust algorithms like HMM to detect only the intended gesture. Classified gestures are then compared for effectiveness against training and standard test datasets such as sign language alphabets and the KTH dataset. Hand gesture recognition plays a very important role in applications such as sign language recognition, robotics, television control, rehabilitation, and music orchestration.
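    As one concrete instance of the listed techniques, a plain dynamic time warping (DTW) matcher that aligns a captured gesture trajectory against stored templates and returns the nearest label. The feature sequences and the template set are assumptions for illustration.

        import numpy as np

        def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
            """a, b: (T, D) gesture feature sequences, possibly of different lengths."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)          # cumulative alignment cost
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def recognize(query, templates):                 # templates: {label: sequence}
            return min(templates, key=lambda lbl: dtw_distance(query, templates[lbl]))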