3,686 research outputs found
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition, and video surveillance.
In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided.
The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements.
All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods
Computational Models for the Automatic Learning and Recognition of Irish Sign Language
This thesis presents a framework for the automatic recognition of Sign Language
sentences. In previous sign language recognition works, the issues of;
user independent recognition, movement epenthesis modeling and automatic
or weakly supervised training have not been fully addressed in a single recognition
framework. This work presents three main contributions in order to
address these issues.
The first contribution is a technique for user independent hand posture
recognition. We present a novel eigenspace Size Function feature which is
implemented to perform user independent recognition of sign language hand
postures.
The second contribution is a framework for the classification and spotting
of spatiotemporal gestures which appear in sign language. We propose a
Gesture Threshold Hidden Markov Model (GT-HMM) to classify gestures
and to identify movement epenthesis without the need for explicit epenthesis
training.
The third contribution is a framework to train the hand posture and spatiotemporal
models using only the weak supervision of sign language videos
and their corresponding text translations. This is achieved through our proposed
Multiple Instance Learning Density Matrix algorithm which automatically
extracts isolated signs from full sentences using the weak and noisy
supervision of text translations. The automatically extracted isolated samples
are then utilised to train our spatiotemporal gesture and hand posture
classifiers.
The work we present in this thesis is an important and significant contribution
to the area of natural sign language recognition as we propose a
robust framework for training a recognition system without the need for
manual labeling
Linguistically-driven framework for computationally efficient and scalable sign recognition
We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL)
Portuguese sign language recognition via computer vision and depth sensor
Sign languages are used worldwide by a multitude of individuals. They are mostly used by the deaf communities and their teachers, or people associated with them by ties of friendship or family. Speakers are a minority of citizens, often segregated, and over the years not much attention has been given to this form of communication, even by the scientific community. In fact, in Computer Science there is some, but limited, research and development in this area. In the particular case of sign Portuguese Sign Language-PSL that fact is more evident and, to our knowledge there isn’t yet an efficient system to perform the automatic recognition of PSL signs. With the advent and wide spreading of devices such as depth sensors, there are new possibilities to address this problem.
In this thesis, we have specified, developed, tested and preliminary evaluated, solutions that we think will bring valuable contributions to the problem of Automatic Gesture Recognition, applied to Sign Languages, such as the case of Portuguese Sign Language.
In the context of this work, Computer Vision techniques were adapted to the case of Depth Sensors. A proper gesture taxonomy for this problem was proposed, and techniques for feature extraction, representation, storing and classification were presented. Two novel algorithms to solve the problem of real-time recognition of isolated static poses were specified, developed, tested and evaluated. Two other algorithms for isolated dynamic movements for gesture recognition (one of them novel), have been also specified, developed, tested and evaluated. Analyzed results compare well with the literature.As LÃnguas Gestuais são utilizadas em todo o Mundo por uma imensidão de indivÃduos. Trata-se na sua grande maioria de surdos e/ou mudos, ou pessoas a eles associados por laços familiares de amizade ou professores de LÃngua Gestual. Tratando-se de uma minoria, muitas vezes segregada, não tem vindo a ser dada ao longo dos anos pela comunidade cientÃfica, a devida atenção a esta forma de comunicação.
Na área das Ciências da Computação existem alguns, mas poucos trabalhos de investigação e desenvolvimento. No caso particular da LÃngua Gestual Portuguesa - LGP esse facto é ainda mais evidente não sendo nosso conhecimento a existência de um sistema eficaz e efetivo para fazer o reconhecimento automático de gestos da LGP.
Com o aparecimento ou massificação de dispositivos, tais como sensores de profundidade, surgem novas possibilidades para abordar este problema.
Nesta tese, foram especificadas, desenvolvidas, testadas e efectuada a avaliação preliminar de soluções que acreditamos que trarão valiosas contribuições para o problema do Reconhecimento Automático de Gestos, aplicado à s LÃnguas Gestuais, como é o caso da LÃngua Gestual Portuguesa.
Foram adaptadas técnicas de Visão por Computador ao caso dos Sensores de Profundidade.
Foi proposta uma taxonomia adequada ao problema, e apresentadas técnicas para a extração, representação e armazenamento de caracterÃsticas. Foram especificados, desenvolvidos, testados e avaliados dois algoritmos para resolver o problema do reconhecimento em tempo real de poses estáticas isoladas. Foram também especificados, desenvolvidos, testados e avaliados outros dois algoritmos para o Reconhecimento de Movimentos Dinâmicos Isolados de Gestos(um deles novo).Os resultados analisados são comparáveis à literatura.Las lenguas de Signos se utilizan en todo el Mundo por una multitud de personas. En su mayorÃa son personas sordas y/o mudas, o personas asociadas con ellos por vÃnculos de amistad o familiares y profesores de Lengua de Signos. Es una minorÃa de personas, a menudo segregadas, y no se ha dado en los últimos años por la comunidad cientÃfica, la atención debida a esta forma de comunicación.
En el área de Ciencias de la Computación hay alguna pero poca investigación y desarrollo. En el caso particular de la Lengua de Signos Portuguesa - LSP, no es de nuestro conocimiento la existencia de un sistema eficiente y eficaz para el reconocimiento automático.
Con la llegada en masa de dispositivos tales como Sensores de Profundidad, hay nuevas posibilidades para abordar el problema del Reconocimiento de Gestos.
En esta tesis se han especificado, desarrollado, probado y hecha una evaluación preliminar de soluciones, aplicada a las Lenguas de Signos como el caso de la Lengua de Signos Portuguesa - LSP.
Se han adaptado las técnicas de Visión por Ordenador para el caso de los Sensores de Profundidad.
Se propone una taxonomÃa apropiada para el problema y se presentan técnicas para la extracción, representación y el almacenamiento de caracterÃsticas.
Se desarrollaran, probaran, compararan y analizan los resultados de dos nuevos algoritmos para resolver el problema del Reconocimiento Aislado y Estático de Posturas. Otros dos algoritmos (uno de ellos nuevo) fueran también desarrollados, probados, comparados y analizados los resultados, para el Reconocimiento de Movimientos Dinámicos Aislados de los Gestos
A new 2D static hand gesture colour image dataset for ASL gestures
It usually takes a fusion of image processing and machine learning algorithms in order to
build a fully-functioning computer vision system for hand gesture recognition. Fortunately,
the complexity of developing such a system could be alleviated by treating the system as a
collection of multiple sub-systems working together, in such a way that they can be dealt
with in isolation. Machine learning need to feed on thousands of exemplars (e.g. images,
features) to automatically establish some recognisable patterns for all possible classes (e.g.
hand gestures) that applies to the problem domain. A good number of exemplars helps, but
it is also important to note that the efficacy of these exemplars depends on the variability
of illumination conditions, hand postures, angles of rotation, scaling and on the number of
volunteers from whom the hand gesture images were taken. These exemplars are usually
subjected to image processing first, to reduce the presence of noise and extract the important
features from the images. These features serve as inputs to the machine learning system.
Different sub-systems are integrated together to form a complete computer vision system for
gesture recognition. The main contribution of this work is on the production of the exemplars.
We discuss how a dataset of standard American Sign Language (ASL) hand gestures containing
2425 images from 5 individuals, with variations in lighting conditions and hand postures is
generated with the aid of image processing techniques. A minor contribution is given in
the form of a specific feature extraction method called moment invariants, for which the
computation method and the values are furnished with the dataset
- …