The impact of geometric and motion features on sign language translators
A Malaysian Sign Language (MSL) recognition system is one way to augment communication between the hearing-impaired and hearing communities in Malaysia. Automatic translators can play an important role as an alternative channel through which hearing people can understand the hearing-impaired. Automatic translation from bare hands with natural gesture signing is a challenge in the field of machine learning. Researchers have used electronic and coloured gloves to address three main issues in the pre-processing steps before the sign recognition stage. The first issue is differentiating the two hands from other objects, referred to as hand detection. The second is describing the detected hand and its motion trajectory in rich detail, referred to as the feature extraction stage. The third is finding the start and end of each sign (the transitions between signs). This paper focuses on the second issue, feature extraction, by studying the impact of the dimensionality of the feature vectors. At the same time, signs with similar attributes were deliberately chosen to highlight the importance of the feature extraction stage. The study also assesses the capability of the Hidden Markov Model (HMM) to differentiate between signs with similar attributes.
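The abstract gives no implementation details; as a rough illustration of how HMM likelihoods can separate signs with similar attributes, the sketch below is a minimal scaled forward algorithm over discrete observation symbols. All model parameters here are invented toy values, not the paper's:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: initial state probabilities, A: transition matrix,
    B: emission matrix (states x symbols). Uses per-step rescaling
    to avoid numerical underflow on long sequences."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

# Two toy "sign" models sharing the same state topology but with
# different emission tendencies (hypothetical values):
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B_sign1 = np.array([[0.9, 0.1],    # state 0 mostly emits symbol 0
                    [0.1, 0.9]])   # state 1 mostly emits symbol 1
B_sign2 = np.array([[0.1, 0.9],    # reversed emission pattern
                    [0.9, 0.1]])

obs = [0, 0, 1, 1]  # a sequence typical of sign 1
ll1 = forward_log_likelihood(obs, pi, A, B_sign1)
ll2 = forward_log_likelihood(obs, pi, A, B_sign2)
```

Classification then picks the sign model with the larger log-likelihood; with per-sign HMMs trained on feature vectors, the same comparison separates visually similar signs.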
Analysis and Performance Comparison of the Feature Vectors in Recognition of Malaysian Sign Language
Dynamic approach for real-time skin detection
Human face and hand detection, recognition, and tracking are important research areas for many computer interaction applications. The face and hands are treated as human skin blobs, which fall in a compact region of colour spaces. Limitations arise from the fact that human skin shares common properties and can be defined in various colour spaces after applying colour normalization. The model therefore has to accept a wide range of colours, making it more susceptible to noise. We address this problem and propose that the skin colour be defined separately for every person, which is expected to reduce errors. To detect human skin colour pixels and to decrease the number of false alarms, a prior face or hand detection model has been developed using Haar-like features and the AdaBoost technique. To reduce computational cost, a fast search algorithm for skin detection is proposed. The level of performance reached in terms of detection accuracy and processing time makes this approach an adequate choice for real-time skin blob tracking.
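The abstract does not spell out the per-person model; the sketch below illustrates the idea under an assumed Gaussian colour model seeded from pixels of an already-detected face patch. The colour space, threshold, and regularisation are placeholders, not the authors' choices:

```python
import numpy as np

def fit_skin_model(face_pixels):
    """Per-person skin model: mean and inverse covariance of colour
    samples (N x 3) taken from a previously detected face patch."""
    mean = face_pixels.mean(axis=0)
    cov = np.cov(face_pixels, rowvar=False) + 1e-6 * np.eye(3)
    return mean, np.linalg.inv(cov)

def skin_mask(image, mean, inv_cov, thresh=9.0):
    """Label pixels whose Mahalanobis distance to this person's
    skin model is below an assumed threshold."""
    diff = image.reshape(-1, 3) - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return (d2 < thresh).reshape(image.shape[:2])

# Synthetic demonstration: noisy samples around one skin tone.
rng = np.random.default_rng(0)
face_pixels = np.array([150.0, 120.0, 100.0]) + rng.normal(0, 2, (200, 3))
mean, inv_cov = fit_skin_model(face_pixels)

img = np.zeros((2, 2, 3))
img[0] = [150.0, 120.0, 100.0]   # skin-coloured row
img[1] = [20.0, 200.0, 50.0]     # background row
mask = skin_mask(img, mean, inv_cov)
```

Because the model is fitted per person, the accepted colour region stays tight, which is the mechanism the abstract credits for fewer false alarms.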
Using Mobile Phone to Assist DHH Individuals
Past research on sign language recognition has mostly been based on physical information obtained via wearable devices or depth cameras. However, both types of devices are costly and inconvenient to carry, making it difficult for them to gain widespread acceptance among potential users. This research aims to use recently developed deep learning technology to build a recognition model for a Taiwanese version of sign language, restricting itself to RGB images for training and recognition. It is hoped that this research, which makes use of lightweight devices such as mobile phones and webcams, will make a significant contribution to the communication needs of deaf and hard-of-hearing (DHH) individuals.
Vision-based hand shape identification for sign language recognition
This thesis introduces an approach to obtaining image-based hand features that accurately describe hand shapes commonly found in American Sign Language. A hand recognition system capable of identifying 31 hand shapes from American Sign Language was developed to identify hand shapes in a given input image or video sequence. An appearance-based approach with a single camera is used to recognize the hand shape. A region-based shape descriptor, the generic Fourier descriptor, which is invariant to translation, scale, and orientation, has been implemented to describe the shape of the hand. A wrist detection algorithm has been developed to remove the forearm from the hand region before the features are extracted. Recognition of the hand shapes is performed with a multi-class Support Vector Machine. Testing yielded a recognition rate of approximately 84% on a widely varying testing set of approximately 1,500 images, with a training set of about 2,400 images. With a larger training set of approximately 2,700 images and a testing set of approximately 1,200 images, the recognition rate increased to about 88%.
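As a hedged illustration of the region-based descriptor mentioned above, here is a much-simplified generic Fourier descriptor: the shape is resampled on a polar grid centred at its centroid and the 2-D FFT magnitudes are normalised by the DC term. Grid sizes and normalisation are assumptions; the thesis' exact formulation may differ:

```python
import numpy as np

def generic_fourier_descriptor(mask, n_rad=8, n_ang=16):
    """Simplified region-based GFD for a binary shape mask.
    Centring at the centroid gives translation invariance; taking
    FFT magnitudes makes the descriptor insensitive to rotations
    that amount to a circular shift of the angular samples; scaling
    the radii by the shape's maximum extent approximates scale
    invariance."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    r_max = np.sqrt(((ys - cy) ** 2 + (xs - cx) ** 2).max()) + 1e-9
    radii = np.linspace(0, r_max, n_rad)
    angles = np.linspace(0, 2 * np.pi, n_ang, endpoint=False)
    polar = np.zeros((n_rad, n_ang))
    for i, r in enumerate(radii):
        for j, a in enumerate(angles):
            y = int(round(cy + r * np.sin(a)))
            x = int(round(cx + r * np.cos(a)))
            if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]:
                polar[i, j] = mask[y, x]
    spec = np.abs(np.fft.fft2(polar))
    return (spec / (spec[0, 0] + 1e-9)).ravel()

# Same rectangle at two positions: descriptors should match.
m1 = np.zeros((32, 32)); m1[5:8, 4:9] = 1
m2 = np.zeros((32, 32)); m2[15:18, 14:19] = 1
d1 = generic_fourier_descriptor(m1)
d2 = generic_fourier_descriptor(m2)

# A different shape (a cross) should give a different descriptor.
m3 = np.zeros((32, 32)); m3[10:20, 14:16] = 1; m3[14:16, 10:20] = 1
d3 = generic_fourier_descriptor(m3)
```

In the thesis' pipeline these descriptors would be computed after wrist removal and fed to the multi-class SVM.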
Face detection: eigenfaces versus antifaces
Master's thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Electrical Engineering. This work presents a comparative study of two face detection techniques based on vector projections: eigenfaces and antifaces. The eigenfaces method has been studied extensively in recent years, while antifaces is still considered the state of the art for object detection. Both methods are described in detail and, for the antifaces method, a procedure is proposed for obtaining suboptimal detectors. Both methods are evaluated under identical test conditions. These evaluations cover the detection of facial features, of three-dimensional objects, and of a specific face viewed frontally. Finally, the sensitivity of the methods to additive white Gaussian noise, focus distortions, and changes to the scene containing the object of interest is analysed. The results show that, for the antifaces method, the criteria for determining some design variables are not yet well established, and that the method is highly selective during detection. The eigenfaces method has greater generalization capability and lower sensitivity to added noise, focus distortions, and scene changes.
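To make the eigenfaces side of the comparison concrete, here is a standard PCA-based sketch (a textbook formulation, not taken from the dissertation): a detector scores an image by its reconstruction error after projection onto the leading eigenfaces, with small error suggesting a face-like input.

```python
import numpy as np

def train_eigenfaces(faces, k):
    """PCA on vectorised face images (one image per row): returns the
    mean face and the top-k principal axes ('eigenfaces')."""
    X = faces - faces.mean(axis=0)
    # SVD of the centred data gives the principal axes directly.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return faces.mean(axis=0), vt[:k]

def reconstruction_error(img, mean, eigenfaces):
    """Distance from face space: project onto the eigenfaces and
    measure the residual of the reconstruction."""
    centred = img - mean
    coeffs = eigenfaces @ centred
    recon = eigenfaces.T @ coeffs
    return np.linalg.norm(centred - recon)

# Synthetic "faces": combinations of two fixed basis patterns, so the
# training set lies in a 2-D subspace that PCA should recover.
rng = np.random.default_rng(1)
b1, b2 = rng.normal(size=(2, 64))
a = rng.uniform(-1, 1, 20)
b = rng.uniform(-1, 1, 20)
faces = a[:, None] * b1 + b[:, None] * b2

mean, eig = train_eigenfaces(faces, k=2)
err_face = reconstruction_error(0.3 * b1 - 0.7 * b2, mean, eig)  # in-span
err_rand = reconstruction_error(rng.normal(size=64), mean, eig)  # off-span
```

The antifaces method takes the opposite tack, designing projection vectors that respond weakly to the target class and strongly to everything else, which is what the study's selectivity findings refer to.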
The Effects of Visual Affordances and Feedback on a Gesture-based Interaction with Novice Users
This dissertation studies the roles and effects of visual affordances and feedback in a general-purpose gesture interface for novice users. Gesture interfaces are popularly viewed as intuitive and user-friendly modes of interacting with computers and robots, but they in fact introduce many challenges for users not already familiar with the system. Affordances and feedback, two fundamental building blocks of interface design, are well suited to address the most important challenges and questions for novices using a gesture interface: What can they do? How do they do it? Are they being understood? Has anything gone wrong? Yet gesture interfaces rarely incorporate these features in a deliberate manner, and there are presently no well-adopted guidelines for designing affordances and feedback for gesture interaction, nor any clear understanding of their effects on such an interaction.
A general-purpose gesture interaction system was developed based on a virtual touchscreen paradigm and guided by a novel gesture interaction framework. This framework clarifies the relationship between gesture interfaces and the application interfaces they support, and it provides guidance for selecting and designing appropriate affordances and feedback. Using this gesture system, a 40-person user study (all novices) was conducted to evaluate the effects of four categories of affordances and feedback on interaction performance and user satisfaction. The experimental results demonstrated that affordances indicating how to do something in a gesture interaction are more important to interaction performance than affordances indicating what can be done, and that feedback about system status is more important than feedback acknowledging user actions. However, the experiments also showed unexpectedly high interaction performance when affordances and feedback were omitted. The explanation for this result remains an open question, though several potential causes are analyzed and a tentative interpretation is provided.
The main contributions of this dissertation to the HRI and HCI research communities are 1) the design of a virtual touchscreen-based interface for general-purpose gesture interaction, serving as a case study for identifying and designing affordances and feedback for gesture interfaces; 2) the method and surprising results of an evaluation of distinct affordance and feedback categories, in particular their effects on a gesture interaction with novice users; and 3) a set of guidelines and insights about the relationship between a user, a gesture interface, and a generic application interface, centered on a novel interaction framework that may be used to design and study other gesture systems. Beyond these intellectual contributions, this work is useful to the general public because it may influence how future assistive robots are designed to interact with people in settings including search and rescue, healthcare, and elderly care.
Computer vision methods for unconstrained gesture recognition in the context of sign language annotation
This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey information simultaneously. Even though standard signs are defined in dictionaries, there is huge variability caused by the context-dependency of signs. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs.
The huge variability and the co-articulation effect represent a challenging problem in automatic SL processing. Numerous annotated video corpora are necessary in order to train statistical machine translators and to study this language. Generally, the annotation of SL video corpora is performed manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible, and time-consuming, and the quality of the results depends on the annotator's knowledge of SL. Combining annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address the problem of gloss annotation of SL video corpora. First of all, we aim to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand-shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then a segmentation method has been developed for extracting the hand even when it is in front of the face. Motion is used to segment signs, and hand shape is later used to improve the results: hand shape allows limits detected in the middle of a sign to be discarded. Once signs have been segmented, we proceed to gloss recognition using a lexical description of signs. We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation has shown the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.
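The motion-based temporal segmentation step described above can be caricatured as follows, assuming hand centroids have already been tracked per frame. The speed threshold and minimum pause length are invented values; the thesis' actual method additionally uses hand-shape features to discard spurious limits:

```python
import numpy as np

def low_motion_boundaries(positions, speed_thresh=1.0, min_len=2):
    """Candidate sign limits: runs of at least min_len frames where
    hand speed stays below speed_thresh (pauses between signs).
    positions: (T, 2) array of tracked hand centroids per frame.
    Returns (start, end) frame-index pairs of low-motion runs."""
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    slow = speed < speed_thresh
    boundaries, run_start = [], None
    for t, s in enumerate(slow):
        if s and run_start is None:
            run_start = t                    # pause begins
        elif not s and run_start is not None:
            if t - run_start >= min_len:     # long enough to count
                boundaries.append((run_start, t))
            run_start = None
    if run_start is not None and len(slow) - run_start >= min_len:
        boundaries.append((run_start, len(slow)))
    return boundaries

# Fast motion, a pause, then fast motion again: one candidate limit.
positions = np.array(
    [[0, 0], [5, 0], [10, 0], [15, 0], [20, 0],   # moving
     [20, 0], [20, 0], [20, 0], [20, 0],          # paused
     [25, 0], [30, 0], [35, 0], [40, 0]], float)  # moving
limits = low_motion_boundaries(positions)
```

In the full pipeline, each detected pause becomes a candidate transition between signs, and shape features then reject limits that fall in the middle of a sign.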