
    A novel set of features for continuous hand gesture recognition

    Applications requiring the natural use of the human hand as a human–computer interface motivate research on continuous hand gesture recognition. Gesture recognition depends on gesture segmentation to locate the start and end points of meaningful gestures while ignoring unintentional movements. Unfortunately, gesture segmentation remains a formidable challenge because of unconstrained spatiotemporal variations in gestures and the coarticulation and movement epenthesis of successive gestures. Furthermore, errors in hand image segmentation cause the estimated hand motion trajectory to deviate from the actual one. This research addresses these problems. Our approach uses gesture spotting to distinguish meaningful gestures from unintentional movements. To avoid the effects of variations in a gesture’s motion chain code (MCC), we instead propose a novel set of features: the (a) orientation and (b) length of an ellipse least-squares fitted to the motion-trajectory points, and (c) the position of the hand. The features are designed to support classification using conditional random fields. To evaluate the system, 10 participants signed 10 gestures several times each, providing a total of 75 instances per gesture; 50 instances of each gesture served as training data and 25 as testing data. For isolated gestures, the recognition rate using the MCC as a feature vector was only 69.6% but rose to 96.0% with the proposed features, a 26.4-percentage-point improvement. For continuous gestures, the recognition rate with the proposed features was 88.9%. These results show the efficacy of the proposed method.
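
    The ellipse-based features can be sketched compactly. Below is a minimal Python illustration, assuming 2-D trajectory points; it approximates the paper's least-squares ellipse with a covariance (moment-based) fit, so the function name and exact fitting procedure are illustrative rather than the authors' implementation.

```python
import numpy as np

def ellipse_features(trajectory):
    """Orientation and major-axis length of an ellipse fitted to 2-D
    motion-trajectory points, plus the current hand position.

    Hypothetical sketch: a covariance (moment-based) ellipse stands in
    for the paper's least-squares fit.
    """
    pts = np.asarray(trajectory, dtype=float)       # shape (N, 2)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered.T)                        # 2x2 scatter matrix
    evals, evecs = np.linalg.eigh(cov)              # eigenvalues ascending
    major = evecs[:, 1]                             # principal axis direction
    orientation = np.arctan2(major[1], major[0])    # feature (a): ellipse angle
    length = 2.0 * np.sqrt(evals[1])                # feature (b): axis length
    position = pts[-1]                              # feature (c): hand position
    return orientation, length, position
```

    Unlike a raw chain code, these three quantities vary smoothly with small trajectory perturbations, which is what makes them attractive as inputs to a conditional random field.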

    Detection of major ASL sign types in continuous signing for ASL recognition

    In American Sign Language (ASL), as in other signed languages, different classes of signs (e.g., lexical signs, fingerspelled signs, and classifier constructions) have different internal structural properties. Continuous sign recognition accuracy can be improved by using distinct recognition strategies, and distinct training datasets, for each class of signs. For these strategies to be applied, continuous signing video must first be segmented into parts corresponding to particular classes of signs. In this paper we present a multiple-instance-learning-based segmentation system that accurately labels 91.27% of the video frames of 500 continuous utterances (from 7 different subjects) in the publicly accessible NCSLGR corpus (Neidle and Vogler, 2012). The system uses novel feature descriptors derived from both motion and shape statistics of the regions of high local motion, and it does not require a hand tracker.
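
    As a rough illustration of per-frame descriptors built from motion and shape statistics of high-motion regions, here is a hypothetical Python sketch based on plain frame differencing; the paper's actual descriptors and its multiple-instance-learning labeling stage are not reproduced.

```python
import numpy as np

def motion_shape_descriptor(prev_gray, cur_gray, motion_thresh=25):
    """Per-frame descriptor from the region of high local motion:
    a hypothetical simplification of motion/shape statistics built
    on simple frame differencing."""
    diff = np.abs(cur_gray.astype(int) - prev_gray.astype(int))
    mask = diff > motion_thresh                     # high-motion pixels
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                                # no motion this frame
        return np.zeros(5)
    area = xs.size / mask.size                      # motion: moving fraction
    mean_mag = diff[mask].mean()                    # motion: mean magnitude
    cx = xs.mean() / mask.shape[1]                  # shape: centroid x
    cy = ys.mean() / mask.shape[0]                  # shape: centroid y
    spread = np.sqrt(xs.var() + ys.var()) / max(mask.shape)  # shape: extent
    return np.array([area, mean_mag, cx, cy, spread])
```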

    An approach to sign language translation using the Intel RealSense camera

    An Intel RealSense camera is used to translate static manual American Sign Language gestures into text. The system uses palm orientation and finger-joint data as inputs to either a support vector machine or a neural network whose architecture has been optimized by a genetic algorithm. A data set consisting of 100 samples of 26 gestures (the letters of the alphabet) was collected from 10 participants. When comparing the different learners in combination with standard preprocessing techniques, the highest accuracy, 95%, is achieved by a support vector machine with scaling and principal component analysis used for preprocessing. The best-performing neural network reaches 92.1% but produces predictions much faster. We also present a simple software solution that uses the trained classifiers to enable user-friendly sign language translation.
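
    The best-performing configuration (scaling plus PCA feeding an SVM) maps naturally onto a standard scikit-learn pipeline. The sketch below is a hypothetical reconstruction: the PCA variance threshold and the SVM kernel are assumptions, not the paper's reported settings.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# X: palm-orientation and finger-joint features; y: letter labels A-Z.
# Scaling and PCA mirror the preprocessing described in the abstract;
# n_components=0.95 (keep 95% of the variance) is an assumed setting.
clf = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC(kernel="rbf"))
# clf.fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```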

    Fully Convolutional Networks for Continuous Sign Language Recognition

    Continuous sign language recognition (SLR) is a challenging task that requires learning on both the spatial and temporal dimensions of signing frame sequences. Most recent work accomplishes this with hybrid CNN and RNN networks. However, training these networks is generally non-trivial, and most of them fail to learn unseen sequence patterns, which leads to unsatisfactory performance in online recognition. In this paper, we propose a fully convolutional network (FCN) for online SLR that concurrently learns spatial and temporal features from weakly annotated video sequences for which only sentence-level annotations are given. A gloss feature enhancement (GFE) module is introduced in the proposed network to enforce better sequence alignment learning. The proposed network is end-to-end trainable without any pre-training. We conduct experiments on two large-scale SLR datasets; they show that our method for continuous SLR is effective and performs well in online recognition. (Accepted to ECCV 2020.)
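
    To make the fully convolutional, sentence-level-supervised setup concrete, here is a minimal PyTorch sketch: temporal 1-D convolutions produce per-frame gloss logits that are aligned to sentence-level gloss annotations with CTC. The layer sizes are assumptions, and the paper's GFE module is omitted.

```python
import torch
import torch.nn as nn

class TemporalFCN(nn.Module):
    """Minimal fully convolutional temporal model for continuous SLR.
    Layer sizes are assumptions; the paper's GFE module is omitted."""
    def __init__(self, feat_dim, num_glosses):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, num_glosses + 1, kernel_size=1),  # +1: CTC blank
        )

    def forward(self, x):          # x: (batch, feat_dim, time)
        return self.net(x)         # per-frame gloss logits

# Weak, sentence-level supervision via CTC alignment, e.g.:
#   logits = model(frames)                                  # (B, C, T)
#   log_probs = logits.permute(2, 0, 1).log_softmax(-1)     # (T, B, C)
#   loss = nn.CTCLoss()(log_probs, glosses, input_lens, target_lens)
```

    Because every layer is convolutional, the model can be applied to streams of arbitrary length, which is what makes it suitable for online recognition.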

    Portuguese sign language recognition from depth sensing human gesture and motion capture

    Master's dissertation in Informatics Engineering. Just like spoken languages, Sign Languages (SL) have evolved over time, featuring their own grammar and vocabulary, and are thus considered true languages. The major difference between an SL and other languages is that the former is signed rather than spoken: SL is a visual language. SL is the most common type of language among deaf people, as no sense of hearing is required to understand it. The main motivation of this dissertation is to build a bridge that eases communication between those who are deaf (or hard of hearing) and those not familiar with SL. We propose a non-intrusive system, discarding glove-like devices and setups with multiple cameras. We achieve this with Microsoft's Kinect One sensor: a single device that acquires both depth and colour information, although our system uses only the depth information. Four experiments were performed: simple posture recognition, recognition of postures within movement, sign recognition using only hand-path information, and sign recognition using both hand-path and hand-configuration information. The first and third experiments were conducted to validate the feature extraction method, while the second and fourth address our hypothesis. Accuracy reached 87.4% and 64.2% for the first and second experiments, respectively. In the experiments concerning signs, accuracy reached 91.6% with hand-path data only and 81.3% with hand-path and hand-configuration data.
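
    The abstract does not state the classifier, but one common way to compare variable-length hand paths extracted from Kinect skeleton data is dynamic time warping; the sketch below is a generic illustration of that idea, not the dissertation's method.

```python
import numpy as np

def dtw_distance(path_a, path_b):
    """Dynamic time warping distance between two hand trajectories
    (each an array of 3-D joint positions). Generic sketch only."""
    a, b = np.asarray(path_a, float), np.asarray(path_b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A nearest-neighbour classifier over stored sign templates could then
# label a query path by the template with the smallest DTW distance.
```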

    Acta Cybernetica: Volume 18, Number 2.


    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of Sign Language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. A continuous SL utterance consists of a sequence of signs performed by the hands, accompanied by facial expressions and upper-body movements, conveying several streams of information in parallel. Even though standard signs are defined in dictionaries, their realization varies greatly with context. In addition, successive signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This extreme variability and the co-articulation effect are a major challenge for automatic SL processing. Numerous annotated SL video corpora are therefore needed in order to study the language and to apply machine learning methods. SL video corpora are generally annotated manually by linguists or SL experts, which is error-prone, unreproducible, and extremely time-consuming; moreover, the quality of the annotation depends on the annotator's knowledge of SL. Combining the annotator's expertise with automatic processing facilitates the task, saving time and improving robustness. The goal of this research is to study image processing methods that assist the annotation of SL video corpora: body-part tracking, hand segmentation, temporal segmentation, and gloss recognition. This thesis addresses the gloss annotation of SL video corpora. First, we detect the boundaries marking the beginning and end of each sign. This annotation method requires several low-level steps to segment the signs and to extract motion and hand-shape features. We first propose a particle filter based method for tracking the hands and face that is robust to occlusions. Next, a hand segmentation algorithm extracts the hand region even when the hand is in front of the face. Motion features then provide an initial temporal segmentation of the signs, which is subsequently refined using hand-shape features; the shape features allow segmentation boundaries detected in the middle of a sign to be discarded. Once the signs are segmented, visual features are extracted to recognize them as glosses using phonological models of signs. We evaluated our algorithms on international corpora to show their advantages and limitations. The evaluation demonstrates the robustness of our methods with respect to high dynamics and the large number of occlusions between body parts. The resulting annotation is independent of the annotator and represents a substantial gain in consistency.
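
    The particle-filter tracking step can be illustrated generically. The sketch below shows one predict/update/resample cycle for a tracked body part; the observation model and the thesis's occlusion handling are assumptions left abstract.

```python
import numpy as np

def particle_filter_step(particles, weights, observe, motion_std=5.0):
    """One predict/update/resample cycle tracking a body part's 2-D
    image position. Generic sketch; `observe(p) -> likelihood` (e.g.,
    a skin-colour score) and the motion model are assumptions."""
    n = len(particles)
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: reweight each particle by the image likelihood.
    weights = weights * np.apply_along_axis(observe, 1, particles)
    weights = weights / (weights.sum() + 1e-12)
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```

    Keeping a full particle set, rather than a single position estimate, is what lets the tracker recover after a hand is briefly occluded by the face or the other hand.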