4 research outputs found

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features (facial expressions and upper-body movements) that convey information simultaneously. Although standard signs are defined in dictionaries, their realization varies greatly with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect make automatic SL processing challenging. Many annotated video corpora are needed in order to study the language and to train machine learning methods such as statistical machine translators. SL video corpora are generally annotated manually by linguists or computer scientists experienced in SL, but manual annotation is error-prone, not reproducible and time-consuming, and its quality depends on the annotator's knowledge of SL. Combining annotator expertise with image processing techniques makes the task easier, more robust and faster. The goal of this research is to study and develop image processing techniques that assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation and gloss recognition.
    The thesis addresses gloss annotation of SL video corpora. The first step is to detect the limits marking the beginning and end of each sign. This requires several low-level methods for temporal segmentation and for extracting motion and hand-shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Next, a segmentation method extracts the hand region even when the hand is in front of the face. Motion features then give a first temporal segmentation of the signs, which hand-shape features refine by removing limits detected in the middle of a sign. Once the signs are segmented, visual features are extracted and glosses are recognized using a lexical (phonological) description of signs. We evaluated the algorithms on international corpora to show their advantages and limitations. The evaluation shows that the proposed methods are robust to high dynamics and to numerous occlusions between body parts. The resulting annotation is independent of the annotator and improves annotation consistency.
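    The two-stage temporal segmentation described in this abstract (motion first, then hand shape to discard limits found in the middle of a sign) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the local-speed-minimum criterion, the thresholds `max_speed` and `min_change`, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def speeds(track):
    """Frame-to-frame speed of a 2D hand trajectory [(x, y), ...]."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]

def motion_boundaries(track, max_speed=1.0):
    """Candidate sign limits: local speed minima below max_speed (assumed criterion)."""
    v = speeds(track)
    return [
        i for i in range(1, len(v) - 1)
        if v[i] <= max_speed and v[i] <= v[i - 1] and v[i] < v[i + 1]
    ]

def refine_with_shape(boundaries, shapes, min_change=0.5):
    """Keep a candidate only if the hand-shape descriptor changes across it.

    shapes holds one (hypothetical) descriptor vector per frame; a candidate
    with nearly identical shapes on both sides is treated as a mid-sign pause.
    """
    return [
        b for b in boundaries
        if math.dist(shapes[b - 1], shapes[b + 1]) >= min_change
    ]

# Toy data: the hand pauses twice, but the hand shape changes only at the second pause.
track = [(0, 0), (3, 0), (6, 0), (6.2, 0), (9, 0), (12, 0), (12.1, 0), (15, 0), (18, 0)]
shapes = [(0.0,)] * 5 + [(0.5,)] + [(1.0,)] * 3

candidates = motion_boundaries(track)
limits = refine_with_shape(candidates, shapes)
print(candidates, limits)
```

    On the toy data, motion alone proposes two limits, and the shape check keeps only the one where the descriptor actually changes.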

    Real-time Immersive human-computer interaction based on tracking and recognition of dynamic hand gestures

    With the fast-developing and ever-growing use of computer-based technologies, human-computer interaction (HCI) plays an increasingly pivotal role. In virtual reality (VR), HCI technologies provide not only a better understanding of three-dimensional shapes and spaces, but also sensory immersion and physical interaction. Hand-based HCI is a key modality for object manipulation and gesture-based communication, yet providing users with a natural, intuitive, effortless, precise and real-time method of interaction based on dynamic hand gestures is challenging. The difficulties stem from the complexity of hand postures formed by multiple joints with high degrees of freedom, the speed of hand movements with highly variable trajectories and rapid direction changes, and the precision required for interaction between hands and objects in the virtual world. This thesis presents the design and development of a novel real-time HCI system that combines a pair of data gloves based on fibre-optic curvature sensors to acquire finger joint angles, a hybrid tracking system based on inertial and ultrasonic sensing to capture hand position and orientation, and a stereoscopic display system to provide immersive visual feedback. The potential and effectiveness of the proposed system are demonstrated through three applications: hand-gesture-based virtual object manipulation and visualisation, direct sign writing, and finger spelling. For virtual object manipulation and visualisation, the system allows a user to select, translate, rotate, scale, release and visualise virtual objects (presented using graphics and volume data) in three-dimensional space using natural hand gestures in real time.
    For direct sign writing, the system immediately displays the SignWriting symbols corresponding to what a user signs, using three different signing sequences and a range of complex hand gestures. These gestures combine hand postures (each finger open, half-bent or closed, with adduction and abduction), eight hand orientations in the horizontal/vertical planes, three palm-facing directions, and various hand movements (in eight directions in the horizontal/vertical planes, and repetitive, straight or curved, clockwise or anti-clockwise). The development includes a special visual interface that gives not only a stereoscopic view of hand gestures and movements, but also structured visual feedback for each stage of the signing sequence. The system therefore forms a basis for developing a full HCI based on all human gestures by integrating it with facial expression and body posture recognition methods. Finally, for finger spelling, the system recognises in real time the five vowels signed with two hands in British Sign Language.
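    The abstract mentions movements classified into eight directions within a plane. One plausible way to obtain such classes, shown here only as a sketch, is to quantize the displacement angle into 45-degree sectors. The function name `quantize_direction` and the compass-style labels are hypothetical; the thesis does not state how its eight directions are computed or named.

```python
import math

# Hypothetical labels for the eight 45-degree sectors, counter-clockwise from +x.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def quantize_direction(dx, dy):
    """Map a 2D displacement within one plane to one of eight sectors."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by half a sector so each label is centred on its axis or diagonal.
    sector = int((angle + 22.5) // 45) % 8
    return DIRECTIONS[sector]

print(quantize_direction(1.0, 0.0), quantize_direction(1.0, 1.0), quantize_direction(0.0, -1.0))
```

    The same function could be applied separately in the horizontal and vertical planes by passing the two coordinates that span each plane.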

    Real-time immersive human-computer interaction based on tracking and recognition of dynamic hand gestures

    EThOS - Electronic Theses Online Service (GB)

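    Modelo de referência para desenvolvimento de artefatos de apoio ao acesso dos surdos ao audiovisual (Reference model for the development of artifacts supporting deaf people's access to audiovisual content)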

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento. Information and communication technologies enable individuals to participate in the knowledge society; however, the accessibility of audiovisual content in digital media for deaf people still requires study before it can be adopted effectively and widely. This work aims to identify and analyse alternatives for developing a reference model that guides the reuse of processes, methods and techniques for producing artifacts that promote deaf people's access to audiovisual content on digital platforms. Based on a systematic literature review, it presents recommendations for displaying audiovisual content accessible to deaf audiences, the requirements that must be met to support the access strategies used by different profiles of deaf people, and alternatives that can support these needs, such as textual subtitles and subtitles with a sign language window. The reference model covers the production of content from the translation of the audiovisual material, with recommendations identified and elaborated for generating subtitles as sign language video or in written form. It seeks to integrate the production of these types of artifacts through manual or automatic processes, identifying the media that support or result from the production of accessibility support artifacts. The reference model is validated through consultation with specialists and applied in a reference implementation of an accessibility system with delivery scenarios on interactive digital television and the web. The results are the recommendations and alternatives regarding the processes and media required for deaf people's access to digital audiovisual content.