887 research outputs found

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. A continuous SL utterance consists of a sequence of signs performed by the hands, accompanied by facial expressions and upper-body movements, so that manual and non-manual features convey information in parallel. Even though standard signs are defined in dictionaries, their realization varies greatly with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This high variability and the co-articulation effect are a major challenge for automatic SL processing. Numerous annotated video corpora are therefore needed in order to study this language and to apply machine learning methods, for example to train statistical machine translators. SL video corpora are generally annotated manually by linguists or computer scientists experienced in SL. Manual annotation, however, is error-prone, not reproducible and time-consuming, and the quality of the result depends on the annotator's knowledge of SL. Combining the annotator's expertise with image processing techniques eases the task, increases robustness and reduces the time required. The goal of this research is to study and develop image processing techniques that assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation and gloss recognition. Throughout the thesis we address the problem of gloss annotation of SL video corpora. We first aim to detect the limits corresponding to the beginning and end of each sign. This annotation method requires several low-level processing steps to perform temporal segmentation and to extract motion and hand shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then, a segmentation method is developed to extract the hand region even when the hand is in front of the face. Motion features are used for a first temporal segmentation of the signs, which is later refined using hand shape features; hand shape makes it possible to remove limits detected in the middle of a sign. Once the signs have been segmented, visual features are extracted to recognize them as glosses using lexical (phonological) descriptions of signs. We evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation shows the robustness of the proposed methods to high dynamics and to numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency
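
    The motion-based temporal segmentation step described in this abstract can be illustrated with a minimal sketch. It assumes that a tracker already provides one (x, y) hand position per frame and that candidate sign limits lie at local minima of hand speed below a threshold. Both the criterion and the names `hand_speeds`, `candidate_boundaries` and `threshold` are hypothetical choices made for illustration, not the method of the thesis.

```python
from math import hypot


def hand_speeds(track):
    """Speed between consecutive frames, in pixels per frame.

    `track` is a list of (x, y) hand positions, one per video frame
    (assumed to come from an upstream tracker).
    """
    return [hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def candidate_boundaries(track, threshold=2.0):
    """Indices of speed samples that are slow local minima.

    Assumption (not from the source): a sign limit is plausible where
    the hand nearly stops. Sample i is the speed between frames i and i+1.
    """
    speeds = hand_speeds(track)
    boundaries = []
    for i in range(1, len(speeds) - 1):
        slow = speeds[i] < threshold
        # Strict on the left so a flat plateau is reported only once.
        local_min = speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]
        if slow and local_min:
            boundaries.append(i)
    return boundaries


# Toy track: the hand moves right, almost pauses, then moves on.
toy_track = [(0, 0), (5, 0), (10, 0), (10.5, 0), (15, 0), (20, 0)]
print(candidate_boundaries(toy_track))
```

    In a pipeline like the one the abstract outlines, such motion-only candidates would then be filtered with hand shape features to discard limits that fall in the middle of a sign.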

    Implementation of an Automatic Sign Language Lexical Annotation Framework based on Propositional Dynamic Logic

    In this paper, we present the implementation of an automatic Sign Language (SL) sign annotation framework based on a formal logic, the Propositional Dynamic Logic (PDL). Our system relies heavily on the use of a specific variant of PDL, the Propositional Dynamic Logic for Sign Language (PDLSL), which lets us describe SL signs as formulae and corpora videos as labeled transition systems (LTSs). Here, we intend to show how a generic annotation system can be constructed upon these underlying theoretical principles, regardless of the tracking technologies available or the input format of corpora. With this in mind, we generated a development framework that adapts the system to specific use cases. Furthermore, we present some results obtained by our application when adapted to one distinct case, 2D corpora analysis with pre-processed tracking information. We also present some insights on how such a technology can be used to analyze 3D real-time data, captured with a depth device
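
    The idea of describing signs as formulae and videos as labeled transition systems can be sketched with a small model checker for a fragment of standard PDL. This is only an illustration: it does not implement PDLSL, and the data layout, the `holds` function and the proposition and action names (`hands_apart`, `hands_touch`, `move_right_hand`) are hypothetical.

```python
def holds(lts, state, formula):
    """Check a formula at a state of a labeled transition system.

    Formulas are nested tuples (an assumed encoding):
      ("atom", p)        p is true in the state
      ("not", f)         negation
      ("and", f, g)      conjunction
      ("dia", a, f)      <a>f: some a-labeled transition leads to a state where f holds
    """
    kind = formula[0]
    if kind == "atom":
        return formula[1] in lts["labels"][state]
    if kind == "not":
        return not holds(lts, state, formula[1])
    if kind == "and":
        return holds(lts, state, formula[1]) and holds(lts, state, formula[2])
    if kind == "dia":
        _, action, sub = formula
        return any(holds(lts, target, sub)
                   for source, label, target in lts["edges"]
                   if source == state and label == action)
    raise ValueError(f"unknown formula kind: {kind!r}")


# Toy LTS standing in for a tracked video: states are postures,
# transitions are movements between them.
video = {
    "labels": {0: {"hands_apart"}, 1: set(), 2: {"hands_touch"}},
    "edges": [(0, "move_right_hand", 1), (1, "move_right_hand", 2)],
}

# Toy "sign": hands apart, then two right-hand movements ending in contact.
sign = ("and",
        ("atom", "hands_apart"),
        ("dia", "move_right_hand",
         ("dia", "move_right_hand", ("atom", "hands_touch"))))

print(holds(video, 0, sign))
```

    Annotating a video would then amount to finding the states (frames or frame intervals) at which a sign's formula holds.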

    Detection of major ASL sign types in continuous signing for ASL recognition

    In American Sign Language (ASL) as well as other signed languages, different classes of signs (e.g., lexical signs, fingerspelled signs, and classifier constructions) have different internal structural properties. Continuous sign recognition accuracy can be improved through use of distinct recognition strategies, as well as different training datasets, for each class of signs. For these strategies to be applied, continuous signing video needs to be segmented into parts corresponding to particular classes of signs. In this paper we present a multiple instance learning-based segmentation system that accurately labels 91.27% of the video frames of 500 continuous utterances (including 7 different subjects) from the publicly accessible NCSLGR corpus (Neidle and Vogler, 2012). The system uses novel feature descriptors derived from both motion and shape statistics of the regions of high local motion. The system does not require a hand tracker

    Sign language lexical recognition with Propositional Dynamic Logic

    This paper explores the use of Propositional Dynamic Logic (PDL) as a suitable formal framework for describing Sign Language (SL), the language of deaf people, in the context of natural language processing. SLs are visual, complete, standalone languages which are just as expressive as oral languages. Signs in SL usually correspond to sequences of highly specific body postures interleaved with movements, which make reference to real-world objects, characters or situations. Here we propose a formal representation of SL signs that will help us with the analysis of automatically collected hand tracking data from French Sign Language (FSL) video corpora. We further show how such a representation could help us with the design of computer-aided SL verification tools, which in turn would bring us closer to the development of an automatic recognition system for these languages

    Toward a Motor Theory of Sign Language Perception

    Research on signed languages still strongly dissociates linguistic issues related to phonological and phonetic aspects from gesture studies for recognition and synthesis purposes. This paper focuses on the imbrication of motion and meaning for the analysis, synthesis and evaluation of sign language gestures. We discuss the relevance and interest of a motor theory of perception in sign language communication. According to this theory, we consider that linguistic knowledge is mapped on sensory-motor processes, and propose a methodology based on the principle of a synthesis-by-analysis approach, guided by an evaluation process that aims to validate some hypotheses and concepts of this theory. Examples from existing studies illustrate the different concepts and provide avenues for future work. Comment: 12 pages. Partially funded by the ANR SignCo project

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded)

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues; user-centred issues and use cases; and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges