
    Data-driven Synthesis of Animations of Spatially Inflected American Sign Language Verbs Using Human Data

    Techniques for producing realistic and understandable animations of American Sign Language (ASL) have accessibility benefits for signers with lower levels of written-language literacy. Previous research in sign language animation did not address the specific linguistic issue of space use and verb inflection, due to a lack of sufficiently detailed and linguistically annotated ASL corpora, which are necessary for modern data-driven approaches. In this dissertation, a high-quality ASL motion-capture corpus with ASL-specific linguistic structures is collected, annotated, and evaluated using carefully designed protocols and well-calibrated motion-capture equipment. In addition, ASL animations are modeled, synthesized, and evaluated based on samples of ASL signs collected from native-signer animators or from signers recorded using motion-capture equipment. Part I of this dissertation focuses on how an ASL corpus is collected, including unscripted ASL passages and ASL inflecting verbs: signs in which the location and orientation of the hands are influenced by the arrangement of locations in 3D space that represent the entities under discussion. Native signers are recorded in a studio with motion-capture equipment: cyber-gloves, a body suit, a head tracker, a hand tracker, and an eye tracker. Part II describes how ASL animation is synthesized using our corpus of ASL inflecting verbs. Specifically, mathematical models of hand movement are trained on animation data of signs produced by a native signer. This dissertation demonstrates that such mathematical models can be trained and built using movement data collected from humans. Evaluation studies with deaf native-signer participants show that the verb animations synthesized from our models achieve subjective-rating and comprehension-question scores similar to those of animations produced by a human animator, or of animations driven by a human's motion-capture data. The modeling techniques in this dissertation are applicable to other types of ASL signs and to other sign languages used internationally. These models' parameterization of sign animations can increase the repertoire of generation systems and can automate the work of humans using sign language scripting systems.
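
    The modeling step lends itself to a compact illustration. Below is a minimal sketch, not the dissertation's actual model, of the data-driven idea: fit a least-squares mapping from the 3D locations assigned to a verb's subject and object to the hand-pose keyframe parameters recorded for that spatial arrangement. All function and variable names are illustrative assumptions.

```python
import numpy as np

def fit_inflection_model(entity_locations, hand_keyframes):
    """Least-squares fit of a linear model.

    entity_locations: (n, 6) array; each row holds the subject and object
        positions (x, y, z each) used in one recorded verb performance.
    hand_keyframes: (n, k) array; each row holds the hand-pose parameters
        (location/orientation keyframe values) recorded for that performance.
    """
    X = np.hstack([entity_locations, np.ones((len(entity_locations), 1))])
    W, *_ = np.linalg.lstsq(X, hand_keyframes, rcond=None)
    return W

def synthesize_keyframes(W, subject_xyz, object_xyz):
    """Predict hand-pose parameters for a novel arrangement of entities."""
    x = np.concatenate([subject_xyz, object_xyz, [1.0]])
    return x @ W
```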

    Interactive Editing in French Sign Language Dedicated to Virtual Signers: Requirements and Challenges

    Signing avatars are increasingly used as an interface for communication with the deaf community. In recent years, an emerging approach uses captured data to edit and generate sign language (SL) gestures. Thanks to motion-editing operations (e.g., concatenation, mixing), this method offers the possibility to compose new utterances, thus facilitating the enrichment of the original corpus, enhancing the natural look of the animation, and promoting the avatar's acceptability. However, designing such an editing system raises many questions. In particular, manipulating existing movements does not guarantee the semantic consistency of the reconstructed actions. A solution is to insert the human operator into a loop for constructing new utterances and to incorporate within the utterance's structure constraints that are derived from linguistic patterns. This article discusses the main requirements for the whole pipeline design of interactive virtual signers, including: (1) the creation of corpora, (2) the resources needed for motion recording, (3) the annotation process as the heart of the SL editing process, (4) the building, indexing, and querying of a motion database, (5) the animation of the virtual avatar by editing and composing motion segments, and (6) the conception of a dedicated user interface according to users' knowledge and abilities. Each step is illustrated by the authors' recent work and results from the Sign3D project, an editing system for French Sign Language (LSF) content.
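
    The two editing operations named above, concatenation and mixing, can be made concrete with a short sketch on motion clips stored as (frames x channels) arrays of joint angles. This is an illustrative toy under assumed data layouts; the Sign3D system's actual operations also handle timing alignment and linguistic constraints.

```python
import numpy as np

def concatenate(clip_a, clip_b, blend_frames=10):
    """Join two clips, crossfading the seam to avoid a visible jump."""
    t = np.linspace(0.0, 1.0, blend_frames)[:, None]
    seam = (1 - t) * clip_a[-blend_frames:] + t * clip_b[:blend_frames]
    return np.vstack([clip_a[:-blend_frames], seam, clip_b[blend_frames:]])

def mix(clip_a, clip_b, channel_mask):
    """Combine channels of two equally long clips, e.g. manual channels
    (arms/hands) from one recording and non-manual channels from another."""
    out = clip_a.copy()
    out[:, channel_mask] = clip_b[:, channel_mask]
    return out
```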

    TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation

    Sign language animations can improve the accessibility of information and services for people who are deaf and have low literacy skills in spoken/written languages. Because a sign language differs from the surrounding spoken/written language in word order, syntax, and lexicon, many deaf people find it difficult to comprehend text on a computer screen or captions on a television. Animated characters performing sign language in a comprehensible way could make this information accessible. Facial expressions and other non-manual components play an important role in the naturalness and understandability of these animations, and their coordination with the manual signs is crucial for the interpretation of the signed message. Software that advances the support of facial expressions in the generation of sign language animation could make this technology more acceptable to deaf people. In this survey, we discuss the challenges in facial expression synthesis, and we compare and critique the state-of-the-art projects on generating facial expressions in sign language animations. Beginning with an overview of facial expression linguistics, sign language animation technologies, and some background on animating facial expressions, we then discuss the search strategy and criteria used to select the five projects that are the primary focus of this survey. The survey continues by introducing the work from the five projects under consideration. Their contributions are compared in terms of support for a specific sign language, categories of facial expressions investigated, focus range in the animation generation, use of annotated corpora, input data or hypothesis for their approach, and other factors. Strengths and drawbacks of individual projects are identified along these dimensions. The survey concludes with our current research focus in this area and future prospects.

    Data-Driven Synthesis and Evaluation of Syntactic Facial Expressions in American Sign Language Animation

    Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-the-art sign language animation tools focus mostly on the accuracy of manual signs rather than on facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions, and (2) examine data-driven modeling of facial expressions from multiple recordings of ASL signers. In Part I of this thesis, we propose to conduct rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations with facial expressions. Our research questions involve: (i) stimuli design, (ii) the effect of videos as an upper baseline and for presenting comprehension questions, and (iii) eye-tracking as an alternative to recording question-responses from participants. In Part II of this thesis, we propose to use generative models to automatically uncover the underlying trace of ASL syntactic facial expressions from multiple recordings of ASL signers, and to apply these facial expressions to manual signs in novel animated sentences. We hypothesize that an annotated sign language corpus, including both the manual and non-manual signs, can be used to model and generate linguistically meaningful facial expressions if it is combined with facial feature extraction techniques, statistical machine learning, and an animation platform with detailed facial parameterization. To further improve sign language animation technology, we will assess the quality of the animations generated by our approach with ASL signers, using the rigorous evaluation methodologies described in Part I.
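
    To make the Part II goal concrete, here is a hedged sketch of the simplest possible "underlying trace" computation: time-normalize the facial-feature trajectories of several recordings of the same syntactic facial expression (e.g. a yes/no-question brow raise) and average them into one representative trace that can be applied to a novel animated sentence. The thesis proposes generative models; plain resample-and-average is shown only to illustrate the data flow, and all names are assumptions.

```python
import numpy as np

def normalize_length(traj, n_frames=100):
    """Resample a (frames, features) trajectory to a fixed length."""
    src = np.linspace(0.0, 1.0, len(traj))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.stack(
        [np.interp(dst, src, traj[:, c]) for c in range(traj.shape[1])], axis=1
    )

def underlying_trace(recordings, n_frames=100):
    """Average time-normalized recordings (e.g. brow height, head pitch)
    of one syntactic facial expression into a single representative trace."""
    return np.mean([normalize_length(r, n_frames) for r in recordings], axis=0)
```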

    Adversarial Training for Multi-Channel Sign Language Production

    Sign Languages are rich multi-channel languages, requiring articulation of both manual (hands) and non-manual (face and body) features in a precise, intricate manner. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody this full sign morphology to be truly understandable by the Deaf community. Previous work has mainly focused on manual feature production, with an under-articulated output caused by regression to the mean. In this paper, we propose an Adversarial Multi-Channel approach to SLP. We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator. Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output. Additionally, we fully encapsulate the sign articulators with the inclusion of non-manual features, producing facial features and mouthing patterns. We evaluate on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and report state-of-the-art SLP back-translation performance for manual production. We set new benchmarks for the production of multi-channel sign to underpin future research into realistic SLP.
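
    The minimax framing described above can be summarized in a few lines of training-loss code. This is a hedged sketch in PyTorch: module internals, tensor shapes, and the regression-loss weight are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def slp_losses(generator, discriminator, text_emb, real_poses, reg_weight=10.0):
    """One step of the adversarial objective for multi-channel SLP.

    generator: maps source-text embeddings to multi-channel pose sequences
        (manual + non-manual channels).
    discriminator: scores a pose sequence conditioned on the source text.
    """
    fake_poses = generator(text_emb)

    # Discriminator: push real sequences toward 1, generated ones toward 0.
    d_real = discriminator(real_poses, text_emb)
    d_fake = discriminator(fake_poses.detach(), text_emb)
    d_loss = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )

    # Generator: fool the discriminator while staying close to ground truth.
    # The adversarial term counteracts the over-smoothed, under-articulated
    # output that a pure regression loss produces (regression to the mean).
    d_gen = discriminator(fake_poses, text_emb)
    g_loss = (
        F.binary_cross_entropy_with_logits(d_gen, torch.ones_like(d_gen))
        + reg_weight * F.mse_loss(fake_poses, real_poses)
    )
    return d_loss, g_loss
```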

    The Role of Emotional and Facial Expression in Synthesised Sign Language Avatars

    This thesis explores the role that underlying emotional facial expressions might have with regard to understandability in sign language avatars. Focusing specifically on Irish Sign Language (ISL), we examine the Deaf community's requirement for a visual-gestural language, as well as some linguistic attributes of ISL that we consider fundamental to this research. Unlike spoken language, visual-gestural languages such as ISL have no standard written representation. Given this, we compare current methods of written representation for signed languages and consider which, if any, is the most suitable transcription method for the medical-receptionist dialogue corpus. A growing body of work is emerging from the field of sign language avatar synthesis. These works are now at a point where they can benefit greatly from introducing methods currently used in the field of humanoid animation and, more specifically, the application of morphs to represent facial expression. The hypothesis underpinning this research is that augmenting an existing avatar (eSIGN) with various combinations of the 7 widely accepted universal emotions identified by Ekman (1999), delivered as underlying emotional facial expressions (EFEs), will make that avatar more human-like. This research accepts as true that this is a factor in improving usability and understandability for ISL users. Using human evaluation methods (Huenerfauth, et al., 2008), the research compares an augmented set of avatar utterances against a baseline set in 2 key areas: comprehension and naturalness of facial configuration. We outline our approach to the evaluation, including our choice of ISL participants, interview environment, and evaluation methodology. Remarkably, the results of this manual evaluation show very little difference between the comprehension scores of the baseline avatars and those augmented with EFEs. However, after comparing the comprehension results for the synthetic human avatar "Anna" against the caricature-type avatar "Luna", the synthetic human avatar Anna was the clear winner. The qualitative feedback gave us insight into why comprehension scores were not higher for each avatar, and we feel that this feedback will be invaluable to the research community in the future development of sign language avatars. Other questions in the evaluation focused on sign language avatar technology in a more general manner. Significantly, participant feedback on these questions indicates a rise in the level of literacy amongst Deaf adults as a result of mobile technology.
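
    The morph-based augmentation borrowed from humanoid animation reduces to blending per-vertex displacement targets on the avatar's face mesh, so an underlying emotion can be layered beneath a linguistic facial expression at some weight. A minimal sketch follows; the mesh layout and morph names are illustrative, not the eSIGN avatar's actual assets.

```python
import numpy as np

def apply_morphs(neutral_vertices, morph_targets, weights):
    """Blend morph targets onto a neutral face mesh.

    neutral_vertices: (V, 3) base mesh.
    morph_targets: dict name -> (V, 3) per-vertex displacement from neutral.
    weights: dict name -> blend weight, typically in [0, 1].
    """
    out = np.asarray(neutral_vertices, dtype=float).copy()
    for name, w in weights.items():
        out += w * morph_targets[name]
    return out

# e.g. a mild underlying smile layered beneath a grammatical brow raise:
# apply_morphs(mesh, targets, {"happiness": 0.3, "brow_raise": 1.0})
```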

    Modeling the Speed and Timing of American Sign Language to Generate Realistic Animations

    While there are many Deaf or Hard of Hearing (DHH) individuals with excellent reading literacy, there are also some DHH individuals who have lower English literacy. American Sign Language (ASL) is not simply a method of representing English sentences: it is possible for an individual to be fluent in ASL while having limited fluency in English. To overcome this barrier, we aim to make it easier to generate ASL animations for websites by using motion-capture data recorded from human signers to build predictive models for ASL animations; our goal is to automate this aspect of animation synthesis to create realistic animations. This dissertation consists of several parts. Part I defines key terminology for timing and speed parameters and surveys prior linguistic and computational research on ASL. Next, the motion-capture data that our lab recorded from human signers is discussed, with details about how we enhanced this corpus to make it useful for speed and timing research. Finally, we present the process of adding layers of linguistic annotation and processing this data for speed and timing research. Part II presents our research on data-driven predictive models for various speed and timing parameters of ASL animations. The focus is on (1) predicting the existence of pauses after each ASL sign, (2) predicting the time duration of these pauses, and (3) predicting the change of speed for each ASL sign within a sentence. We measure the quality of the proposed models by comparing them with state-of-the-art rule-based models. Furthermore, using these models, we synthesized ASL animation stimuli and conducted a user-based evaluation with DHH individuals to measure the usability of the resulting animations. Finally, Part III presents research on whether the timing parameters that individuals prefer for animation may differ from those in recordings of human signers. It also investigates the distribution of acceleration curves in recordings of human signers and whether utilizing a similar set of curves in ASL animations leads to measurable improvements in DHH users' perception of animation quality.
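
    The three Part II prediction tasks can be read as standard supervised-learning problems over per-sign features (e.g. position in sentence, syntactic-boundary depth, sign duration). The sketch below frames them that way; the feature set and model choice are assumptions for illustration, and the dissertation itself compares its learned models against rule-based baselines.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

pause_clf = RandomForestClassifier()  # (1) is there a pause after this sign?
pause_reg = RandomForestRegressor()   # (2) if so, how long is it (ms)?
speed_reg = RandomForestRegressor()   # (3) relative speed of this sign

def fit_timing_models(X, has_pause, pause_ms, rel_speed):
    """X: (n_signs, n_features) NumPy array of per-sign linguistic features."""
    pause_clf.fit(X, has_pause)
    # Fit pause duration only on signs that are actually followed by a pause.
    pause_reg.fit(X[has_pause == 1], pause_ms[has_pause == 1])
    speed_reg.fit(X, rel_speed)
```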

    Using formal logic to represent sign language phonetics in semi-automatic annotation tasks

    This thesis presents a formal framework for the representation of Signed Languages (SLs), the languages of Deaf communities, in semi-automatic recognition tasks. SLs are complex visuo-gestural communication systems; by using corporal gestures, signers achieve the same level of expressivity held by sound-based languages like English or French. However, unlike these, SL morphemes correspond to complex sequences of highly specific body postures, interleaved with postural changes: during signing, signers use several parts of their body simultaneously in order to combinatorially build phonemes. This situation, paired with an extensive use of three-dimensional space, makes SLs difficult to represent with tools already existent in Natural Language Processing (NLP) of vocal languages. For this reason, the current work presents the development of a formal representation framework, intended to transform SL video repositories (corpora) into an intermediate representation layer where automatic recognition algorithms can work under better conditions. The main idea is that a corpus can be described with a specialized Labeled Transition System (LTS), which can then be annotated with logic formulae for its study. A multi-modal logic was chosen as the basis of the formal language: Propositional Dynamic Logic (PDL). This logic was originally created to specify and prove properties of computer programs. In particular, PDL uses the modal operators [a] and ⟨a⟩ to denote necessity and possibility, respectively. For SLs, a particular variant based on the original formalism was developed: PDL for Sign Language (PDLSL), which is interpreted over LTSs representing corpora. With PDLSL, body articulators (like the hands or head) are interpreted as independent agents; each articulator has its own set of valid actions and propositions, and executes them without influence from the others. The simultaneous execution of different actions by several articulators yields distinct situations, which can be searched over an LTS with formulae, using the semantic rules of the logic. Together, the use of PDLSL and the proposed specialized data structures could help curb some of the current problems in SL study, notably the heterogeneity of corpora and the lack of automatic annotation aids. In the same vein, this may not only increase the size of the available datasets but even extend previous results to new corpora: the framework inserts an intermediate representation layer which can serve to model any corpus, regardless of its technical limitations. With this, annotation becomes possible by defining with formulae the characteristics to annotate; afterwards, a formal verification algorithm can find those features in corpora, as long as they are represented as consistent LTSs. Finally, the development of the formal framework led to the creation of a semi-automatic annotator based on the presented theoretical principles. Broadly, the system receives an untreated corpus video, converts it automatically into a valid LTS (by way of some predefined rules), and then verifies human-created PDLSL formulae over the LTS. The final product is an automatically generated sub-lexical annotation, which can later be corrected by human annotators for use in other areas such as linguistics.
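
    The verification step at the heart of the framework is easy to miniaturize. Below is a toy sketch of model checking the two PDL modalities over a labeled transition system: ⟨a⟩p holds in a state if some a-transition leads to a state satisfying p, and [a]p holds if every a-transition does. The data structure and proposition names are illustrative, not PDLSL's actual notation.

```python
from dataclasses import dataclass

@dataclass
class LTS:
    states: set        # state identifiers, one per key posture
    transitions: set   # (source, action_label, target) triples
    labels: dict       # state -> set of atomic postural propositions

def atom(lts, prop):
    """States where an atomic proposition holds (e.g. 'right_hand_at_chin')."""
    return {s for s in lts.states if prop in lts.labels[s]}

def diamond(lts, action, holds):
    """<a>p: states with SOME `action`-successor satisfying p."""
    return {s for (s, a, t) in lts.transitions if a == action and t in holds}

def box(lts, action, holds):
    """[a]p: states whose EVERY `action`-successor satisfies p
    (vacuously true for states with no `action`-successor)."""
    bad = {s for (s, a, t) in lts.transitions if a == action and t not in holds}
    return lts.states - bad
```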
