
    Data-driven Synthesis of Animations of Spatially Inflected American Sign Language Verbs Using Human Data

    Techniques for producing realistic and understandable animations of American Sign Language (ASL) have accessibility benefits for signers with lower levels of written-language literacy. Previous research in sign language animation did not address the specific linguistic issue of space use and verb inflection, due to a lack of sufficiently detailed and linguistically annotated ASL corpora, which are necessary for modern data-driven approaches. In this dissertation, a high-quality ASL motion-capture corpus with ASL-specific linguistic structures is collected, annotated, and evaluated using carefully designed protocols and well-calibrated motion-capture equipment. In addition, ASL animations are modeled, synthesized, and evaluated based on samples of ASL signs collected from native-signer animators or from signers recorded using motion-capture equipment. Part I of this dissertation focuses on how an ASL corpus is collected, including unscripted ASL passages and ASL inflecting verbs: signs in which the location and orientation of the hands are influenced by the arrangement of locations in 3D space that represent the entities under discussion. Native signers are recorded in a studio with motion-capture equipment: cyber-gloves, a body suit, a head tracker, a hand tracker, and an eye tracker. Part II describes how ASL animation is synthesized using our corpus of ASL inflecting verbs. Specifically, mathematical models of hand movement are trained on animation data of signs produced by a native signer. This dissertation demonstrates that such mathematical models can be trained and built using movement data collected from humans. Evaluation studies with deaf native-signer participants show that the verb animations synthesized from our models achieve subjective-rating and comprehension-question scores similar to those of animations produced by a human animator, or of animations driven by a human's motion-capture data. The modeling techniques in this dissertation are applicable to other types of ASL signs and to other sign languages used internationally. These models' parameterization of sign animations can increase the repertoire of generation systems and can automate the work of humans using sign language scripting systems.
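
    The modeling step lends itself to a compact illustration. Below is a minimal sketch, not the dissertation's actual model, of the data-driven idea: fit a least-squares mapping from the 3D locations assigned to a verb's subject and object to the hand-pose keyframe parameters recorded for that spatial arrangement. All function and variable names are illustrative assumptions.

```python
import numpy as np

def fit_inflection_model(entity_locations, hand_keyframes):
    """Least-squares fit of a linear model.

    entity_locations: (n, 6) array; each row holds the subject and object
        positions (x, y, z each) used in one recorded verb performance.
    hand_keyframes: (n, k) array; each row holds the hand-pose parameters
        (location/orientation keyframe values) recorded for that performance.
    """
    X = np.hstack([entity_locations, np.ones((len(entity_locations), 1))])
    W, *_ = np.linalg.lstsq(X, hand_keyframes, rcond=None)
    return W

def synthesize_keyframes(W, subject_xyz, object_xyz):
    """Predict hand-pose parameters for a novel arrangement of entities."""
    x = np.concatenate([subject_xyz, object_xyz, [1.0]])
    return x @ W
```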

    Interactive Editing in French Sign Language Dedicated to Virtual Signers: Requirements and Challenges

    Signing avatars are increasingly used as an interface for communication with the deaf community. In recent years, an emerging approach uses captured data to edit and generate sign language (SL) gestures. Thanks to motion-editing operations (e.g., concatenation, mixing), this method offers the possibility to compose new utterances, thus facilitating the enrichment of the original corpus, enhancing the natural look of the animation, and promoting the avatar's acceptability. However, designing such an editing system raises many questions. In particular, manipulating existing movements does not guarantee the semantic consistency of the reconstructed actions. A solution is to insert the human operator into a loop for constructing new utterances and to incorporate within the utterance's structure constraints that are derived from linguistic patterns. This article discusses the main requirements for the whole pipeline design of interactive virtual signers, including: (1) the creation of corpora, (2) the resources needed for motion recording, (3) the annotation process as the heart of the SL editing process, (4) the building, indexing, and querying of a motion database, (5) the animation of the virtual avatar by editing and composing motion segments, and (6) the conception of a dedicated user interface according to users' knowledge and abilities. Each step is illustrated by the authors' recent work and results from the Sign3D project, an editing system for French Sign Language (LSF) content.
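
    The two editing operations named above, concatenation and mixing, can be made concrete with a short sketch on motion clips stored as (frames x channels) arrays of joint angles. This is an illustrative toy under assumed data layouts; the Sign3D system's actual operations also handle timing alignment and linguistic constraints.

```python
import numpy as np

def concatenate(clip_a, clip_b, blend_frames=10):
    """Join two clips, crossfading the seam to avoid a visible jump."""
    t = np.linspace(0.0, 1.0, blend_frames)[:, None]
    seam = (1 - t) * clip_a[-blend_frames:] + t * clip_b[:blend_frames]
    return np.vstack([clip_a[:-blend_frames], seam, clip_b[blend_frames:]])

def mix(clip_a, clip_b, channel_mask):
    """Combine channels of two equally long clips, e.g. manual channels
    (arms/hands) from one recording and non-manual channels from another."""
    out = clip_a.copy()
    out[:, channel_mask] = clip_b[:, channel_mask]
    return out
```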

    TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation

    Sign language animations can improve the accessibility of information and services for people who are deaf and have low literacy skills in spoken/written languages. Because a sign language differs from the surrounding spoken/written language in word order, syntax, and lexicon, many deaf people find it difficult to comprehend text on a computer screen or captions on a television. Animated characters performing sign language in a comprehensible way could make this information accessible. Facial expressions and other non-manual components play an important role in the naturalness and understandability of these animations, and their coordination with the manual signs is crucial for the interpretation of the signed message. Software that advances the support of facial expressions in the generation of sign language animation could make this technology more acceptable to deaf people. In this survey, we discuss the challenges in facial expression synthesis, and we compare and critique the state-of-the-art projects on generating facial expressions in sign language animations. Beginning with an overview of facial expression linguistics, sign language animation technologies, and some background on animating facial expressions, we then discuss the search strategy and criteria used to select the five projects that are the primary focus of this survey. The survey continues by introducing the work from the five projects under consideration. Their contributions are compared in terms of support for a specific sign language, categories of facial expressions investigated, focus range in the animation generation, use of annotated corpora, input data or hypothesis for their approach, and other factors. Strengths and drawbacks of individual projects are identified along these dimensions. The survey concludes with our current research focus in this area and future prospects.

    Data-Driven Synthesis and Evaluation of Syntactic Facial Expressions in American Sign Language Animation

    Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-the-art sign language animation tools focus mostly on the accuracy of manual signs rather than on facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions, and (2) examine data-driven modeling of facial expressions from multiple recordings of ASL signers. In Part I of this thesis, we propose to conduct rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations with facial expressions. Our research questions involve: (i) stimuli design, (ii) the effect of videos as an upper baseline and for presenting comprehension questions, and (iii) eye-tracking as an alternative to recording question-responses from participants. In Part II of this thesis, we propose to use generative models to automatically uncover the underlying trace of ASL syntactic facial expressions from multiple recordings of ASL signers, and to apply these facial expressions to manual signs in novel animated sentences. We hypothesize that an annotated sign language corpus, including both the manual and non-manual signs, can be used to model and generate linguistically meaningful facial expressions if it is combined with facial feature extraction techniques, statistical machine learning, and an animation platform with detailed facial parameterization. To further improve sign language animation technology, we will assess the quality of the animations generated by our approach with ASL signers, using the rigorous evaluation methodologies described in Part I.
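
    To make the Part II goal concrete, here is a hedged sketch of the simplest possible "underlying trace" computation: time-normalize the facial-feature trajectories of several recordings of the same syntactic facial expression (e.g. a yes/no-question brow raise) and average them into one representative trace that can be applied to a novel animated sentence. The thesis proposes generative models; plain resample-and-average is shown only to illustrate the data flow, and all names are assumptions.

```python
import numpy as np

def normalize_length(traj, n_frames=100):
    """Resample a (frames, features) trajectory to a fixed length."""
    src = np.linspace(0.0, 1.0, len(traj))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.stack(
        [np.interp(dst, src, traj[:, c]) for c in range(traj.shape[1])], axis=1
    )

def underlying_trace(recordings, n_frames=100):
    """Average time-normalized recordings (e.g. brow height, head pitch)
    of one syntactic facial expression into a single representative trace."""
    return np.mean([normalize_length(r, n_frames) for r in recordings], axis=0)
```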

    Adversarial Training for Multi-Channel Sign Language Production

    Sign Languages are rich multi-channel languages, requiring articulation of both manual (hands) and non-manual (face and body) features in a precise, intricate manner. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody this full sign morphology to be truly understandable by the Deaf community. Previous work has mainly focused on manual feature production, with an under-articulated output caused by regression to the mean. In this paper, we propose an Adversarial Multi-Channel approach to SLP. We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator. Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output. Additionally, we fully encapsulate the sign articulators with the inclusion of non-manual features, producing facial features and mouthing patterns. We evaluate on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and report state-of-the-art SLP back-translation performance for manual production. We set new benchmarks for the production of multi-channel sign to underpin future research into realistic SLP.
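
    The minimax framing described above can be summarized in a few lines of training-loss code. This is a hedged sketch in PyTorch: module internals, tensor shapes, and the regression-loss weight are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def slp_losses(generator, discriminator, text_emb, real_poses, reg_weight=10.0):
    """One step of the adversarial objective for multi-channel SLP.

    generator: maps source-text embeddings to multi-channel pose sequences
        (manual + non-manual channels).
    discriminator: scores a pose sequence conditioned on the source text.
    """
    fake_poses = generator(text_emb)

    # Discriminator: push real sequences toward 1, generated ones toward 0.
    d_real = discriminator(real_poses, text_emb)
    d_fake = discriminator(fake_poses.detach(), text_emb)
    d_loss = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )

    # Generator: fool the discriminator while staying close to ground truth.
    # The adversarial term counteracts the over-smoothed, under-articulated
    # output that a pure regression loss produces (regression to the mean).
    d_gen = discriminator(fake_poses, text_emb)
    g_loss = (
        F.binary_cross_entropy_with_logits(d_gen, torch.ones_like(d_gen))
        + reg_weight * F.mse_loss(fake_poses, real_poses)
    )
    return d_loss, g_loss
```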

    The Role of Emotional and Facial Expression in Synthesised Sign Language Avatars

    This thesis explores the role that underlying emotional facial expressions might have with regard to understandability in sign language avatars. Focusing specifically on Irish Sign Language (ISL), we examine the Deaf community's requirement for a visual-gestural language, as well as some linguistic attributes of ISL that we consider fundamental to this research. Unlike spoken language, visual-gestural languages such as ISL have no standard written representation. Given this, we compare current methods of written representation for signed languages and consider which, if any, is the most suitable transcription method for the medical-receptionist dialogue corpus. A growing body of work is emerging from the field of sign language avatar synthesis. These works are now at a point where they can benefit greatly from introducing methods currently used in the field of humanoid animation and, more specifically, the application of morphs to represent facial expression. The hypothesis underpinning this research is that augmenting an existing avatar (eSIGN) with various combinations of the 7 widely accepted universal emotions identified by Ekman (1999), delivered as underlying emotional facial expressions (EFEs), will make that avatar more human-like. This research accepts as true that this is a factor in improving usability and understandability for ISL users. Using human evaluation methods (Huenerfauth, et al., 2008), the research compares an augmented set of avatar utterances against a baseline set in 2 key areas: comprehension and naturalness of facial configuration. We outline our approach to the evaluation, including our choice of ISL participants, interview environment, and evaluation methodology. Remarkably, the results of this manual evaluation show very little difference between the comprehension scores of the baseline avatars and those augmented with EFEs. However, after comparing the comprehension results for the synthetic human avatar "Anna" against the caricature-type avatar "Luna", the synthetic human avatar Anna was the clear winner. The qualitative feedback gave us insight into why comprehension scores were not higher for each avatar, and we feel that this feedback will be invaluable to the research community in the future development of sign language avatars. Other questions in the evaluation focused on sign language avatar technology in a more general manner. Significantly, participant feedback on these questions indicates a rise in the level of literacy amongst Deaf adults as a result of mobile technology.
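
    The morph-based augmentation borrowed from humanoid animation reduces to blending per-vertex displacement targets on the avatar's face mesh, so an underlying emotion can be layered beneath a linguistic facial expression at some weight. A minimal sketch follows; the mesh layout and morph names are illustrative, not the eSIGN avatar's actual assets.

```python
import numpy as np

def apply_morphs(neutral_vertices, morph_targets, weights):
    """Blend morph targets onto a neutral face mesh.

    neutral_vertices: (V, 3) base mesh.
    morph_targets: dict name -> (V, 3) per-vertex displacement from neutral.
    weights: dict name -> blend weight, typically in [0, 1].
    """
    out = np.asarray(neutral_vertices, dtype=float).copy()
    for name, w in weights.items():
        out += w * morph_targets[name]
    return out

# e.g. a mild underlying smile layered beneath a grammatical brow raise:
# apply_morphs(mesh, targets, {"happiness": 0.3, "brow_raise": 1.0})
```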

    Modeling the Speed and Timing of American Sign Language to Generate Realistic Animations

    While there are many Deaf or Hard of Hearing (DHH) individuals with excellent reading literacy, there are also some DHH individuals who have lower English literacy. American Sign Language (ASL) is not simply a method of representing English sentences: it is possible for an individual to be fluent in ASL while having limited fluency in English. To overcome this barrier, we aim to make it easier to generate ASL animations for websites by using motion-capture data recorded from human signers to build predictive models for ASL animations; our goal is to automate this aspect of animation synthesis to create realistic animations. This dissertation consists of several parts. Part I defines key terminology for timing and speed parameters and surveys prior linguistic and computational research on ASL. Next, the motion-capture data that our lab recorded from human signers is discussed, with details about how we enhanced this corpus to make it useful for speed and timing research. Finally, we present the process of adding layers of linguistic annotation and processing this data for speed and timing research. Part II presents our research on data-driven predictive models for various speed and timing parameters of ASL animations. The focus is on (1) predicting the existence of pauses after each ASL sign, (2) predicting the time duration of these pauses, and (3) predicting the change of speed for each ASL sign within a sentence. We measure the quality of the proposed models by comparing them with state-of-the-art rule-based models. Furthermore, using these models, we synthesized ASL animation stimuli and conducted a user-based evaluation with DHH individuals to measure the usability of the resulting animations. Finally, Part III presents research on whether the timing parameters that individuals prefer for animation may differ from those in recordings of human signers. It also investigates the distribution of acceleration curves in recordings of human signers and whether utilizing a similar set of curves in ASL animations leads to measurable improvements in DHH users' perception of animation quality.
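
    The three Part II prediction tasks can be read as standard supervised-learning problems over per-sign features (e.g. position in sentence, syntactic-boundary depth, sign duration). The sketch below frames them that way; the feature set and model choice are assumptions for illustration, and the dissertation itself compares its learned models against rule-based baselines.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

pause_clf = RandomForestClassifier()  # (1) is there a pause after this sign?
pause_reg = RandomForestRegressor()   # (2) if so, how long is it (ms)?
speed_reg = RandomForestRegressor()   # (3) relative speed of this sign

def fit_timing_models(X, has_pause, pause_ms, rel_speed):
    """X: (n_signs, n_features) NumPy array of per-sign linguistic features."""
    pause_clf.fit(X, has_pause)
    # Fit pause duration only on signs that are actually followed by a pause.
    pause_reg.fit(X[has_pause == 1], pause_ms[has_pause == 1])
    speed_reg.fit(X, rel_speed)
```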

    Using formal logic to represent sign language phonetics in semi-automatic annotation tasks

    This thesis presents a formal framework for the representation of Signed Languages (SLs), the languages of Deaf communities, in semi-automatic recognition tasks. SLs are complex visuo-gestural communication systems; by using corporal gestures, signers achieve the same level of expressivity held by sound-based languages like English or French. However, unlike these, SL morphemes correspond to complex sequences of highly specific body postures, interleaved with postural changes: during signing, signers use several parts of their body simultaneously in order to combinatorially build phonemes. This situation, paired with an extensive use of three-dimensional space, makes SLs difficult to represent with tools already existent in Natural Language Processing (NLP) of vocal languages. For this reason, the current work presents the development of a formal representation framework, intended to transform SL video repositories (corpora) into an intermediate representation layer where automatic recognition algorithms can work under better conditions. The main idea is that a corpus can be described with a specialized Labeled Transition System (LTS), which can then be annotated with logic formulae for its study. A multi-modal logic was chosen as the basis of the formal language: Propositional Dynamic Logic (PDL). This logic was originally created to specify and prove properties of computer programs. In particular, PDL uses the modal operators [a] and ⟨a⟩ to denote necessity and possibility, respectively. For SLs, a particular variant based on the original formalism was developed: PDL for Sign Language (PDLSL), which is interpreted over LTSs representing corpora. With PDLSL, body articulators (like the hands or head) are interpreted as independent agents; each articulator has its own set of valid actions and propositions, and executes them without influence from the others. The simultaneous execution of different actions by several articulators yields distinct situations, which can be searched over an LTS with formulae, using the semantic rules of the logic. Together, the use of PDLSL and the proposed specialized data structures could help curb some of the current problems in SL study, notably the heterogeneity of corpora and the lack of automatic annotation aids. In the same vein, this may not only increase the size of the available datasets but even extend previous results to new corpora: the framework inserts an intermediate representation layer which can serve to model any corpus, regardless of its technical limitations. With this, annotation becomes possible by defining with formulae the characteristics to annotate; afterwards, a formal verification algorithm can find those features in corpora, as long as they are represented as consistent LTSs. Finally, the development of the formal framework led to the creation of a semi-automatic annotator based on the presented theoretical principles. Broadly, the system receives an untreated corpus video, converts it automatically into a valid LTS (by way of some predefined rules), and then verifies human-created PDLSL formulae over the LTS. The final product is an automatically generated sub-lexical annotation, which can later be corrected by human annotators for use in other areas such as linguistics.
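
    The verification step at the heart of the framework is easy to miniaturize. Below is a toy sketch of model checking the two PDL modalities over a labeled transition system: ⟨a⟩p holds in a state if some a-transition leads to a state satisfying p, and [a]p holds if every a-transition does. The data structure and proposition names are illustrative, not PDLSL's actual notation.

```python
from dataclasses import dataclass

@dataclass
class LTS:
    states: set        # state identifiers, one per key posture
    transitions: set   # (source, action_label, target) triples
    labels: dict       # state -> set of atomic postural propositions

def atom(lts, prop):
    """States where an atomic proposition holds (e.g. 'right_hand_at_chin')."""
    return {s for s in lts.states if prop in lts.labels[s]}

def diamond(lts, action, holds):
    """<a>p: states with SOME `action`-successor satisfying p."""
    return {s for (s, a, t) in lts.transitions if a == action and t in holds}

def box(lts, action, holds):
    """[a]p: states whose EVERY `action`-successor satisfies p
    (vacuously true for states with no `action`-successor)."""
    bad = {s for (s, a, t) in lts.transitions if a == action and t not in holds}
    return lts.states - bad
```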
