136 research outputs found

    Eyetracking Metrics Related to Subjective Assessments of ASL Animations

    Get PDF
    Analysis of eyetracking data can serve as an alternative method of evaluation when assessing the quality of computer-synthesized animations of American Sign Language (ASL), technology which can make information accessible to people who are deaf or hard-of-hearing, who may have lower levels of written language literacy. In this work, we build and evaluate the efficacy of descriptive models of subjective scores that native signers assign to ASL animations, based on eye-tracking metrics

    Best practices for conducting evaluations of sign language animation

    Get PDF
    Automatic synthesis of linguistically accurate and natural-looking American Sign Language (ASL) animations would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf. Based on several years of studies, we identify best practices for conducting experimental evaluations of sign language animations with feedback from deaf and hard-of-hearing users. First, we describe our techniques for identifying and screening participants, and for controlling the experimental environment. Finally, we discuss rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations. Our discussion focuses on stimuli design, effect of using videos as an upper baseline, using videos for presenting comprehension questions, and eye-tracking as an alternative to recording question-responses

    TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation

    Full text link
    Sign language animations can lead to better accessibility of information and services for people who are deaf and have low literacy skills in spoken/written languages. Due to the distinct word-order, syntax, and lexicon of the sign language from the spoken/written language, many deaf people find it difficult to comprehend the text on a computer screen or captions on a television. Animated characters performing sign language in a comprehensible way could make this information accessible. Facial expressions and other non-manual components play an important role in the naturalness and understandability of these animations. Their coordination to the manual signs is crucial for the interpretation of the signed message. Software to advance the support of facial expressions in generation of sign language animation could make this technology more acceptable for deaf people. In this survey, we discuss the challenges in facial expression synthesis and we compare and critique the state of the art projects on generating facial expressions in sign language animations. Beginning with an overview of facial expressions linguistics, sign language animation technologies, and some background on animating facial expressions, a discussion of the search strategy and criteria used to select the five projects that are the primary focus of this survey follows. This survey continues on to introduce the work from the five projects under consideration. Their contributions are compared in terms of support for specific sign language, categories of facial expressions investigated, focus range in the animation generation, use of annotated corpora, input data or hypothesis for their approach, and other factors. Strengths and drawbacks of individual projects are identified in the perspectives above. This survey concludes with our current research focus in this area and future prospects

    Modeling the Speed and Timing of American Sign Language to Generate Realistic Animations

    Get PDF
    While there are many Deaf or Hard of Hearing (DHH) individuals with excellent reading literacy, there are also some DHH individuals who have lower English literacy. American Sign Language (ASL) is not simply a method of representing English sentences. It is possible for an individual to be fluent in ASL, while having limited fluency in English. To overcome this barrier, we aim to make it easier to generate ASL animations for websites, through the use of motion-capture data recorded from human signers to build different predictive models for ASL animations; our goal is to automate this aspect of animation synthesis to create realistic animations. This dissertation consists of several parts: Part I, defines key terminology for timing and speed parameters, and surveys literature on prior linguistic and computational research on ASL. Next, the motion-capture data that our lab recorded from human signers is discussed, and details are provided about how we enhanced this corpus to make it useful for speed and timing research. Finally, we present the process of adding layers of linguistic annotation and processing this data for speed and timing research. Part II presents our research on data-driven predictive models for various speed and timing parameters of ASL animations. The focus is on predicting the (1) existence of pauses after each ASL sign, (2) predicting the time duration of these pauses, and (3) predicting the change of speed for each ASL sign within a sentence. We measure the quality of the proposed models by comparing our models with state-of-the-art rule-based models. Furthermore, using these models, we synthesized ASL animation stimuli and conducted a user-based evaluation with DHH individuals to measure the usability of the resulting animation. Finally, Part III presents research on whether the timing parameters individuals prefer for animation may differ from those in recordings of human signers. Furthermore, it also includes research to investigate the distribution of acceleration curves in recordings of human signers and whether utilizing a similar set of curves in ASL animations leads to measurable improvements in DHH users\u27 perception of animation quality

    Data-Driven Synthesis and Evaluation of Syntactic Facial Expressions in American Sign Language Animation

    Full text link
    Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-art sign language animation tools focus mostly on accuracy of manual signs rather than on the facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions, and (2) examine data-driven modeling of facial expressions from multiple recordings of ASL signers. In Part I of this thesis, we propose to conduct rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations with facial expressions. Our research questions involve: (i) stimuli design, (ii) effect of videos as upper baseline and for presenting comprehension questions, and (iii) eye-tracking as an alternative to recording question-responses from participants. In Part II of this thesis, we propose to use generative models to automatically uncover the underlying trace of ASL syntactic facial expressions from multiple recordings of ASL signers, and apply these facial expressions to manual signs in novel animated sentences. We hypothesize that an annotated sign language corpus, including both the manual and non-manual signs, can be used to model and generate linguistically meaningful facial expressions, if it is combined with facial feature extraction techniques, statistical machine learning, and an animation platform with detailed facial parameterization. To further improve sign language animation technology, we will assess the quality of the animation generated by our approach with ASL signers through the rigorous evaluation methodologies described in Part I

    Data-driven Synthesis of Animations of Spatially Inflected American Sign Language Verbs Using Human Data

    Full text link
    Techniques for producing realistic and understandable animations of American Sign Language (ASL) have accessibility benefits for signers with lower levels of written language literacy. Previous research in sign language animation didn’t address the specific linguistic issue of space use and verb inflection, due to a lack of sufficiently detailed and linguistically annotated ASL corpora, which is necessary for modern data-driven approaches. In this dissertation, a high-quality ASL motion capture corpus with ASL-specific linguistic structures is collected, annotated, and evaluated using carefully designed protocols and well-calibrated motion capture equipment. In addition, ASL animations are modeled, synthesized, and evaluated based on samples of ASL signs collected from native-signer animators or from signers recorded using motion capture equipment. Part I of this dissertation focuses on how an ASL corpus is collected, including unscripted ASL passages and ASL inflecting verbs, signs in which the location and orientation of the hands is influenced by the arrangement of locations in 3D space that represent entities under discussion. Native signers are recorded in a studio with motion capture equipment: cyber-gloves, body suit, head tracker, hand tracker, and eye tracker. Part II describes how ASL animation is synthesized using our corpus of ASL inflecting verbs. Specifically, mathematical models of hand movement are trained on animation data of signs produced by a native signer. This dissertation work demonstrates that mathematical models can be trained and built using movement data collected from humans. The evaluation studies with deaf native signer participants show that the verb animations synthesized from our models have similar understandability in subjective-rating and comprehension-question scores to animations produced by a human animator, or to animations driven by a human’s motion capture data. The modeling techniques in this dissertation are applicable to other types of ASL signs and to other sign languages used internationally. These models’ parameterization of sign animations can increase the repertoire of generation systems and can automate the work of humans using sign language scripting systems

    Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages

    Get PDF
    This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities. The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality for the time-scales of these windows, where signing is successively integrated over longer durations (~ 250-300 ms) than in speech (~ 50-60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations of English, Korean, and ASL. Data from word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. The results from rates in ASL also complement the findings in perception experiments by confirming that time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures for language processing, and discussions provide a synthesis of converging findings from other domains of research and propose ideas for future investigations

    Data and methods for a visual understanding of sign languages

    Get PDF
    Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. However, they, unfortunately, continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. The recent advances in Machine Learning (ML) and Artificial Intelligence (AI) has the power to enable better accessibility to sign language users and narrow down the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages, that use manual and non-manual features to convey information, and do not have a standard written form. Thus, the goal of this thesis is to contribute to the development of new technologies that account for sign language by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and developing automatic systems that focus on computer vision tasks related to sign language that aims at learning better visual understanding of sign languages. Thus, in Part I, we introduce the How2Sign dataset, which is a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages by presenting in Chapter 4 a framework called Spot-Align, based on sign spotting methods, to automatically annotate sign instances in continuous sign language. We further present the benefits of this framework and establish a baseline for the sign language recognition task on the How2Sign dataset. In addition to that, in Chapter 5 we benefit from the different annotations and modalities of the How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Later in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain and assess if and how well sign language users can understand automatically generated sign language videos by proposing an evaluation protocol based on How2Sign topics and English translationLes llengües de signes són llengües completes i naturals que utilitzen milions de persones de tot el món com mode de comunicació primer o preferit. Tanmateix, malauradament, continuen essent llengües marginades. Dissenyar, construir i avaluar tecnologies que funcionin amb les llengües de signes presenta reptes de recerca que requereixen d’esforços interdisciplinaris i col·laboratius. Els avenços recents en l’aprenentatge automàtic i la intel·ligència artificial (IA) poden millorar l’accessibilitat tecnològica dels signants, i alhora reduir la barrera de comunicació existent entre la comunitat sorda i les persones no-signants. Tanmateix, les tecnologies més modernes en IA encara no consideren les llengües de signes en les seves interfícies amb l’usuari. Això es deu principalment a que les llengües de signes són llenguatges visuals, que utilitzen característiques manuals i no manuals per transmetre informació, i no tenen una forma escrita estàndard. Els objectius principals d’aquesta tesi són la creació de recursos multimodals a gran escala adequats per entrenar models d’aprenentatge automàtic per a llengües de signes, i desenvolupar sistemes de visió per computador adreçats a una millor comprensió automàtica de les llengües de signes. Així, a la Part I presentem la base de dades How2Sign, una gran col·lecció multimodal i multivista de vídeos de la llengua de signes nord-americana. A la Part II, contribuïm al desenvolupament de tecnologia per a llengües de signes, presentant al capítol 4 una solució per anotar signes automàticament anomenada Spot-Align, basada en mètodes de localització de signes en seqüències contínues de signes. Després, presentem els avantatges d’aquesta solució i proporcionem uns primers resultats per la tasca de reconeixement de la llengua de signes a la base de dades How2Sign. A continuació, al capítol 5 aprofitem de les anotacions i diverses modalitats de How2Sign per explorar la cerca de vídeos en llengua de signes a partir de l’entrenament d’incrustacions multimodals. Finalment, al capítol 6, explorem la generació de vídeos en llengua de signes aplicant xarxes adversàries generatives al domini de la llengua de signes. Avaluem fins a quin punt els signants poden entendre els vídeos generats automàticament, proposant un nou protocol d’avaluació basat en les categories dins de How2Sign i la traducció dels vídeos a l’anglès escritLas lenguas de signos son lenguas completas y naturales que utilizan millones de personas de todo el mundo como modo de comunicación primero o preferido. Sin embargo, desgraciadamente, siguen siendo lenguas marginadas. Diseñar, construir y evaluar tecnologías que funcionen con las lenguas de signos presenta retos de investigación que requieren esfuerzos interdisciplinares y colaborativos. Los avances recientes en el aprendizaje automático y la inteligencia artificial (IA) pueden mejorar la accesibilidad tecnológica de los signantes, al tiempo que reducir la barrera de comunicación existente entre la comunidad sorda y las personas no signantes. Sin embargo, las tecnologías más modernas en IA todavía no consideran las lenguas de signos en sus interfaces con el usuario. Esto se debe principalmente a que las lenguas de signos son lenguajes visuales, que utilizan características manuales y no manuales para transmitir información, y carecen de una forma escrita estándar. Los principales objetivos de esta tesis son la creación de recursos multimodales a gran escala adecuados para entrenar modelos de aprendizaje automático para lenguas de signos, y desarrollar sistemas de visión por computador dirigidos a una mejor comprensión automática de las lenguas de signos. Así, en la Parte I presentamos la base de datos How2Sign, una gran colección multimodal y multivista de vídeos de lenguaje la lengua de signos estadounidense. En la Part II, contribuimos al desarrollo de tecnología para lenguas de signos, presentando en el capítulo 4 una solución para anotar signos automáticamente llamada Spot-Align, basada en métodos de localización de signos en secuencias continuas de signos. Después, presentamos las ventajas de esta solución y proporcionamos unos primeros resultados por la tarea de reconocimiento de la lengua de signos en la base de datos How2Sign. A continuación, en el capítulo 5 aprovechamos de las anotaciones y diversas modalidades de How2Sign para explorar la búsqueda de vídeos en lengua de signos a partir del entrenamiento de incrustaciones multimodales. Finalmente, en el capítulo 6, exploramos la generación de vídeos en lengua de signos aplicando redes adversarias generativas al dominio de la lengua de signos. Evaluamos hasta qué punto los signantes pueden entender los vídeos generados automáticamente, proponiendo un nuevo protocolo de evaluación basado en las categorías dentro de How2Sign y la traducción de los vídeos al inglés escrito.Teoria del Senyal i Comunicacion

    Data and methods for a visual understanding of sign languages

    Get PDF
    Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. However, they, unfortunately, continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. The recent advances in Machine Learning (ML) and Artificial Intelligence (AI) has the power to enable better accessibility to sign language users and narrow down the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages, that use manual and non-manual features to convey information, and do not have a standard written form. Thus, the goal of this thesis is to contribute to the development of new technologies that account for sign language by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and developing automatic systems that focus on computer vision tasks related to sign language that aims at learning better visual understanding of sign languages. Thus, in Part I, we introduce the How2Sign dataset, which is a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages by presenting in Chapter 4 a framework called Spot-Align, based on sign spotting methods, to automatically annotate sign instances in continuous sign language. We further present the benefits of this framework and establish a baseline for the sign language recognition task on the How2Sign dataset. In addition to that, in Chapter 5 we benefit from the different annotations and modalities of the How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Later in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain and assess if and how well sign language users can understand automatically generated sign language videos by proposing an evaluation protocol based on How2Sign topics and English translationLes llengües de signes són llengües completes i naturals que utilitzen milions de persones de tot el món com mode de comunicació primer o preferit. Tanmateix, malauradament, continuen essent llengües marginades. Dissenyar, construir i avaluar tecnologies que funcionin amb les llengües de signes presenta reptes de recerca que requereixen d’esforços interdisciplinaris i col·laboratius. Els avenços recents en l’aprenentatge automàtic i la intel·ligència artificial (IA) poden millorar l’accessibilitat tecnològica dels signants, i alhora reduir la barrera de comunicació existent entre la comunitat sorda i les persones no-signants. Tanmateix, les tecnologies més modernes en IA encara no consideren les llengües de signes en les seves interfícies amb l’usuari. Això es deu principalment a que les llengües de signes són llenguatges visuals, que utilitzen característiques manuals i no manuals per transmetre informació, i no tenen una forma escrita estàndard. Els objectius principals d’aquesta tesi són la creació de recursos multimodals a gran escala adequats per entrenar models d’aprenentatge automàtic per a llengües de signes, i desenvolupar sistemes de visió per computador adreçats a una millor comprensió automàtica de les llengües de signes. Així, a la Part I presentem la base de dades How2Sign, una gran col·lecció multimodal i multivista de vídeos de la llengua de signes nord-americana. A la Part II, contribuïm al desenvolupament de tecnologia per a llengües de signes, presentant al capítol 4 una solució per anotar signes automàticament anomenada Spot-Align, basada en mètodes de localització de signes en seqüències contínues de signes. Després, presentem els avantatges d’aquesta solució i proporcionem uns primers resultats per la tasca de reconeixement de la llengua de signes a la base de dades How2Sign. A continuació, al capítol 5 aprofitem de les anotacions i diverses modalitats de How2Sign per explorar la cerca de vídeos en llengua de signes a partir de l’entrenament d’incrustacions multimodals. Finalment, al capítol 6, explorem la generació de vídeos en llengua de signes aplicant xarxes adversàries generatives al domini de la llengua de signes. Avaluem fins a quin punt els signants poden entendre els vídeos generats automàticament, proposant un nou protocol d’avaluació basat en les categories dins de How2Sign i la traducció dels vídeos a l’anglès escritLas lenguas de signos son lenguas completas y naturales que utilizan millones de personas de todo el mundo como modo de comunicación primero o preferido. Sin embargo, desgraciadamente, siguen siendo lenguas marginadas. Diseñar, construir y evaluar tecnologías que funcionen con las lenguas de signos presenta retos de investigación que requieren esfuerzos interdisciplinares y colaborativos. Los avances recientes en el aprendizaje automático y la inteligencia artificial (IA) pueden mejorar la accesibilidad tecnológica de los signantes, al tiempo que reducir la barrera de comunicación existente entre la comunidad sorda y las personas no signantes. Sin embargo, las tecnologías más modernas en IA todavía no consideran las lenguas de signos en sus interfaces con el usuario. Esto se debe principalmente a que las lenguas de signos son lenguajes visuales, que utilizan características manuales y no manuales para transmitir información, y carecen de una forma escrita estándar. Los principales objetivos de esta tesis son la creación de recursos multimodales a gran escala adecuados para entrenar modelos de aprendizaje automático para lenguas de signos, y desarrollar sistemas de visión por computador dirigidos a una mejor comprensión automática de las lenguas de signos. Así, en la Parte I presentamos la base de datos How2Sign, una gran colección multimodal y multivista de vídeos de lenguaje la lengua de signos estadounidense. En la Part II, contribuimos al desarrollo de tecnología para lenguas de signos, presentando en el capítulo 4 una solución para anotar signos automáticamente llamada Spot-Align, basada en métodos de localización de signos en secuencias continuas de signos. Después, presentamos las ventajas de esta solución y proporcionamos unos primeros resultados por la tarea de reconocimiento de la lengua de signos en la base de datos How2Sign. A continuación, en el capítulo 5 aprovechamos de las anotaciones y diversas modalidades de How2Sign para explorar la búsqueda de vídeos en lengua de signos a partir del entrenamiento de incrustaciones multimodales. Finalmente, en el capítulo 6, exploramos la generación de vídeos en lengua de signos aplicando redes adversarias generativas al dominio de la lengua de signos. Evaluamos hasta qué punto los signantes pueden entender los vídeos generados automáticamente, proponiendo un nuevo protocolo de evaluación basado en las categorías dentro de How2Sign y la traducción de los vídeos al inglés escrito.Postprint (published version
    • …
    corecore