    Eyetracking Metrics Related to Subjective Assessments of ASL Animations

    Analysis of eye-tracking data can serve as an alternative method of evaluation when assessing the quality of computer-synthesized animations of American Sign Language (ASL), a technology that can make information accessible to people who are deaf or hard-of-hearing, who may have lower levels of written-language literacy. In this work, we build descriptive models of the subjective scores that native signers assign to ASL animations, based on eye-tracking metrics, and evaluate the models' efficacy.
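    The abstract does not say which modeling technique or which eye-tracking metrics were used. As a hedged illustration only, the sketch below fits a simple linear regression from two hypothetical per-stimulus metrics (proportion of fixation time on the face, number of gaze transitions between face and hands) to hypothetical subjective ratings; the feature names, values, and model choice are assumptions, not the paper's method.

```python
# Minimal sketch (not the authors' implementation): a descriptive model mapping
# hypothetical eye-tracking metrics to subjective scores assigned by signers.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-stimulus features: proportion of fixation time on the face,
# and number of gaze transitions between the face and the hands.
X = np.array([
    [0.72, 14],
    [0.55, 22],
    [0.81, 9],
    [0.60, 18],
])
# Hypothetical subjective ratings on a 1-10 scale.
y = np.array([7.5, 5.0, 8.2, 6.1])

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("predicted score for a new animation:", model.predict([[0.68, 16]])[0])
```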

    Best practices for conducting evaluations of sign language animation

    Automatic synthesis of linguistically accurate and natural-looking American Sign Language (ASL) animations would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf. Based on several years of studies, we identify best practices for conducting experimental evaluations of sign language animations with feedback from deaf and hard-of-hearing users. We first describe our techniques for identifying and screening participants and for controlling the experimental environment. We then discuss rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations. Our discussion focuses on stimuli design, the effect of using videos as an upper baseline, the use of videos to present comprehension questions, and eye-tracking as an alternative to recording question responses.

    TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation

    Sign language animations can improve the accessibility of information and services for people who are deaf and have low literacy skills in spoken/written languages. Because sign languages differ from spoken/written languages in word order, syntax, and lexicon, many deaf people find it difficult to comprehend text on a computer screen or captions on a television. Animated characters performing sign language in a comprehensible way could make this information accessible. Facial expressions and other non-manual components play an important role in the naturalness and understandability of these animations, and their coordination with the manual signs is crucial for the interpretation of the signed message. Software that advances the support of facial expressions in the generation of sign language animation could make this technology more acceptable to deaf people. In this survey, we discuss the challenges in facial expression synthesis and we compare and critique state-of-the-art projects on generating facial expressions in sign language animations. We begin with an overview of the linguistics of facial expressions, sign language animation technologies, and background on animating facial expressions, followed by a discussion of the search strategy and criteria used to select the five projects that are the primary focus of this survey. We then introduce the work from the five projects under consideration and compare their contributions in terms of the sign languages supported, the categories of facial expressions investigated, the focus of animation generation, the use of annotated corpora, the input data or hypotheses behind each approach, and other factors. Strengths and drawbacks of individual projects are identified from these perspectives. The survey concludes with our current research focus in this area and future prospects.

    Data-Driven Synthesis and Evaluation of Syntactic Facial Expressions in American Sign Language Animation

    Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-the-art sign language animation tools focus mostly on the accuracy of manual signs rather than on facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions, and (2) examine data-driven modeling of facial expressions from multiple recordings of ASL signers. In Part I of this thesis, we propose to conduct rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations with facial expressions. Our research questions involve: (i) stimuli design, (ii) the effect of videos as an upper baseline and for presenting comprehension questions, and (iii) eye-tracking as an alternative to recording question responses from participants. In Part II of this thesis, we propose to use generative models to automatically uncover the underlying trace of ASL syntactic facial expressions from multiple recordings of ASL signers, and to apply these facial expressions to manual signs in novel animated sentences. We hypothesize that an annotated sign language corpus, including both manual and non-manual signs, can be used to model and generate linguistically meaningful facial expressions if it is combined with facial feature extraction techniques, statistical machine learning, and an animation platform with detailed facial parameterization. To further improve sign language animation technology, we will assess the quality of the animations generated by our approach with ASL signers, using the rigorous evaluation methodologies described in Part I.
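    The thesis proposes generative models over multiple recordings; as a much simpler, hedged stand-in for the idea of recovering an "underlying trace" of a syntactic facial expression, the sketch below time-normalizes hypothetical eyebrow-height trajectories from several recordings and averages them into a single curve that could drive an animation parameter. The feature, data values, and averaging approach are illustrative assumptions, not the thesis method.

```python
# Illustrative sketch only: derive a single averaged trace of a facial-feature
# trajectory (e.g. eyebrow height during a yes/no question) from several recordings.
import numpy as np

def normalize_length(trace, n_frames=100):
    """Resample a 1-D feature trajectory to a fixed number of frames."""
    old_t = np.linspace(0.0, 1.0, num=len(trace))
    new_t = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(new_t, old_t, trace)

# Hypothetical eyebrow-height trajectories (arbitrary units) from three signers.
recordings = [
    np.array([0.0, 0.2, 0.6, 0.9, 0.8, 0.3, 0.0]),
    np.array([0.1, 0.5, 0.8, 1.0, 0.7, 0.2]),
    np.array([0.0, 0.3, 0.7, 0.9, 0.9, 0.6, 0.2, 0.0]),
]

aligned = np.stack([normalize_length(r) for r in recordings])
centroid_trace = aligned.mean(axis=0)   # could be mapped onto an animation curve
print(centroid_trace[:5])
```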

    Training Effects of Adaptive Emotive Responses From Animated Agents in Simulated Environments

    Humans are distinct from machines in their capacity to emote, stimulate, and express emotions. Because emotions play such an important role in human interactions, human-like agents used in pedagogical roles for simulation-based training should properly reflect emotions. Current research on developing this type of agent focuses on basic agent interface characteristics as well as character-building qualities. However, human-like agents should provide emotion-like qualities that are clearly expressed, properly synchronized, and that simulate complex, real-time interactions through adaptive emotion systems. The research conducted for this dissertation was a quantitative investigation using a 3 (within) x 2 (between) x 3 (within) factorial design. A total of 56 paid participants consented to complete the study. Independent variables included emotion intensity (low, moderate, and high), level of expertise (novice versus experienced participant), and number of trials. Dependent measures included visual attention, emotional response towards the animated agents, simulation performance score, and learners' perception of the pedagogical agent persona while participants interacted with a pain assessment and management simulation. While no relationships were found between the level of emotion intensity portrayed by the animated agents and participants' visual attention, emotional response towards the animated agent, or simulation performance score, there were significant relationships between participants' level of expertise and their visual attention, emotional responses, and performance outcomes. The results indicated that nursing students had higher visual attention during their interaction with the animated agents. Additionally, nursing students expressed more neutral facial expressions, whereas experienced nurses expressed more emotional facial expressions towards the animated agents. The simulation performance scores indicated that nursing students obtained higher scores on the pain assessment and management task than experienced nurses. Both groups of participants had a positive perception of the animated agents' persona.
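    The dissertation's full design has two within-subjects factors (emotion intensity, trial) and one between-subjects factor (expertise). As a hedged, simplified illustration only, the sketch below runs a mixed ANOVA with one within factor and one between factor on synthetic scores collapsed over trials; the data, effect sizes, and the use of pingouin are assumptions, not the dissertation's analysis.

```python
# Simplified mixed-ANOVA sketch on hypothetical, simulated data.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
rows = []
for subj in range(1, 13):                                   # 12 hypothetical participants
    expertise = "novice" if subj <= 6 else "experienced"
    for intensity in ("low", "moderate", "high"):
        # Hypothetical performance score, averaged over trials; the small boost
        # for novices mirrors the reported direction of the group difference.
        score = 60 + (5 if expertise == "novice" else 0) + rng.normal(0, 4)
        rows.append({"subject": subj, "expertise": expertise,
                     "intensity": intensity, "score": score})

data = pd.DataFrame(rows)
aov = pg.mixed_anova(data=data, dv="score", within="intensity",
                     between="expertise", subject="subject")
print(aov[["Source", "F", "p-unc"]])
```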

    A Representation of Selected Nonmanual Signals in American Sign Language

    Computer-generated three-dimensional animation holds great promise for synthesizing utterances in American Sign Language (ASL) that are not only grammatical but also believable by members of the Deaf community. Animation poses several challenges stemming from the massive amounts of data necessary to specify the movement of three-dimensional geometry, and no current system facilitates the synthesis of nonmanual signals. However, the linguistics of ASL can help surmount this challenge by providing structure and rules for organizing the data. This work presents a first method for representing ASL linguistic and extralinguistic processes that involve the face. Any such representation must be capable of expressing the subtle nuances of ASL. Further, it must be able to represent co-occurrences, because many ASL signs require that two or more nonmanual signals be used simultaneously; in fact, multiple nonmanual signals can co-occur on the same facial feature. Additionally, such a system should allow both binary and incremental nonmanual signals to display the full range of adjectival and adverbial modifiers. Validating such a representation requires both affirming that nonmanual signals are indeed necessary in the animation of ASL and evaluating the effectiveness of the new representation in synthesizing nonmanual signals. In this study, members of the Deaf community viewed animations created with the new representation and answered questions concerning the influence of selected nonmanual signals on the perceived meaning of the synthesized utterances. Results reveal not only that the representation is capable of effectively portraying nonmanual signals, but also that it can be used to combine various nonmanual signals in the synthesis of complete ASL sentences. In a study with Deaf users, participants viewing synthesized animations consistently identified the intended nonmanual signals correctly.
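    The abstract names the requirements (co-occurrence, and both binary and incremental signals) without detailing the representation itself. The sketch below is one possible minimal encoding under those requirements, not the thesis's actual system: timed signal tracks per facial feature, each with an intensity value, so overlapping signals and graded modifiers can both be expressed. All names and example values are hypothetical.

```python
# Minimal sketch of one possible nonmanual-signal representation.
from dataclasses import dataclass

@dataclass
class NonmanualSignal:
    feature: str        # e.g. "brows", "head", "eyes"
    signal: str         # e.g. "raised_brows", "headshake"
    start: float        # seconds from utterance start
    end: float
    intensity: float    # 0.0-1.0; 1.0 for purely binary signals

# Co-occurring signals for a hypothetical negated yes/no question.
utterance = [
    NonmanualSignal("brows", "raised_brows", 0.0, 1.8, 1.0),
    NonmanualSignal("head",  "headshake",    0.4, 1.6, 0.7),
]

def active_signals(signals, t):
    """Return all nonmanual signals active at time t (co-occurrence allowed)."""
    return [s for s in signals if s.start <= t <= s.end]

print([s.signal for s in active_signals(utterance, 1.0)])
```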

    Participant responses to virtual agents in immersive virtual environments.

    This thesis is concerned with interaction between people and virtual humans in the context of highly immersive virtual environments (VEs). Empirical studies have shown that virtual humans (agents) with even minimal behavioural capabilities can have a significant emotional impact on participants of immersive virtual environments (IVEs), to the extent that they have been used in studies of mental health issues such as social phobia and paranoia. This thesis focuses on understanding how the behaviour of virtual humans, rather than their visual appearance, shapes people's responses. Three main research questions are addressed. First, the thesis considers which key nonverbal behavioural cues are used to portray a specific psychological state. Second, it determines the extent to which the underlying state of a virtual human is recognisable through the display of a key set of cues inferred from the behaviour of real humans. Finally, it considers the degree to which a perceived psychological state in a virtual human invokes responses from participants in immersive virtual environments that are similar to those observed in the physical world. These research questions were investigated through four experiments. The first experiment focused on the impact of visual fidelity and behavioural complexity on participant responses by implementing a model of gaze behaviour in virtual humans. The results indicated that participants expected more life-like behaviours from more visually realistic virtual humans. The second experiment investigated the detrimental effects on participant responses of interacting with virtual humans with low behavioural complexity. The third experiment investigated differences in participants' responses to virtual humans perceived to be in varying emotional states, portrayed using postural and facial cues. Results indicated that posture does play an important role in the portrayal of affect; however, the behavioural model used in the study did not fully cover the qualities of body movement associated with the emotions studied. The final experiment focused on the portrayal of affect through the quality of body movement, such as the speed of gestures. The effectiveness of the virtual humans was gauged by exploring a variety of participant responses, including subjective responses and objective physiological and behavioural measures. The results show that participants are affected by and respond to virtual humans in a significant manner, provided that an appropriate behavioural model is used.

    Facial motion perception in autism spectrum disorder and neurotypical controls

    This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University LondonFacial motion provides an abundance of information necessary for mediating social communication. Emotional expressions, head rotations and eye-gaze patterns allow us to extract categorical and qualitative information from others (Blake & Shiffrar, 2007). Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterised by a severe impairment in social cognition. One of the causes may be related to a fundamental deficit in perceiving human movement (Herrington et al., (2007). This hypothesis was investigated more closely within the current thesis. In neurotypical controls, the visual processing of facial motion was analysed via EEG alpha waves. Participants were tested on their ability to discriminate between successive animations (exhibiting rigid and nonrigid motion). The appearance of the stimuli remained constant over trials, meaning decisions were based solely on differential movement patterns. The parieto-occipital region was specifically selective to upright facial motion while the occipital cortex responded similarly to natural and manipulated faces. Over both regions, a distinct pattern of activity in response to upright faces was characterised by a transient decrease and subsequent increase in neural processing (Girges et al., 2014). These results were further supported by an fMRI study which showed sensitivity of the superior temporal sulcus (STS) to perceived facial movements relative to inanimate and animate stimuli. The ability to process information from dynamic faces was assessed in ASD. Participants were asked to recognise different sequences, unfamiliar identities and genders from facial motion captures. Stimuli were presented upright and inverted in order to assess configural processing. Relative to the controls, participants with ASD were significantly impaired on all three tasks and failed to show an inversion effect (O'Brien et al., 2014). Functional neuroimaging revealed atypical activities in the visual cortex, STS and fronto-parietal regions thought to contain mirror neurons in participants with ASD. These results point to a deficit in the visual processing of facial motion, which in turn may partly cause social communicative impairments in ASD
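    The abstract refers to analysing visual processing via EEG alpha waves. As a hedged illustration of the general idea (not the thesis's pipeline), the sketch below estimates alpha-band (8-12 Hz) power for a single channel with Welch's method on a placeholder signal; the sampling rate, channel, and signal are assumptions.

```python
# Hedged sketch: alpha-band power of one EEG channel via Welch's method.
import numpy as np
from scipy.signal import welch

fs = 250                                      # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
# Placeholder signal: a 10 Hz "alpha" component plus noise.
eeg = 5e-6 * np.sin(2 * np.pi * 10 * t) + 1e-6 * np.random.randn(t.size)

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
band = (freqs >= 8) & (freqs <= 12)
alpha_power = psd[band].sum() * (freqs[1] - freqs[0])   # approximate band integral
print(f"alpha-band power: {alpha_power:.3e} V^2")
```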

    Development and evaluation of an interactive virtual audience for a public speaking training application

    Introduction: Fear of public speaking is the most common social fear. Virtual reality (VR) training applications are a promising tool to improve public speaking skills. To be successful, applications should feature high scenario fidelity; one way to improve it is to implement realistic speaker-audience interactive behavior. Objective: The study aimed to develop and evaluate a realistic and interactive audience for a VR public speaking training application. First, an observation study on real speaker-audience interactive behavior patterns was conducted. Second, the identified patterns were implemented in the VR application. Finally, an evaluation study assessed users' perceptions of the training application.
    Observation Study (1): Because of the lack of data on real speaker-audience interactive behavior, the first research question was "What speaker-audience interaction patterns can be identified in real life?". A structured, non-participant, overt observation study was conducted: a real audience was video recorded and the content analyzed. The sample yielded N = 6,484 observed interaction patterns. It was found that speakers initiate dialogues more often than audience members do, and how audience members react to speakers' facial expressions and gestures.
    Implementation Study (2): To find efficient ways of implementing the results of the observation study in the training application, the second research question was formulated as "How can speaker-audience interaction patterns be implemented into the virtual public speaking application?". The hardware setup comprised a CAVE, Infitec glasses, and ART head tracking; the software was realized with 3D-Excite RTT DeltaGen 12.2. Several possible technical solutions were explored systematically until efficient ones were found. As a result, self-created audio recognition, Kinect motion recognition, Affectiva facial recognition, and manual question generation were implemented to provide interactive audience behavior in the public speaking training application.
    Evaluation Study (3): To find out whether implementing interactive behavior patterns met users' expectations, the third research question was formulated as "How does the interactivity of a virtual public speaking application affect user experience?". An experimental, cross-sectional user study was conducted with N = 57 participants (65% men, 35% women; mean age = 25.98, SD = 4.68) who used either an interactive or a non-interactive VR application condition. Results revealed a significant difference in users' perception of the two conditions.
    General Conclusions: Speaker-audience interaction patterns that can be observed in real life were incorporated into a VR application that helps people overcome the fear of public speaking and train their public speaking skills. The findings showed the high relevance of interactivity for VR public speaking applications. Although questions from the audience were still regulated manually by an operator, the newly designed audience could interact with the speakers; thus, the presented VR application is of potential value in helping people train their public speaking skills. The study was conducted with participants not suffering from high degrees of public speaking fear. Future work may use more advanced technology, such as speech recognition, 3D recordings, or live 3D streams of an actual person, and include participants with high degrees of public speaking fear.
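    The implementation study lists self-created audio recognition among the components driving audience behavior, but gives no detail, and the actual system ran inside a CAVE with 3D-Excite RTT DeltaGen. The sketch below is therefore only a generic, hedged stand-in for the idea: detect whether the speaker is talking from short-time RMS energy of an audio buffer, which could then cue an audience reaction such as nodding. Sampling rate, frame size, threshold, and the placeholder audio are all assumptions.

```python
# Generic stand-in for speech-activity detection that could cue audience behavior.
import numpy as np

FS = 16_000                 # assumed sampling rate
FRAME = 1_024               # samples per analysis frame
THRESHOLD = 0.02            # assumed RMS threshold for "speaking"

def is_speaking(frame: np.ndarray) -> bool:
    """Return True if the frame's RMS energy exceeds the speech threshold."""
    return float(np.sqrt(np.mean(frame ** 2))) > THRESHOLD

# Placeholder audio: half a second of silence followed by a 200 Hz tone.
t = np.arange(FS) / FS
audio = np.concatenate([np.zeros(FS // 2),
                        0.1 * np.sin(2 * np.pi * 200 * t[: FS // 2])])

for i in range(0, len(audio) - FRAME, FRAME):
    if is_speaking(audio[i:i + FRAME]):
        print(f"frame at {i / FS:.2f}s: speaker active -> cue audience nod")
        break
```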

    The role of human movement kinematics in internal state inference

    The kinematics of our movements reflect our internal (mental and affective) states. This thesis tests the hypothesis that these kinematic signals contribute to judgments about others' internal states through models based on our own actions. Chapter 1 details the theoretical background and previous literature that motivates this hypothesis. Chapters 2 (typical adults) and 3 (typical adolescents) test the hypothesis that we use models of our own action kinematics to make judgments about others' affective states. Both experiments support the hypothesis by demonstrating that differences in one's own typical action kinematics determine the perceived intensity of affective states of observed point-light walkers. Chapters 4, 5, and 6 examine the hypothesis that atypical movement kinematics in autism spectrum disorder (autism) contribute to social communication difficulties. Chapters 4 and 5 measure two basic skills required to make internal state judgments from observing others' actions: visual time perception and sensitivity to kinematic signals that describe 'natural' motion. Both studies find no deficits in the autism group compared to the typically developed group, and even some enhanced abilities, suggesting that these basic skills are intact. However, Chapter 6 demonstrates that typically developed individuals are impaired at reading mental states from autistic actions, suggesting that atypical movement kinematics may partly contribute to the bi-directional communicative difficulties experienced between individuals with autism and their typical peers. Chapter 7 investigates whether differences in movement kinematics early in development are associated with later social skills in a group of infants at high or low risk of developing autism. Indeed, movement kinematics at 10 months of age predict social abilities at 14 months of age, demonstrating the value of kinematic markers for predicting social functioning and possibly disorder. Chapter 8 summarises the studies presented in this thesis, which show support for the hypothesis that we judge others' internal states through models based on our own actions.
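    The thesis abstract does not specify which kinematic measures were used. As a hedged illustration of the kind of summary feature that could relate to perceived affect intensity in point-light walkers, the sketch below computes mean joint speed from hypothetical joint trajectories; the frame rate, number of joints, and data are assumptions, not the thesis's measures.

```python
# Minimal sketch: a simple kinematic summary from point-light-walker coordinates.
import numpy as np

fps = 60                                          # assumed frame rate
rng = np.random.default_rng(1)
# Hypothetical trajectories: 120 frames x 13 joints x 2 (x, y) in metres.
positions = np.cumsum(rng.normal(0, 0.002, size=(120, 13, 2)), axis=0)

velocities = np.diff(positions, axis=0) * fps     # per-joint velocity in m/s
speeds = np.linalg.norm(velocities, axis=-1)      # scalar speed per joint per frame
print(f"mean joint speed: {speeds.mean():.3f} m/s")
```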