    Emotional avatars

    FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

    Speech-driven 3D facial animation synthesis has been a challenging task in both industry and research. Recent methods mostly focus on deterministic deep learning, meaning that a given speech input always produces the same output. In reality, however, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, the majority of approaches focus on 3D vertex-based datasets, and methods that are compatible with existing facial animation pipelines with rigged characters are scarce. To address these issues, we present FaceDiffuser, a non-deterministic deep learning model for generating speech-driven facial animations that is trained with both 3D vertex-based and blendshape-based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results in comparison to state-of-the-art methods. We also introduce a new in-house dataset that is based on a blendshape-based rigged character. We recommend watching the accompanying supplementary video. The code and the dataset will be publicly available. Comment: Pre-print of the paper accepted at ACM SIGGRAPH MIG 202

    Facial Expression Rendering in Medical Training Simulators: Current Status and Future Directions

    Recent technological advances in robotic sensing and actuation methods have prompted the development of a range of new medical training simulators with multiple feedback modalities. Learning to interpret the facial expressions of a patient during medical examinations or procedures has been one of the key focus areas in medical training. This paper reviews facial expression rendering systems in medical training simulators that have been reported to date. Facial expression rendering approaches in other domains are also summarized to incorporate the knowledge from those works into developing systems for medical training simulators. Classifications and comparisons of medical training simulators with facial expression rendering are presented, and important design features, merits and limitations are outlined. Medical educators, students and developers are identified as the three key stakeholders involved with these systems, and their considerations and needs are presented. Physical-virtual (hybrid) approaches provide multimodal feedback, present accurate facial expression rendering, and can simulate patients of different age, gender and ethnicity groups, which makes them more versatile than virtual and physical systems. The overall findings of this review and the proposed future directions are beneficial to researchers interested in initiating or developing such facial expression rendering systems in medical training simulators. This work was supported by the Robopatient project funded by the EPSRC Grant No EP/T00519X/

    Sign Language Translation Approach to Sinhalese Language

    Sign language is used for communication between deaf persons, while the Sinhalese language is used by normal-hearing persons whose first language is Sinhalese in Sri Lanka. This research focuses on an approach for real-time translation from Sri Lankan sign language to the Sinhalese language, which will bridge the communication gap between deaf and ordinary communities. This study further focuses on a novel methodology for enabling distance communication between deaf and ordinary persons. Once the sign-based gestures are captured by a depth-sensing camera, a series of feature extraction techniques will be used to identify essential attributes in the gesture frame. The identified feature frame will be compared with a pre-trained gesture dictionary based on classification techniques in order to identify the gesture-based word. The detected word will be displayed for the ordinary user or could be used for communication between two individuals in two different geographic locations. The proposed prototype has provided an overall recognition rate of 94.2% for a dictionary of fifteen signs in Sri Lankan sign language.

    Generating realistic, animated human gestures in order to model, analyse and recognize Irish Sign Language

    The aim of this thesis is to generate a gesture recognition system which can recognize several signs of Irish Sign Language (ISL). This project is divided into three parts. The first part provides background information on ISL. An overview of the ISL structure is a prerequisite to identifying and understanding the difficulties encountered in the development of a recognition system. The second part involves the generation of a data repository: synthetic and real-time video. Initially the synthetic data is created in a 3D animation package in order to simplify the creation of motion variations of the animated signer. The animation environment in our implementation allows for the generation of different versions of the same gesture with slight variations in the parameters of the motion. Secondly a database of ISL real-time video was created. This database contains 1400 different signs, including motion variation in each gesture. The third part details step by step my novel classification system and the associated prototype recognition system. The classification system is constructed as a decision tree to identify each sign uniquely. The recognition system is based on only one component of the classification system and has been implemented as a Hidden Markov Model (HMM)

    Measuring, analysing and artificially generating head nodding signals in dyadic social interaction

    Social interaction involves rich and complex behaviours where verbal and non-verbal signals are exchanged in dynamic patterns. The aim of this thesis is to explore new ways of measuring and analysing interpersonal coordination as it naturally occurs in social interactions. Specifically, we want to understand what different types of head nods mean in different social contexts, how they are used during face-to-face dyadic conversation, and if they relate to memory and learning. Many current methods are limited by time-consuming and low-resolution data, which cannot capture the full richness of a dyadic social interaction. This thesis explores ways to demonstrate how high-resolution data in this area can give new insights into the study of social interaction. Furthermore, we also want to demonstrate the benefit of using virtual reality to artificially generate interpersonal coordination to test our hypotheses about the meaning of head nodding as a communicative signal. The first study aims to capture two patterns of head nodding signals – fast nods and slow nods – and determine what they mean and how they are used across different conversational contexts. We find that fast nodding signals receiving new information and has a different meaning than slow nods. The second study aims to investigate a link between memory and head nodding behaviour. This exploratory study provided initial hints that there might be a relationship, though further analyses were less clear. In the third study, we aim to test if interactive head nodding in virtual agents can be used to measure how much we like the virtual agent, and whether we learn better from virtual agents that we like. We find no causal link between memory performance and interactivity. In the fourth study, we perform a cross-experimental analysis of how the level of interactivity in different contexts (i.e., real, virtual, and video), impacts on memory and find clear differences between them

    Spectators’ aesthetic experiences of sound and movement in dance performance

    In this paper we present a study of spectators’ aesthetic experiences of sound and movement in live dance performance. A multidisciplinary team comprising a choreographer, neuroscientists and qualitative researchers investigated the effects of different sound scores on dance spectators. What would be the impact of auditory stimulation on kinesthetic experience and/or aesthetic appreciation of the dance? What would be the effect of removing music altogether, so that spectators watched dance while hearing only the performers’ breathing and footfalls? We investigated audience experience through qualitative research, using post-performance focus groups, while a separately conducted functional brain imaging (fMRI) study measured the synchrony in brain activity across spectators when they watched dance with sound or breathing only. When audiences watched dance accompanied by music the fMRI data revealed evidence of greater intersubject synchronisation in a brain region consistent with complex auditory processing. The audience research found that some spectators derived pleasure from finding convergences between two complex stimuli (dance and music). The removal of music and the resulting audibility of the performers’ breathing had a significant impact on spectators’ aesthetic experience. The fMRI analysis showed increased synchronisation among observers, suggesting greater influence of the body when interpreting the dance stimuli. The audience research found evidence of similar corporeally focused experience. The paper discusses possible connections between the findings of our different approaches, and considers the implications of this study for interdisciplinary research collaborations between arts and sciences

    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.