
    LaughTalk: Expressive 3D Talking Head Generation with Laughter

    Laughter is a unique expression, essential to positive social interaction among humans. Although current 3D talking head generation methods produce convincing verbal articulation, they often fail to capture the vitality and subtlety of laughter and smiles, despite their importance in social contexts. In this paper, we introduce a novel task: generating 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Given our proposed dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably compared to existing approaches in both talking head generation and expressing laughter signals. We further explore potential applications of our proposed method for rigging realistic avatars. Comment: Accepted to WACV2024.
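
    As a rough illustration of the two-stage training scheme described above, the sketch below first fits a simple audio-to-FLAME regressor on speech data and then fine-tunes it on laughter data at a lower learning rate. The architecture, the 103-dimensional FLAME parameter vector, and the data loaders are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a two-stage "talk first, then laugh" scheme.
import torch
import torch.nn as nn

class TalkingHead(nn.Module):
    """Maps an audio feature sequence to per-frame FLAME parameters."""
    def __init__(self, audio_dim=768, flame_dim=103):  # dims are assumptions
        super().__init__()
        self.encoder = nn.GRU(audio_dim, 256, batch_first=True)
        self.head = nn.Linear(256, flame_dim)

    def forward(self, audio):            # audio: (B, T, audio_dim)
        h, _ = self.encoder(audio)
        return self.head(h)              # (B, T, flame_dim)

def train_stage(model, loader, epochs, lr):
    """One training stage: regress pseudo-annotated FLAME targets."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for audio, flame_gt in loader:
            loss = nn.functional.mse_loss(model(audio), flame_gt)
            opt.zero_grad()
            loss.backward()
            opt.step()

model = TalkingHead()
# Stage 1: learn verbal articulation from speech-only data.
# train_stage(model, speech_loader, epochs=50, lr=1e-4)
# Stage 2: acquire laughter from the curated laughing-video data.
# train_stage(model, laughter_loader, epochs=20, lr=1e-5)
```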

    An examination of inner experience: Anxiety

    Descriptive Experience Sampling (DES) is used to examine the inner experience of seven individuals who have been diagnosed with at least one anxiety disorder and four control individuals. Idiographic results for each of the 11 participants are provided, including a description of the frequent and rare/unique experiences of each participant. These results are followed by between-participant nomothetic comparisons. Among the results, it was found that anxious participants experienced more indefinite figure-ground and concrete experiences than controls. Anxious participants were also more likely than controls to engage in negative-valence self-evaluations and to rate sampled moments as anxious. There is also some evidence that, overall, anxious and depressive symptoms decreased over the course of sampling regardless of group affiliation. Following the results, implications of the findings and recommendations for future work are discussed.

    Genre collisions, culture collisions: identifying and understanding different types of cross-cultural influence in music

    “Genre Collisions, Culture Collisions” explores cross-cultural composition, cultural appropriation, and post-colonialism in music through theoretical research, creative practice, and musical analysis. I critically assess how various types of cross-cultural borrowing can affect notions of cultural influence and appropriation. The specific focus of the creative work and musical analysis is the fusion of Anglo-American pop music with both traditional and popular music from Africa, particularly from South Africa and West Africa. The result of the research is one hour of recorded music presented as an album, Flight Cycle, accompanied by the thesis. The primary field of research is cross-cultural composition. Contained within that field are the sub-fields of post-racial identity in music and cultural appropriation. The fields of post-colonialism, critical race theory, and ethnomusicology are also integral to the study. Some key reference points have been the work of Kofi Agawu and Austin Emielu in the field of ethnomusicology; Edward Said, Robert Young and Ghassan Hage in the field of post-colonialism; and Jim Chapman and Susan Fast in the field of cultural appropriation. The study responds to the following core questions: How do power relations between artists and cultures inform notions of cultural appropriation in music? What are the distinctions, and where are the boundaries, between cultural appropriation and ethically sound forms of cross-cultural exchange and influence? Do composition and production techniques change the nature and ethics of cross-cultural borrowing, and if so, in what way? The theoretical studies have led to a greater understanding of my creative processes and a greater awareness of the need for ethical reflection when approaching the music of non-Anglo-American cultures. In turn, my practice, in terms of the exploration of elements found in African music and their fusion with musical elements from my own background in Western pop and rock styles, has helped me to better understand the concepts of cultural influence and appropriation, and the labels attached to musical styles relating to ethnicity and culture. The study concludes that not all cross-cultural borrowing or influence should be considered cultural appropriation. Factors including intra-ethnic influence and crossovers between class and race mean that a nuanced approach is needed to gain an understanding of the ethics of cross-cultural composition.

    Generation of realistic human behaviour

    As the use of computers and robots in our everyday lives increases, so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions, and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person whose lips move in accordance with the driving audio. Particular focus is placed on a) generating spontaneous expressions such as blinks, b) achieving audio-visual synchrony, and c) transferring or producing natural head motion. The second problem addressed in this thesis is video-driven speech reconstruction, which aims to convert a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities. The framework is made up of two networks performing translations in opposite directions and can be successfully applied to diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.
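
    The perceptual loss mentioned above can be pictured as comparing waveforms in the feature space of a frozen, pre-trained speech-driven animation encoder rather than sample by sample. The sketch below is a minimal rendering of that idea; the encoder and the L1 distance are assumptions, not the thesis implementation.

```python
# Minimal sketch of a perceptual loss built on a frozen feature extractor.
import torch
import torch.nn as nn

class PerceptualLoss(nn.Module):
    def __init__(self, feature_extractor: nn.Module):
        super().__init__()
        self.extractor = feature_extractor.eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)  # weights stay fixed; gradients still
                                     # flow back to the generated waveform

    def forward(self, generated_wav, target_wav):
        f_gen = self.extractor(generated_wav)
        f_tgt = self.extractor(target_wav)
        return nn.functional.l1_loss(f_gen, f_tgt)

# Typical use: total = reconstruction_loss + lambda_p * perceptual(gen, ref)
```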

    Learning Speech-driven 3D Conversational Gestures from Video

    We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures and the 3D face and head animations of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem, since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation, as well as dense 3D face performance capture, to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations.
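
    One way to picture the GAN component described above is a discriminator that scores a 3D motion sequence jointly with its audio features, so that implausible audio-motion pairings are penalized. The sketch below is a hypothetical minimal version; the dimensions and the convolutional architecture are assumptions, not the paper's network.

```python
# Hypothetical conditional discriminator over (motion, audio) sequences.
import torch
import torch.nn as nn

class MotionAudioDiscriminator(nn.Module):
    def __init__(self, motion_dim=165, audio_dim=64):  # dims are assumptions
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(motion_dim + audio_dim, 128, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),   # pool over time
            nn.Flatten(),
            nn.Linear(128, 1),         # plausibility score for the pair
        )

    def forward(self, motion, audio):  # both (B, T, dim), same B and T
        x = torch.cat([motion, audio], dim=-1).transpose(1, 2)  # (B, C, T)
        return self.net(x)
```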

    Descriptive Experience Sampling of individuals with symptoms of obsessive-compulsive disorder

    This study employed the Descriptive Experience Sampling (DES) method to investigate the inner experiences of three individuals with symptoms of obsessive-compulsive disorder (OCD) and one non-OCD-symptom participant. Participants were provided with a random-interval generator (beeper) and were asked to freeze the aspects of their inner experience at the moment of the beep and record this experience in a notebook. Participants met with the investigators within 24 hours to discuss each of these sampled moments in detail. Salient characteristics were identified for each participant, and some characteristics of inner experience were shared across participants: OCD-symptom participants had a higher frequency of unsymbolized thinking and feelings, and a lower frequency of inner speech, than the non-OCD-symptom participant. Additionally, participants were often able to localize the characteristics of their inner experience to some specific location in their heads. Sampling did not reveal any frequently recurring thoughts, impulses, or images.
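
    For readers unfamiliar with the method, the random-interval beeper at the heart of DES can be approximated in a few lines. The interval bounds and sample count below are placeholders, not the values used in the study.

```python
# Toy random-interval "beeper" for experience sampling.
import random
import time

def beeper(n_samples=6, min_gap_s=600, max_gap_s=3600):
    """Signal at unpredictable moments within a min/max gap."""
    for i in range(n_samples):
        time.sleep(random.uniform(min_gap_s, max_gap_s))
        print(f"Beep {i + 1}: record your inner experience of this moment.")
```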

    Individual differences in face-looking behavior generalize from the lab to the world

    Recent laboratory studies have found large, stable individual differences in the location people first fixate when identifying faces, ranging from the brows to the mouth. Importantly, this variation is strongly associated with differences in fixation-specific identification performance, such that individuals' recognition ability is maximized when they look at their preferred location (Mehoudar, Arizpe, Baker, & Yovel, 2014; Peterson & Eckstein, 2013). This finding suggests that face representations are retinotopic and that individuals enact gaze strategies that optimize identification, yet the extent to which this behavior reflects real-world gaze behavior is unknown. Here, we used mobile eye trackers to test whether individual differences in face gaze generalize from the lab to real-world vision. In-lab fixations were measured with a speeded face identification task, while real-world behavior was measured as subjects freely walked around the Massachusetts Institute of Technology campus. We found a strong correlation between the patterns of individual differences in face gaze in the lab and in real-world settings. Our findings support the hypothesis that individuals optimize real-world face identification by consistently fixating the same location and thus strongly constraining the space of retinotopic input. The methods developed for this study entailed collecting a large set of high-definition, wide field-of-view natural videos from head-mounted cameras together with the viewer's fixation position, allowing us to characterize the real-world retinotopic images subjects actually experienced. These images enable us to ask how vision is optimized not just for the statistics of the "natural images" found in web databases, but for those of the truly natural, retinotopic images that have landed on actual human retinae during real-world experience.
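
    The lab-to-world comparison reported above boils down to correlating, across subjects, a fixation statistic measured in each setting. The sketch below shows the shape of that analysis with hypothetical per-subject mean fixation heights; the numbers are placeholders, not the study's data.

```python
# Illustrative lab-vs-world correlation over per-subject fixation heights.
import numpy as np
from scipy.stats import pearsonr

# One mean first-fixation height per subject (0 = brows, 1 = mouth).
lab   = np.array([0.12, 0.35, 0.80, 0.55, 0.28, 0.67])  # placeholder data
world = np.array([0.18, 0.30, 0.74, 0.60, 0.25, 0.71])  # placeholder data

r, p = pearsonr(lab, world)
print(f"lab-world fixation correlation: r = {r:.2f}, p = {p:.3f}")
```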