
    On combining the facial movements of a talking head

    We present work on Obie, an embodied conversational agent framework. An embodied conversational agent, or talking head, consists of three main components. The graphical part consists of a face model and a facial muscle model. Besides the graphical part, we have implemented an emotion model and a mapping from emotions to facial expressions. The animation part of the framework focuses on the temporal combination of different facial movements. In this paper we propose a scheme for combining facial movements on a 3D talking head.
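
    The combination scheme itself is the paper's contribution; purely as a rough sketch of the general idea (not the authors' scheme), one might represent each facial movement as a time-varying activation on a named muscle channel and blend overlapping movements per channel. The class names, the triangular envelope, and the clamped additive blend below are all assumptions for illustration.

    ```python
    # Hypothetical sketch: blending overlapping facial movements into
    # per-muscle activation curves. Not the paper's actual scheme.
    from dataclasses import dataclass

    @dataclass
    class FacialMovement:
        muscle: str      # target muscle channel, e.g. "zygomatic_major"
        start: float     # start time in seconds
        duration: float  # length of the movement in seconds
        peak: float      # peak activation in [0, 1]

        def activation(self, t: float) -> float:
            """Triangular attack/decay envelope; zero outside the interval."""
            if not (self.start <= t <= self.start + self.duration):
                return 0.0
            phase = (t - self.start) / self.duration   # 0..1 within the movement
            return self.peak * (1.0 - abs(2.0 * phase - 1.0))

    def combine(movements, t):
        """Sum per-muscle activations at time t and clamp each to [0, 1]."""
        channels = {}
        for m in movements:
            channels[m.muscle] = channels.get(m.muscle, 0.0) + m.activation(t)
        return {k: min(v, 1.0) for k, v in channels.items()}

    smile = FacialMovement("zygomatic_major", start=0.0, duration=1.0, peak=0.8)
    viseme = FacialMovement("orbicularis_oris", start=0.4, duration=0.3, peak=0.6)
    print(combine([smile, viseme], t=0.5))
    ```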

    Book review: Mob Rule Learning

    Mob Rule Learning: Camps, Unconferences and Trashing the Talking Head. By Michelle Boule. Medford: Cyber Age Books, 2011. Paperback, ISBN 978-0-910965-92-7, 230 pages.

    Text-based Editing of Talking-head Video

    Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript; an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
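
    As a toy illustration of the segment-selection step described above (not the paper's optimization strategy), the sketch below greedily matches the viseme sequence of edited words against a viseme-annotated input corpus. The function name, data layout, and greedy longest-match scoring are assumptions for illustration.

    ```python
    # Toy viseme-based segment selection: pick corpus index ranges whose
    # visemes match the target sequence. The paper's real optimization is
    # more sophisticated; this is only a hypothetical sketch.
    from typing import List, Tuple

    def find_matching_segments(corpus_visemes: List[str],
                               target_visemes: List[str],
                               min_len: int = 1) -> List[Tuple[int, int]]:
        """Return (start, end) corpus ranges covering the target visemes."""
        segments = []
        i = 0
        while i < len(target_visemes):
            best = None  # (match_length, corpus_start)
            for s in range(len(corpus_visemes)):
                length = 0
                while (i + length < len(target_visemes)
                       and s + length < len(corpus_visemes)
                       and corpus_visemes[s + length] == target_visemes[i + length]):
                    length += 1
                if length >= min_len and (best is None or length > best[0]):
                    best = (length, s)
            if best is None:
                i += 1          # no corpus material matches this viseme; skip it
                continue
            length, s = best
            segments.append((s, s + length))
            i += length
        return segments

    # Example: cover the visemes of an inserted word with existing segments.
    corpus = ["sil", "AA", "P", "IY", "T", "AA", "K", "sil"]
    target = ["P", "IY", "K"]
    print(find_matching_segments(corpus, target))   # [(2, 4), (6, 7)]
    ```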

    OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

    One-shot talking head generation has no explicit head movement reference, so it is difficult to generate talking heads with head motions. Some existing works only edit the mouth area and generate otherwise still talking heads, leading to unrealistic results. Other works construct a one-to-one mapping between the audio signal and head motion sequences, which introduces ambiguous correspondences, since people can move their heads differently when speaking the same content. Such a mapping fails to model this diversity and produces either nearly static or exaggerated head motions, which appear unnatural. The one-shot talking head generation task is therefore an ill-posed one-to-many problem: people present diverse head motions when speaking. Based on this observation, we propose OSM-Net, a one-to-many one-shot talking head generation network with natural head motions. OSM-Net constructs a motion space that contains rich and varied clip-level head motion features. Each basis of the space represents a meaningful head-motion feature over a clip rather than a single frame, providing more coherent and natural motion changes in talking heads. The driving audio is mapped into this space, and various motion features can then be sampled within a reasonable range around the mapped point to achieve the one-to-many mapping. In addition, a landmark constraint and a time-window feature input improve expression feature extraction and video generation. Extensive experiments show that OSM-Net generates more natural and realistic head motions than other methods under a reasonable one-to-many mapping paradigm. Comment: Paper under review.
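
    As a minimal sketch of the one-to-many sampling idea described above (not OSM-Net's actual architecture), the code below maps an audio feature to a point in an assumed motion space and samples several clip-level motion codes within a small radius of that point. The dimensions, the random projection standing in for the learned mapping, and the Gaussian perturbation are all assumptions.

    ```python
    # Hypothetical one-to-many sampling: one audio input, several plausible
    # clip-level head-motion codes drawn near its mapped point.
    import numpy as np

    MOTION_DIM = 64          # assumed size of a clip-level motion code

    def map_audio_to_motion(audio_feat: np.ndarray) -> np.ndarray:
        """Stand-in for the learned audio-to-motion-space mapping."""
        rng = np.random.default_rng(0)
        projection = rng.standard_normal((audio_feat.shape[-1], MOTION_DIM))
        return audio_feat @ projection

    def sample_motion_codes(center: np.ndarray, n: int = 3,
                            radius: float = 0.1) -> np.ndarray:
        """Draw n motion codes within a 'reasonable range' of the center."""
        noise = np.random.default_rng(1).standard_normal((n, center.shape[-1]))
        return center + radius * noise

    audio_feat = np.random.default_rng(2).standard_normal(128)
    center = map_audio_to_motion(audio_feat)
    codes = sample_motion_codes(center, n=3)    # three distinct head motions
    print(codes.shape)                          # (3, 64)
    ```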

    FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

    One-shot talking head generation has received growing attention in recent years, with various creative and practical applications. An ideal, natural, and vivid generated talking head video should contain natural head pose changes. However, it is challenging to predict head pose sequences from driving audio, since there is a natural gap between the audio and visual modalities. In this work, we propose a Flow-guided One-shot model that achieves NaTural head motions (FONT) over generated talking heads. Specifically, a head pose prediction module is designed to generate head pose sequences from the source face and driving audio. We add a random sampling operation and a structural similarity constraint to model the diversity of the one-to-many mapping between the audio and visual modalities, thus predicting natural head poses. We then develop a keypoint predictor that produces unsupervised keypoints from the source face, driving audio, and pose sequences to describe the facial structure. Finally, a flow-guided occlusion-aware generator is employed to produce photo-realistic talking head videos from the estimated keypoints and source face. Extensive experimental results show that FONT generates talking heads with natural head poses and synchronized mouth shapes, outperforming the compared methods. Comment: Accepted by ICME 2023.
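
    As a rough sketch of what flow-guided, occlusion-aware generation typically involves (not FONT's implementation), the code below warps source-face features with a dense flow field and blends them with generated features according to an occlusion mask. The tensor shapes, grid convention, and blending rule are assumptions for illustration.

    ```python
    # Hypothetical flow-guided blend: trust warped source features where the
    # occlusion mask says they are visible, otherwise use generated features.
    import torch
    import torch.nn.functional as F

    def flow_guided_blend(source_feat, flow, occlusion, generated_feat):
        """source_feat, generated_feat: (B, C, H, W); flow: (B, 2, H, W) in
        pixels; occlusion: (B, 1, H, W) in [0, 1], where 1 = visible."""
        b, _, h, w = source_feat.shape
        # Base sampling grid in normalized [-1, 1] coordinates (x, y order).
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        # Convert pixel flow to normalized offsets and warp the source features.
        norm_flow = torch.stack((flow[:, 0] / (w / 2), flow[:, 1] / (h / 2)), dim=-1)
        warped = F.grid_sample(source_feat, base + norm_flow, align_corners=True)
        # Blend: warped features where visible, generated features elsewhere.
        return occlusion * warped + (1.0 - occlusion) * generated_feat

    b, c, h, w = 1, 16, 64, 64
    out = flow_guided_blend(torch.randn(b, c, h, w), torch.zeros(b, 2, h, w),
                            torch.ones(b, 1, h, w), torch.randn(b, c, h, w))
    print(out.shape)   # torch.Size([1, 16, 64, 64])
    ```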

    The effect of the realism level of talking-head animated characters on students' emotions and performance: a preliminary study

    Talking-head animation is a form of instructional animation that can support learning to pronounce words correctly and accurately. However, mistakes in the choice of animated character can have a negative effect on students. This study focuses on the Uncanny Valley issue, in which animated characters that closely resemble humans can affect students' emotions. Accordingly, this study evaluates the use of talking-head animations at different levels of realism for learning word pronunciation at community colleges (Kolej Komuniti). The effectiveness of the animations was measured through a pronunciation test and an emotion test using the AEQ questionnaire. Four talking-head animation applications with different levels of realism were developed for testing, and each was studied independently by a group of 20 students. The total sample was 80 students from four community colleges in Perak. Descriptive statistics such as means, standard deviations, and percentages were used to answer the research questions. The findings show that the non-realistic three-dimensional talking-head animation (3D-TR) obtained the highest percentages for student emotion and pronunciation performance, while the realistic three-dimensional talking-head animation (3D-R) obtained the lowest percentages on both aspects. Thus, a non-realistic three-dimensional talking-head character is the best level of realism for fostering positive emotions and, in turn, has the potential to improve student performance.

    Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

    Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, existing methods are inflexible and inefficient, as they require expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations. Our approach utilizes a pretrained emotion-agnostic talking-head transformer and introduces three lightweight adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and Emotional Adaptation Module) from different perspectives to enable precise and realistic emotion control. Our experiments demonstrate that our approach achieves state-of-the-art performance on widely used benchmarks, including LRW and MEAD. Additionally, our parameter-efficient adaptations exhibit remarkable generalization ability, even in scenarios where emotional training videos are scarce or nonexistent. Project website: https://yuangan.github.io/eat/ Comment: Accepted to ICCV 2023. Project page: https://yuangan.github.io/eat
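
    As a minimal sketch of the parameter-efficient adaptation idea (not the EAT architecture itself), the code below freezes a pretrained backbone and trains only a small per-emotion prompt embedding and a lightweight residual adapter. The module names, sizes, and the toy transformer backbone are assumptions for illustration.

    ```python
    # Hypothetical parameter-efficient emotional adaptation: frozen backbone,
    # trainable emotion prompt + small residual adapter only.
    import torch
    import torch.nn as nn

    class EmotionAdapter(nn.Module):
        def __init__(self, backbone: nn.Module, dim: int, n_emotions: int):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():   # freeze pretrained weights
                p.requires_grad = False
            # One learnable prompt vector per emotion category.
            self.prompts = nn.Embedding(n_emotions, dim)
            # Lightweight residual adapter applied to backbone features.
            self.adapter = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(),
                                         nn.Linear(dim // 4, dim))

        def forward(self, feats: torch.Tensor, emotion_id: torch.Tensor):
            prompt = self.prompts(emotion_id).unsqueeze(1)        # (B, 1, D)
            x = self.backbone(torch.cat([prompt, feats], dim=1))  # prepend prompt
            return x + self.adapter(x)                            # residual adapt

    backbone = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
        num_layers=2)
    model = EmotionAdapter(backbone, dim=128, n_emotions=8)

    feats = torch.randn(2, 10, 128)               # (batch, tokens, dim)
    out = model(feats, torch.tensor([0, 3]))      # pick an emotion per sample
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(out.shape, trainable)   # only prompt + adapter parameters train
    ```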