On combining the facial movements of a talking head
We present work on Obie, an embodied conversational
agent framework. An embodied conversational agent, or
talking head, consists of three main components. The
graphical part consists of a face model and a facial muscle
model. Besides the graphical part, we have implemented
an emotion model and a mapping from emotions to facial
expressions. The animation part of the framework focuses
on combining different facial movements over time. In this
paper we propose a scheme for combining facial movements
on a 3D talking head.
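As a rough illustration of what temporally combining facial movements can look like, here is a minimal sketch assuming a face model driven by per-frame muscle activations and simple attack/decay intensity envelopes. The function names, envelope shape and blending rule are illustrative assumptions, not the Obie implementation.

```python
# Hypothetical sketch (not the authors' code): temporally combining facial
# movements by blending their per-frame muscle activations.
import numpy as np

def attack_decay_envelope(n_frames, attack=0.2, decay=0.2):
    """Intensity envelope rising over `attack` and falling over `decay` (fractions of the clip)."""
    t = np.linspace(0.0, 1.0, n_frames)
    rise = np.clip(t / attack, 0.0, 1.0)
    fall = np.clip((1.0 - t) / decay, 0.0, 1.0)
    return np.minimum(rise, fall)

def combine_movements(movements, n_frames, n_muscles):
    """Blend several movements into one muscle-activation track.

    `movements` is a list of (start_frame, end_frame, target) tuples, where
    `target` is an (n_muscles,) vector of muscle contractions for that movement.
    Overlapping movements are combined by envelope-weighted averaging.
    """
    activation = np.zeros((n_frames, n_muscles))
    weight = np.zeros((n_frames, 1))
    for start, end, target in movements:
        env = attack_decay_envelope(end - start)[:, None]   # (T, 1) per-frame intensity
        activation[start:end] += env * target[None, :]      # weighted contribution
        weight[start:end] += env
    return activation / np.maximum(weight, 1e-6)             # normalised blend

# Example: a smile overlapping with a viseme for /o/ on a 12-muscle face model.
smile = np.zeros(12); smile[[3, 4]] = 1.0
viseme_o = np.zeros(12); viseme_o[[8, 9]] = 0.7
track = combine_movements([(0, 40, smile), (25, 60, viseme_o)], n_frames=80, n_muscles=12)
print(track.shape)  # (80, 12) muscle activations, one row per frame
```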
Book review: Mob Rule Learning
Mob Rule Learning: Camps, Unconferences and Trashing the Talking Head. By Michelle Boule. Medford: Cyber Age Books, 2011. Paperback, ISBN 978-0-910965-92-7, 230 pages.
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
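To make the transcript-driven selection step more concrete, here is a heavily simplified, hedged sketch: a greedy matcher that covers an edited phoneme sequence with annotated corpus segments. The real system solves this as an optimization and then stitches face-model parameters before neural rendering; the Segment structure and function below are illustrative assumptions only.

```python
# Illustrative sketch only: a greatly simplified, greedy stand-in for the paper's
# segment-selection optimization. Names and structures are assumptions, not the
# authors' implementation.
from dataclasses import dataclass

@dataclass
class Segment:
    phonemes: tuple       # phoneme labels covered by this source segment
    frame_range: tuple    # (start_frame, end_frame) in the input corpus

def select_segments(edited_phonemes, corpus_segments):
    """Greedily cover the edited phoneme sequence with annotated corpus segments."""
    selected, i = [], 0
    while i < len(edited_phonemes):
        # Prefer the longest segment whose phonemes match the upcoming target.
        best = None
        for seg in corpus_segments:
            n = len(seg.phonemes)
            if tuple(edited_phonemes[i:i + n]) == seg.phonemes:
                if best is None or n > len(best.phonemes):
                    best = seg
        if best is None:
            raise ValueError(f"no corpus segment covers phoneme {edited_phonemes[i]!r}")
        selected.append(best)
        i += len(best.phonemes)
    return selected

corpus = [Segment(("HH", "AH"), (0, 9)), Segment(("L", "OW"), (10, 21)), Segment(("W", "ER", "L", "D"), (30, 55))]
print(select_segments(["HH", "AH", "L", "OW"], corpus))  # two segments covering the edited phonemes
```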
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions
One-shot talking head generation has no explicit head movement reference,
so it is difficult to generate talking heads with head motions. Some existing
works only edit the mouth area and generate still talking heads, leading to
unrealistic talking head performance. Other works construct a one-to-one mapping
between the audio signal and head motion sequences, which introduces ambiguous
correspondences into the mapping, since people move their heads differently when
speaking the same content. Such a mapping fails to model this diversity and
produces either nearly static or exaggerated head motions, which look unnatural
and strange. One-shot talking head generation is therefore an ill-posed
one-to-many problem: people present diverse head motions when speaking the same
content. Based on this observation, we propose OSM-Net, a one-to-many one-shot
talking head generation network with natural head motions. OSM-Net constructs a
motion space that contains rich and varied clip-level head motion features. Each
basis of the space represents a meaningful head motion feature over a clip rather
than a single frame, providing more coherent and natural motion changes in
talking heads. The driving audio is mapped to a point in the motion space, around
which various motion features can be sampled within a reasonable range to achieve
the one-to-many mapping. In addition, a landmark constraint and time-window
feature input improve expression feature extraction and video generation.
Extensive experiments show that OSM-Net generates more natural and realistic head
motions under a reasonable one-to-many mapping paradigm than other methods.
Comment: Paper Under Review
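The following sketch illustrates the one-to-many sampling idea described in the abstract: an audio feature is mapped to a point in a learned motion space, and several motion codes are sampled within a small radius around it. Module names, dimensions and the softmax-over-bases decoding are assumptions for illustration, not the OSM-Net architecture.

```python
# Hedged sketch of the one-to-many idea, not OSM-Net itself: the same audio
# clip can drive several different but plausible clip-level head motions.
import torch
import torch.nn as nn

class MotionSpaceSampler(nn.Module):
    def __init__(self, audio_dim=256, motion_dim=64, n_bases=32, radius=0.1):
        super().__init__()
        self.audio_to_motion = nn.Linear(audio_dim, motion_dim)      # audio -> motion-space point
        self.bases = nn.Parameter(torch.randn(n_bases, motion_dim))  # clip-level motion bases
        self.radius = radius

    def forward(self, audio_feat, n_samples=4):
        center = self.audio_to_motion(audio_feat)                    # (B, motion_dim)
        # Sample several codes around the mapped point (the "one-to-many" step).
        noise = torch.randn(n_samples, *center.shape) * self.radius
        codes = center.unsqueeze(0) + noise                          # (S, B, motion_dim)
        # Express each code as a soft combination of the learned motion bases.
        weights = torch.softmax(codes @ self.bases.t(), dim=-1)      # (S, B, n_bases)
        return weights @ self.bases                                  # (S, B, motion_dim)

sampler = MotionSpaceSampler()
motions = sampler(torch.randn(2, 256), n_samples=4)
print(motions.shape)  # torch.Size([4, 2, 64]): four motion variants per audio clip
```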
FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions
One-shot talking head generation has received growing attention in recent
years, with various creative and practical applications. An ideal generated
talking head video should be natural and vivid and contain natural head pose
changes. However, it is challenging to predict head pose sequences from driving
audio, since there is a natural gap between the audio and visual modalities. In
this work, we propose a Flow-guided One-shot model that achieves NaTural head
motions (FONT) in generated talking heads. Specifically, a head pose prediction
module is designed to generate head pose sequences from the source face and the
driving audio. We add a random sampling operation and a structural similarity
constraint to model the diversity of the one-to-many mapping between the audio
and visual modalities, and thus predict natural head poses. We then develop a
keypoint predictor that produces unsupervised keypoints from the source face,
driving audio and pose sequences to describe the facial structure. Finally, a
flow-guided occlusion-aware generator is employed to produce photo-realistic
talking head videos from the estimated keypoints and the source face. Extensive
experimental results show that FONT generates talking heads with natural head
poses and synchronized mouth shapes, outperforming the compared methods.
Comment: Accepted by ICME 2023
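To illustrate the flow-guided, occlusion-aware generation step in spirit (in the style of first-order-motion generators), here is a minimal hedged sketch that warps source features with a sampling grid and masks occluded regions before a small decoder fills them in. The shapes, the identity grid used as a stand-in for keypoint-driven flow, and the module names are assumptions, not the FONT network.

```python
# Minimal sketch of flow-guided, occlusion-aware generation; not the exact FONT model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowGuidedGenerator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.refine = nn.Conv2d(channels, 3, kernel_size=3, padding=1)  # decoder stand-in

    def forward(self, source_feat, flow, occlusion):
        """source_feat: (B, C, H, W) features of the source face
        flow:          (B, H, W, 2) sampling grid in [-1, 1] derived from keypoints
        occlusion:     (B, 1, H, W) soft mask, 1 = visible after motion, 0 = occluded
        """
        warped = F.grid_sample(source_feat, flow, align_corners=True)   # move source features
        visible = warped * occlusion                                    # drop occluded regions
        return torch.sigmoid(self.refine(visible))                      # hallucinate the rest

B, C, H, W = 1, 64, 32, 32
# Identity sampling grid as a placeholder for a keypoint-driven flow field.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
flow = torch.stack((xs, ys), dim=-1).unsqueeze(0)
frame = FlowGuidedGenerator()(torch.randn(B, C, H, W), flow, torch.ones(B, 1, H, W))
print(frame.shape)  # torch.Size([1, 3, 32, 32]): one generated RGB frame
```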
The effect of talking-head animated character realism on students' emotions and performance: a preliminary study
Talking-head animation is a form of instructional animation that can help learners
pronounce words correctly and accurately. However, a poorly chosen animated character
can have a negative effect on students. This study focuses on the Uncanny Valley
problem, in which animated characters that closely resemble humans can affect
students' emotions. Accordingly, the study evaluates talking-head animations of
different realism levels for learning word pronunciation at community colleges. The
effectiveness of the animations was measured through a pronunciation test and an
emotion test using the AEQ questionnaire. Four talking-head animation applications
with different realism levels were developed for testing, and each application was
studied independently by a group of 20 students. The total sample comprised 80
students from four community colleges in Perak. Descriptive statistics such as means,
standard deviations and percentages were used to answer the research questions. The
findings show that the non-realistic three-dimensional talking-head animation (3D-TR)
obtained the highest percentages for students' emotions and pronunciation performance,
while the realistic three-dimensional talking-head animation (3D-R) obtained the
lowest in both aspects. A non-realistic three-dimensional talking-head character is
therefore the best realism level for fostering positive emotions and, in turn,
potentially improving student performance.
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
Audio-driven talking-head synthesis is a popular research topic for virtual
human-related applications. However, the inflexibility and inefficiency of
existing methods, which necessitate expensive end-to-end training to transfer
emotions from guidance videos to talking-head predictions, are significant
limitations. In this work, we propose the Emotional Adaptation for Audio-driven
Talking-head (EAT) method, which transforms emotion-agnostic talking-head
models into emotion-controllable ones in a cost-effective and efficient manner
through parameter-efficient adaptations. Our approach utilizes a pretrained
emotion-agnostic talking-head transformer and introduces three lightweight
adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and
Emotional Adaptation Module) from different perspectives to enable precise and
realistic emotion controls. Our experiments demonstrate that our approach
achieves state-of-the-art performance on widely-used benchmarks, including LRW
and MEAD. Additionally, our parameter-efficient adaptations exhibit remarkable
generalization ability, even in scenarios where emotional training videos are
scarce or nonexistent. Project website: https://yuangan.github.io/eat/
Comment: Accepted to ICCV 2023. Project page: https://yuangan.github.io/eat
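As a hedged sketch of the parameter-efficient recipe the abstract describes (freeze a pretrained emotion-agnostic backbone and train only small emotion-conditioned add-ons), the snippet below prepends learnable per-emotion prompts and applies a lightweight residual adapter. The module names and sizes are illustrative assumptions and do not reproduce the released EAT code.

```python
# Hedged sketch of parameter-efficient emotional adaptation: only the prompts
# and the adapter are trainable; the pretrained backbone stays frozen.
import torch
import torch.nn as nn

class EmotionAdaptedHead(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim=256, n_emotions=8, prompt_len=4):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():        # keep the pretrained model frozen
            p.requires_grad_(False)
        # Lightweight, trainable pieces: per-emotion prompts and a small adapter.
        self.prompts = nn.Embedding(n_emotions, prompt_len * feat_dim)
        self.adapter = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.prompt_len, self.feat_dim = prompt_len, feat_dim

    def forward(self, audio_tokens, emotion_id):
        # audio_tokens: (B, T, feat_dim); prepend emotion prompts, then adapt the output.
        prompts = self.prompts(emotion_id).view(-1, self.prompt_len, self.feat_dim)
        tokens = torch.cat([prompts, audio_tokens], dim=1)
        feats = self.backbone(tokens)
        return feats + self.adapter(feats)          # residual emotional adjustment

backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True), 2)
model = EmotionAdaptedHead(backbone)
out = model(torch.randn(2, 10, 256), torch.tensor([1, 3]))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(out.shape, trainable)  # only the prompts and adapter contribute trainable parameters
```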