6 research outputs found

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Authors: Liu Yu, Liu Ziwei, Luo Ping, Wang Xiaogang, Zhou Hang
Publication date: 23/04/2019

Talking face generation aims to synthesize a sequence of face images that corresponds to a clip of speech. This is a challenging task because face appearance variation and the semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct a specific face appearance model for specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has the advantage that both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval.

Comment: AAAI Conference on Artificial Intelligence (AAAI 2019) Oral Presentation. Code, models, and video results are available on our webpage: https://liuziwei7.github.io/projects/TalkingFace.htm

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Synchronizing Keyframe Facial Animation to Multiple Text-to-Speech Engines and Natural Voice with Fast Response Time

Author: Pechter William H
Publication venue: Dartmouth Digital Commons
Publication date: 01/05/2004

This thesis aims to create an automated lip-synchronization system for real-time applications. Specifically, the system is required to be fast, to consist of a limited number of keyframes with small memory requirements, and to create fluid and believable animations that synchronize with text-to-speech engines as well as raw voice data. The algorithms utilize traditional keyframe animation and a novel method of keyframe selection. Additionally, phoneme-to-keyframe mapping, synchronization, and simple blending rules are employed. The algorithms provide blending between keyframe images, borrow information from neighboring phonemes, accentuate the phonemes b, p, and m, differentiate between keyframes for phonemes with allophonic variations, and provide prosodic variation by including emotion while speaking. The lip-sync animation synchronizes with multiple synthesized voices and human speech. A fast and versatile online real-time Java chat interface is created to exhibit vivid facial animation. Results show that the animation algorithms are fast and produce accurate lip-synchronization. Additionally, surveys showed that the animations are visually pleasing and improve speech understandability 96% of the time. Applications for this project include internet chat capabilities, interactive teaching of foreign languages, animated news broadcasting, enhanced game technology, and cell phone messaging.
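The phoneme-to-keyframe mapping and keyframe blending mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the keyframe names, the `PHONEME_TO_KEYFRAME` table, the `TimedPhoneme` type, and the choice to anchor each keyframe at the midpoint of its phoneme and cross-fade linearly are all assumptions made here for illustration. The accentuation of b, p, and m, allophonic variants, and emotion are not modeled.

```python
from dataclasses import dataclass

# Hypothetical mapping from phonemes to mouth-shape keyframes (illustrative only).
PHONEME_TO_KEYFRAME = {
    "b": "closed", "p": "closed", "m": "closed",
    "a": "open", "o": "round", "sil": "rest",
}


@dataclass
class TimedPhoneme:
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def keyframe_weights(timeline, t):
    """Return {keyframe_name: weight} at time t; the weights sum to 1.

    Assumes `timeline` is sorted by time. Each phoneme's keyframe is anchored
    at the phoneme's midpoint, and neighboring anchors are blended linearly.
    """
    if not timeline:
        return {}
    anchors = [
        ((p.start + p.end) / 2.0, PHONEME_TO_KEYFRAME.get(p.phoneme, "rest"))
        for p in timeline
    ]
    # Before the first anchor or after the last one, hold that keyframe.
    if t <= anchors[0][0]:
        return {anchors[0][1]: 1.0}
    if t >= anchors[-1][0]:
        return {anchors[-1][1]: 1.0}
    for (t0, k0), (t1, k1) in zip(anchors, anchors[1:]):
        if t0 <= t <= t1:
            # Guard against two anchors falling on the same instant.
            alpha = 1.0 if t1 == t0 else (t - t0) / (t1 - t0)
            weights = {}
            weights[k0] = weights.get(k0, 0.0) + (1.0 - alpha)
            weights[k1] = weights.get(k1, 0.0) + alpha
            return weights
    return {anchors[-1][1]: 1.0}


# Example timeline for the syllable "ma": lips closed, then open.
timeline = [TimedPhoneme("m", 0.0, 0.2), TimedPhoneme("a", 0.2, 0.4)]
print(keyframe_weights(timeline, 0.2))
```

A renderer could use the returned weights to cross-dissolve between the corresponding keyframe images. Anchoring at midpoints is one simple way to let each mouth shape borrow from its neighbors, but other blending rules would be equally plausible.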

Dartmouth Digital Commons (Dartmouth College)

Digital Scenography and Character in Cinema: Author's abstract of a dissertation for the award of the educational and scientific degree "Doctor" in the scientific specialty "Film Studies, Film Art and Television"

Author: Якимов Петко
Publication date: 01/01/2016

New Bulgarian University Scholar Electronic Repository

Methods for evaluating driver-road interactions, a pilot study

Author: Spivack Mayer David
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1964

Thesis (M.C.P.), Massachusetts Institute of Technology, Dept. of City & Regional Planning, 1964. Includes bibliographical references (leaves 191-192). By Mayer David Spivack. M.C.P.

DSpace@MIT