Trainable Videorealistic Speech Animation

By Tony Ezzat, Gadi Geiger and Tomaso Poggio


We describe how to create, using machine learning techniques, a generative, videorealistic speech animation module. A human subject is first recorded with a video camera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances not recorded in the original video. The synthesized utterance is re-composited onto a background sequence that contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned. The two key contributions of this paper are 1) a variant of the multidimensional morphable model (MMM) to synthesize new, previously unseen mouth configurations from a small set of mouth image prototypes; and 2) a trajectory synthesis technique based on regularization, which is automatically trained from the recorded corpus, and which is capable of synthesizing trajectories in MMM space corresponding to any desired utterance.
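The pipeline the abstract describes — phonetically aligned audio in, per-frame mouth images out — can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the function name `synthesize_mouth_frames`, the flat-list "prototype image" representation, and the phoneme-to-weight lookup table are hypothetical stand-ins, not the authors' actual morphable-model or trajectory-synthesis machinery.

```python
def synthesize_mouth_frames(phoneme_track, phoneme_weights, prototypes, fps=30):
    """Toy sketch of videorealistic mouth synthesis.

    phoneme_track   -- phonetically aligned input: [(phoneme, start_s, end_s), ...]
    phoneme_weights -- hypothetical mapping from phoneme to a target point in
                       "morphable model" space (one blend weight per prototype)
    prototypes      -- prototype mouth images, here flat lists of pixel values
    Returns one blended mouth image per output video frame.
    """
    frames = []
    for phoneme, start, end in phoneme_track:
        weights = phoneme_weights[phoneme]            # target blend for this phoneme
        n_frames = max(1, round((end - start) * fps)) # duration -> frame count
        for _ in range(n_frames):
            # Each frame is a linear blend of the prototype images; the real
            # system instead synthesizes a smooth trajectory through model space.
            frame = [
                sum(w * proto[i] for w, proto in zip(weights, prototypes))
                for i in range(len(prototypes[0]))
            ]
            frames.append(frame)
    return frames
```

In the actual system these synthesized mouth frames would then be re-composited onto a background sequence containing natural head and eye movement; the sketch stops at frame synthesis.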

Topics: CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism---Animation; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding---Video Analysis; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding---Motion. Keywords: facial modeling, facial animation, morphing, optical flow, speech synthesis, lip synchronization.
Publisher: ACM Press
Year: 2002