Predicting Head Pose From Speech

Greenwood, David

thesis

Authors: David Greenwood
Publication date: 1 January 2018
Abstract

Speech animation, the process of animating a human-like model to give the impression it is talking, most commonly relies on the work of skilled animators, or on performance capture. These approaches are time-consuming, expensive, and do not scale. This thesis develops algorithms for content-driven speech animation: models that learn visual actions from data without semantic labelling, to predict realistic speech animation from recorded audio. We achieve these goals by first forming a multi-modal corpus that represents the style of speech we want to model: speech that is natural, expressive and prosodic. This allows us to train deep recurrent neural networks to predict compelling animation. We first develop methods to predict the rigid head pose of a speaker. Predicting the head pose of a speaker from speech is not wholly deterministic, so our methods provide a large variety of plausible head pose trajectories from a single utterance. We then apply our methods to learn how to predict the head pose of the listener while in conversation, using only the voice of the speaker. Finally, we show how to predict the lip sync, facial expression, and rigid head pose of the speaker, simultaneously, solely from speech.

Available Versions

University of East Anglia digital repository

oai:ueaeprints.uea.ac.uk:69976

Last time updated on 06/05/2019

University of East Anglia digital repository

oai:ueaeprints.uea.ac.uk:72595

Last time updated on 19/10/2019