Animation of a hierarchical image based facial model and perceptual analysis of visual speech

Abstract

In this Thesis a hierarchical image-based 2D talking head model is presented, together with robust automatic and semi-automatic animation techniques, and a novel perceptual method for evaluating visual-speech based on the McGurk effect. The novelty of the hierarchical facial model stems from the fact that sub-facial areas are modelled individually. To produce a facial animation, animations for a set of chosen facial areas are first produced, either by key-framing sub-facial parameter values, or using a continuous input speech signal, and then combined into a full facial output. Modelling hierarchically has several attractive qualities. It isolates variation in sub-facial regions from the rest of the face, and therefore provides a high degree of control over different facial parts along with meaningful image based animation parameters. The automatic synthesis of animations may be achieved using speech not originally included in the training set. The model is also able to automatically animate pauses, hesitations and non-verbal (or non-speech related) sounds and actions. To automatically produce visual-speech, two novel analysis and synthesis methods are proposed. The first method utilises a Speech-Appearance Model (SAM), and the second uses a Hidden Markov Coarticulation Model (HMCM) - based on a Hidden Markov Model (HMM). To evaluate synthesised animations (irrespective of whether they are rendered semi automatically, or using speech), a new perceptual analysis approach based on the McGurk effect is proposed. This measure provides both an unbiased and quantitative method for evaluating talking head visual speech quality and overall perceptual realism. A combination of this new approach, along with other objective and perceptual evaluation techniques, are employed for a thorough evaluation of hierarchical model animations

    Similar works