441 research outputs found

    Realistic Lip Syncing for Virtual Character Using Common Viseme Set

    Speech is one of the most important means of interaction between humans, so much avatar research focuses on this area. Creating animated speech requires a facial model capable of representing the myriad shapes the human face assumes during speech, together with a method to produce the correct shape at the correct time. One of the main challenges is to create precise lip movements for the avatar and synchronize them with recorded audio. This paper proposes a new lip synchronization algorithm for realistic applications, which can be employed to generate facial movements synchronized with audio produced from natural speech or by a text-to-speech engine. The method requires an animator to construct animations using a canonical set of visemes for all pairwise combinations of a reduced phoneme set. These animations are then stitched together smoothly to construct the final animation.

    Example Based Caricature Synthesis

    The likeness of a caricature to the original face image is an essential and often overlooked part of caricature production. In this paper we present an example-based caricature synthesis technique consisting of shape exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set of caricature-face pairs, our shape exaggeration step is based on only one or a small number of examples of facial features. The relationship exaggeration step introduces two definitions that facilitate global facial feature synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an intuitive manner. The second is the so-called proportions, which characterize the facial features in proportional form. Finally, we introduce a likeness metric based on the Modified Hausdorff Distance (MHD), which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a number of constraints. The effectiveness of our algorithm is demonstrated with experimental results.
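
    The abstract does not spell out how the MHD is computed, so the following is only a minimal sketch of the commonly used definition: the larger of the two directed mean nearest-point distances between point sets. Representing facial elements as lists of 2D landmark points, and the function names, are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Assumption: each facial shape is a list of 2D landmark points.
Point = Tuple[float, float]


def directed_mean_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric MHD: the larger of the two directed mean distances."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))
```

    A smaller value means the two point sets are closer in shape, so an optimizer that maximizes likeness would minimize this distance subject to whatever constraints apply.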

    CASA 2009: International Conference on Computer Animation and Social Agents

    Interactive speech-driven facial animation

    One of the fastest-developing areas in the entertainment industry is digital animation. Television programmes and movies frequently use 3D animations to enhance or replace actors and scenery. With the increase in computing power, research is also being done on applying these animations interactively. Two of the biggest obstacles to the success of these undertakings are control (manipulating the models) and realism. This text describes many ways to improve control and realism so that interactive animation becomes possible. Specifically, lip-synchronisation (driven by human speech) and various modeling and rendering techniques are discussed. A prototype showing that interactive animation is feasible is also described. Mr. A. Hardy, Prof. S. von Solm

    Audio-to-Visual Speech Conversion using Deep Neural Networks

    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
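
    The averaging of overlapping predictions can be illustrated with a small sketch. It is not the authors' implementation: the uniform (unweighted) mean, the `stride` parameter, and the plain nested-list data layout are all assumptions made for illustration. Each predicted window is a list of frames, and each frame is a list of visual feature values.

```python
from typing import List

Frame = List[float]
Window = List[Frame]


def overlap_average(windows: List[Window], stride: int = 1) -> List[Frame]:
    """Average per-frame predictions from windows that start `stride` frames apart."""
    if not windows:
        return []
    width = len(windows[0])
    dim = len(windows[0][0])
    if stride < 1 or stride > width:
        raise ValueError("stride must be between 1 and the window width")

    n_frames = (len(windows) - 1) * stride + width
    sums = [[0.0] * dim for _ in range(n_frames)]
    counts = [0] * n_frames

    # Accumulate every prediction that covers a given output frame.
    for i, window in enumerate(windows):
        start = i * stride
        for k, frame in enumerate(window):
            counts[start + k] += 1
            for d, value in enumerate(frame):
                sums[start + k][d] += value

    # Divide by the number of windows that covered each frame.
    return [[s / c for s in row] for row, c in zip(sums, counts)]
```

    For example, two windows of width 2 with one feature each, `[[0.0], [2.0]]` and `[[4.0], [6.0]]`, overlap on the middle frame, so that frame becomes the mean of 2.0 and 4.0.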

    Model-based synthesis of visual speech movements from 3D video

    In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and used to animate visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the stored phonetic units most similar to the target utterance during synthesis. By combining properties of model-based synthesis (e.g. HMMs, neural nets) with unit selection we improve the quality of our speech synthesis.

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text. German Research Foundation

    Head Motion Analysis and Synthesis over Different Tasks

    It is known that subjects vary in their head movements. This paper presents an analysis of this variation across tasks and speakers and its impact on head motion synthesis. Head and articulatory movements measured with an ElectroMagnetic Articulograph (EMA) and recorded synchronously with audio were used. A data set of speech from 12 people recorded on different tasks confirms that head motion varies across tasks and speakers. Experimental results confirmed that the proposed models were capable of learning and synthesising task-dependent head motions from speech. Subjective evaluation of head motion synthesised with task models shows that models trained on the matched task perform better than mismatched ones, and that models trained on free speech data predict motion preferred by participants over models trained on read speech data.