2,086 research outputs found

    Improvements on a simple muscle-based 3D face for realistic facial expressions

    Facial expressions play an important role in face-to-face communication. With the development of personal computers capable of rendering high-quality graphics, computer facial animation has produced increasingly realistic facial expressions to enrich human-computer communication. In this paper, we present a simple muscle-based 3D face model that can produce realistic facial expressions in real time. We extend Waters' (1987) muscle model to generate bulges and wrinkles and to improve the combination of multiple muscle actions. In addition, we present techniques to reduce the computational burden of the muscle model.
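    For context, a minimal numeric sketch of the kind of linear muscle Waters (1987) describes is given below: each vertex inside a muscle's angular sector is pulled toward the muscle head with angular and radial falloff. The parameter names (Rs, Rf, omega, k) and the exact falloff curves follow one common reading of the model, not necessarily the variant used in this paper.

```python
# Illustrative sketch of a Waters-style linear muscle acting on mesh vertices.
# Parameter names and falloff curves are assumptions for the example.
import numpy as np

def apply_linear_muscle(verts, head, tail, Rs, Rf, omega, k):
    """Pull vertices toward the muscle head with angular and radial falloff."""
    verts = np.asarray(verts, dtype=float)
    head, tail = np.asarray(head, float), np.asarray(tail, float)
    axis = tail - head
    axis /= np.linalg.norm(axis)          # muscle direction, head -> tail

    out = verts.copy()
    for i, p in enumerate(verts):
        d_vec = p - head
        D = np.linalg.norm(d_vec)
        if D < 1e-9 or D > Rf:
            continue                       # outside the zone of influence
        mu = np.arccos(np.clip(d_vec.dot(axis) / D, -1.0, 1.0))
        if mu > omega:
            continue                       # outside the angular sector
        A = np.cos(mu)                     # angular falloff
        if D <= Rs:                        # inner zone
            R = np.cos((1.0 - D / Rs) * np.pi / 2.0)
        else:                              # outer falloff zone
            R = np.cos((D - Rs) / (Rf - Rs) * np.pi / 2.0)
        out[i] = p + k * A * R * (head - p) / D   # contract toward the head
    return out
```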

    Artimate: an articulatory animation framework for audiovisual speech synthesis

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, we apply the articulatory motion data to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated into an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application to illustrate its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow. Comment: Workshop on Innovation and Applications in Speech Technology (2012).
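    As an illustration of the data path described above, the sketch below resamples raw EMA coil trajectories to a fixed animation frame rate so that each coil can drive a bone of a tongue rig. The dictionary layout and function name are assumptions for the example, not Artimate's actual interface.

```python
# Illustrative sketch: retime EMA coil trajectories to an animation frame rate.
# The data layout (coil name -> (timestamps, positions)) is assumed here.
import numpy as np

def ema_to_keyframes(ema, fps=60.0):
    """ema: dict coil_name -> (times[N], positions[N, 3]) with increasing times.
    Returns per-coil position tracks sampled at the animation frame rate."""
    keyframes = {}
    for name, (times, pos) in ema.items():
        times = np.asarray(times, float)
        pos = np.asarray(pos, float)
        frame_times = np.arange(times[0], times[-1], 1.0 / fps)
        keyframes[name] = np.stack(
            [np.interp(frame_times, times, pos[:, d]) for d in range(3)], axis=1
        )
    return keyframes   # each coil trajectory, now one sample per animation frame
```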

    A biomechanical model of the face including muscles for the prediction of deformations during speech production

    A 3D biomechanical finite element model of the face is presented. Muscles are represented by piece-wise uniaxial tension cable elements linking the insertion points. Such insertion points are specific entities differing from the nodes of the finite element mesh, which makes it possible to change either the mesh or the muscle implementation independently of the other. Lip/teeth and upper lip/lower lip contacts are also modeled. Simulations of smiling and of an Orbicularis Oris activation are presented and interpreted. The importance of a proper account of contacts and of an accurate anatomical description is shown.
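    One standard way to realize insertion points that are decoupled from the finite element mesh is to pin each insertion at fixed barycentric coordinates inside a host element and spread the cable tension onto that element's nodes with the same weights. The sketch below illustrates that idea; it is a simplified stand-in, not the paper's actual formulation.

```python
# Illustrative sketch: a muscle insertion pinned at barycentric coordinates of a
# tetrahedral element, with the uniaxial cable tension spread onto its nodes.
import numpy as np

def insertion_position(node_pos, bary):
    """node_pos: (4, 3) tetra node positions; bary: (4,) barycentric weights."""
    return np.asarray(bary, float) @ np.asarray(node_pos, float)

def cable_tension_forces(node_pos_a, bary_a, node_pos_b, bary_b, tension):
    """Distribute a uniaxial tension pulling the two insertion points together."""
    pa = insertion_position(node_pos_a, bary_a)
    pb = insertion_position(node_pos_b, bary_b)
    axis = pb - pa
    axis /= np.linalg.norm(axis)
    f_a = tension * axis            # force on insertion a, toward insertion b
    f_b = -f_a                      # equal and opposite force on insertion b
    # Spread each insertion force to its host element's nodes by the weights.
    return np.outer(bary_a, f_a), np.outer(bary_b, f_b)
```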

    Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

    We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that uses acoustic features as input, and one that uses a phonetic transcription as input. Both synthesizers are trained on the same data, and performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than for sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer perception of quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
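    The objective measure singled out above, the cost of a dynamic time warp between synthesized and ground-truth AAM parameter trajectories, can be computed with standard DTW; a minimal sketch follows. Using Euclidean distance between parameter frames is an assumption for the example.

```python
# Illustrative sketch: DTW cost between synthesized and ground-truth AAM tracks.
import numpy as np

def dtw_cost(synth, truth):
    """synth: (N, D) and truth: (M, D) parameter trajectories; returns warp cost."""
    synth, truth = np.asarray(synth, float), np.asarray(truth, float)
    N, M = len(synth), len(truth)
    # Pairwise Euclidean distances between frames.
    dist = np.linalg.norm(synth[:, None, :] - truth[None, :, :], axis=-1)
    acc = np.full((N + 1, M + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[N, M]   # total cost of the optimal alignment
```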

    Using multimedia interfaces for speech therapy


    Capture, Learning, and Synthesis of 3D Speaking Styles

    Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de. Comment: To appear in CVPR 2019.
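    A minimal sketch of the conditioning idea described in the abstract is given below: per-frame speech features are concatenated with a one-hot subject label, and a small network predicts per-vertex offsets that are added to a neutral template face. Layer sizes, feature dimensions, and the module name are illustrative and do not reproduce VOCA's actual architecture.

```python
# Illustrative sketch of speaker-conditioned, audio-driven vertex animation.
# Dimensions and layers are assumptions for the example, not VOCA's design.
import torch
import torch.nn as nn

class SpeechToOffsets(nn.Module):
    def __init__(self, audio_dim, n_subjects, n_verts, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + n_subjects, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_verts * 3),
        )
        self.n_verts = n_verts

    def forward(self, audio_feat, subject_onehot, template):
        # audio_feat: (B, audio_dim), subject_onehot: (B, n_subjects),
        # template: (B, n_verts, 3) neutral face of the target identity.
        x = torch.cat([audio_feat, subject_onehot], dim=-1)
        offsets = self.net(x).view(-1, self.n_verts, 3)
        return template + offsets   # animated vertices for this frame
```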

    On combining the facial movements of a talking head

    We present work on Obie, an embodied conversational agent framework. An embodied conversational agent, or talking head, consists of three main components. The graphical part consists of a face model and a facial muscle model. Besides the graphical part, we have implemented an emotion model and a mapping from emotions to facial expressions. The animation part of the framework focuses on temporally combining different facial movements. In this paper, we propose a scheme for combining facial movements on a 3D talking head.
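    As a rough illustration of temporally combining facial movements, the sketch below blends several movement channels, each defined by a parameter vector and a per-frame intensity envelope, into a single trajectory for the face. The blending and normalization rule is an assumption for the example, not the specific scheme proposed for Obie.

```python
# Illustrative sketch: blend overlapping facial movements over time.
import numpy as np

def blend_movements(movements, n_frames):
    """movements: list of (params[D], envelope[n_frames]) pairs -> (n_frames, D)."""
    D = len(movements[0][0])
    out = np.zeros((n_frames, D))
    total = np.zeros(n_frames)
    for params, envelope in movements:
        envelope = np.asarray(envelope, float)
        out += envelope[:, None] * np.asarray(params, float)[None, :]
        total += envelope
    # Where movements overlap strongly, renormalize so intensities stay bounded.
    mask = total > 1.0
    out[mask] /= total[mask, None]
    return out
```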