
    Realistic Face Animation From Sparse Stereo Meshes

    URL: http://spitswww.uvt.nl/Fsw/Psychologie/AVSP2007/papers/bergerAVSP.pdf
    Being able to produce realistic facial animation is crucial for many speech applications in language learning technologies. Reaching realism requires acquiring and animating dense 3D models of the face. Dense models are often recovered using stereovision techniques. Unfortunately, reconstruction artifacts are common, mainly because of the difficulty of matching points between images on untextured areas of the face. In this paper, we propose a robust and fully automatic method to produce realistic dense animation. Our input data are a dense 3D mesh of the talker obtained for one viseme, as well as a corpus of stereo sequences of a talker painted with markers, which allows the face kinematics to be learned. The main contribution of the paper is to transfer the kinematics learned on a sparse mesh onto the dense 3D mesh, thus allowing dense facial animation. Examples of face animations are provided which demonstrate the reliability of the proposed method.
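    The abstract does not detail how the sparse kinematics are mapped onto the dense mesh, but one plausible scheme for such a transfer is scattered-data interpolation: weight each dense vertex's displacement by its distance to the sparse markers. A minimal sketch of that idea, assuming Gaussian radial-basis weights (the function name, the weighting, and the `sigma` parameter are illustrative, not the paper's actual method):

```python
import numpy as np

def transfer_displacements(sparse_pts, sparse_disp, dense_pts, sigma=0.05):
    """Propagate displacements measured at sparse markers to a dense mesh
    using normalized Gaussian radial-basis weights (one plausible transfer
    scheme; the paper's actual kinematics transfer may differ)."""
    # Squared distances between each dense vertex and each sparse marker.
    d2 = ((dense_pts[:, None, :] - sparse_pts[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # weights sum to 1 per dense vertex
    return w @ sparse_disp              # (n_dense, 3) interpolated displacements

# Toy example: 4 markers displaced uniformly, 2 dense vertices.
sparse_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], float)
sparse_disp = np.full((4, 3), 0.1)       # every marker moves by +0.1
dense_pts = np.array([[0.5, 0.5, 0], [0, 0, 0]], float)
dense_disp = transfer_displacements(sparse_pts, sparse_disp, dense_pts)
```

    Because the weights are normalized, a uniform marker displacement is reproduced exactly at every dense vertex, which is a useful sanity check for any such interpolation scheme.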

    Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units

    This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units, that is, units that comprise both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of the approach, since both the synthesized speech signal and the face animation are of good quality. Planned improvements and enhancements to the system are outlined.
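    Unit selection with target and join costs is usually solved as a shortest path through a lattice of candidate units. A minimal dynamic-programming sketch of that idea, with a visual join cost added alongside the acoustic one (the cost functions, the weight `w_v`, and the toy numbers are illustrative assumptions; the paper does not give its exact cost formulas):

```python
def select_units(candidates, target_cost, acoustic_join, visual_join, w_v=1.0):
    """Pick one candidate per slot minimizing cumulative target + join costs.
    candidates: list of slots, each a list of candidate units.
    Returns the index of the chosen candidate in each slot."""
    best = [[target_cost(0, c) for c in candidates[0]]]  # cumulative costs
    back = []                                            # backpointers
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[i]:
            # Cheapest way to reach candidate c from any previous candidate.
            costs = [best[i - 1][k] + acoustic_join(p, c) + w_v * visual_join(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            row.append(target_cost(i, c) + costs[k])
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Backtrack the cheapest path through the lattice.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return list(reversed(path))

# Toy lattice: candidates are plain numbers, costs are absolute differences.
candidates = [[0, 1], [1, 2], [2, 3]]
path = select_units(candidates,
                    target_cost=lambda i, c: abs(c - i),
                    acoustic_join=lambda p, c: abs(p - c),
                    visual_join=lambda p, c: abs(p - c))
```

    The visual join cost enters the recursion exactly like the acoustic one, which is how the abstract's "augmented" selection can reuse a standard concatenative-synthesis search.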

    Realistic Face Animation for Audiovisual Speech Applications: A Densification Approach Driven by Sparse Stereo Meshes

    The original publication is available at www.springerlink.com. Being able to produce realistic facial animation is crucial for many speech applications in language learning technologies. Reaching realism requires acquiring and animating dense 3D models of the face, which are often acquired with 3D scanners. However, capturing the dynamics of speech from 3D scans is difficult, as the acquisition time generally allows only sustained sounds to be recorded. By contrast, capturing the speech dynamics on a sparse set of points is easy using stereovision to record a talker with markers painted on his/her face. In this paper, we propose an approach to animate a very realistic dense talking head which makes use of a reduced set of dense 3D meshes acquired for sustained sounds, as well as the speech dynamics learned on a talker painted with white markers. The contributions of the paper are twofold. We first propose an appropriate principal component analysis (PCA) with missing-data techniques in order to compute the basic modes of the speech dynamics despite possibly unobservable points in the sparse meshes obtained by the stereovision system. We then propose a method for densifying the modes, that is, a method for computing the dense modes for spatial animation from the sparse modes learned by the stereovision system. Examples prove the effectiveness of the approach and the high realism obtained with our method.
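    A common way to handle PCA with missing entries, as the abstract describes for occluded marker points, is an EM-style iteration: impute the missing values, fit a low-rank model, refill the missing values from the reconstruction, and repeat. A minimal sketch under that assumption (the paper's specific missing-data PCA variant may differ):

```python
import numpy as np

def pca_missing(X, n_comp=2, n_iter=50):
    """EM-style PCA on a data matrix with NaN-marked missing entries:
    start from column-mean imputation, then alternate a truncated-SVD
    reconstruction with re-imputation of only the missing cells."""
    X = X.copy()
    miss = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.where(miss)[1])   # initial fill
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        recon = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp] + mu
        X[miss] = recon[miss]                        # refill unobserved cells only
    return mu, Vt[:n_comp], X

# Toy example: exactly rank-1 data with one entry knocked out.
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])
X[2, 1] = np.nan                     # true value is 3.0 * 0.5 = 1.5
mu, modes, X_filled = pca_missing(X, n_comp=1)
```

    On exactly low-rank data the iteration converges to the consistent completion, which is why the recovered modes can stay faithful even when some marker points are unobservable in individual frames.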

    Towards a True Acoustic-Visual Speech Synthesis

    This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of this approach, since both the synthesized speech signal and the face animation are of good quality.