94 research outputs found

    Real-Time Cleaning and Refinement of Facial Animation Signals

    Full text link
    With the increasing demand for real-time animated 3D content in the entertainment industry and beyond, performance-based animation has garnered interest among both academic and industrial communities. While recent solutions for motion-capture animation have achieved impressive results, handmade post-processing is often needed, as the generated animations often contain artifacts. Existing real-time motion capture solutions have opted for standard signal processing methods to strengthen temporal coherence of the resulting animations and remove inaccuracies. While these methods produce smooth results, they inherently filter-out part of the dynamics of facial motion, such as high frequency transient movements. In this work, we propose a real-time animation refining system that preserves -- or even restores -- the natural dynamics of facial motions. To do so, we leverage an off-the-shelf recurrent neural network architecture that learns proper facial dynamics patterns on clean animation data. We parametrize our system using the temporal derivatives of the signal, enabling our network to process animations at any framerate. Qualitative results show that our system is able to retrieve natural motion signals from noisy or degraded input animation.Comment: ICGSP 2020: Proceedings of the 2020 The 4th International Conference on Graphics and Signal Processin

    Continuous Audio-Visual Speech Recognition

    Get PDF
    We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal modelling of the acoustic and visual speech signals by applying Multi-Stream hidden Markov models. This approach allows the use of different temporal topologies and levels of stream integration and hence enables to model temporal dependencies more accurately. The system has been evaluated for a continuously spoken digit recognition task of 37 subjects

    Large displacement optical flow

    Full text link
    • …
    corecore