94 research outputs found
Real-Time Cleaning and Refinement of Facial Animation Signals
With the increasing demand for real-time animated 3D content in the
entertainment industry and beyond, performance-based animation has garnered
interest among both academic and industrial communities. While recent solutions
for motion-capture animation have achieved impressive results, manual
post-processing is often needed, as the generated animations frequently contain
artifacts. Existing real-time motion capture solutions have opted for standard
signal processing methods to strengthen temporal coherence of the resulting
animations and remove inaccuracies. While these methods produce smooth results,
they inherently filter out part of the dynamics of facial motion, such as
high-frequency transient movements. In this work, we propose a real-time animation
refining system that preserves -- or even restores -- the natural dynamics of
facial motions. To do so, we leverage an off-the-shelf recurrent neural network
architecture that learns proper facial dynamics patterns on clean animation
data. We parametrize our system using the temporal derivatives of the signal,
enabling our network to process animations at any framerate. Qualitative
results show that our system is able to retrieve natural motion signals from
noisy or degraded input animation.
Comment: ICGSP 2020: Proceedings of the 2020 The 4th International Conference
on Graphics and Signal Processin
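
The abstract above states that parametrizing the system with temporal derivatives lets the network handle any framerate. The sketch below is not the authors' implementation; it only illustrates the underlying idea with NumPy finite differences. The function names, the sine-wave test motion, and the 30/60 fps rates are all assumptions made for illustration. Dividing frame-to-frame differences by the frame duration expresses the motion in units per second, so the same movement yields similar derivative values at different sampling rates.

```python
import numpy as np

def temporal_derivatives(signal, fps):
    """Finite-difference velocity and acceleration, in units per second.

    Hypothetical helper: the abstract does not say how derivatives are computed.
    """
    signal = np.asarray(signal, dtype=float)
    dt = 1.0 / fps  # frame duration in seconds
    velocity = np.gradient(signal, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return velocity, acceleration

def sample_motion(fps, duration=1.0):
    """Sample an assumed one-second sine motion at the given framerate."""
    t = np.arange(int(fps * duration)) / fps
    return t, np.sin(2.0 * np.pi * t)

# Same underlying motion captured at two framerates.
_, low = sample_motion(30)
_, high = sample_motion(60)
v30, _ = temporal_derivatives(low, 30)
v60, _ = temporal_derivatives(high, 60)

# Every second 60 fps frame coincides in time with a 30 fps frame.
max_gap = np.max(np.abs(v30[1:-1] - v60[::2][1:-1]))
print(f"largest interior velocity mismatch: {max_gap:.3f}")
```

Raw per-frame differences, by contrast, would roughly halve when the framerate doubles, which is why a model fed unnormalized differences would be tied to one capture rate.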
Continuous Audio-Visual Speech Recognition
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal modelling of the acoustic and visual speech signals by applying multi-stream hidden Markov models. This approach allows the use of different temporal topologies and levels of stream integration, and hence allows temporal dependencies to be modelled more accurately. The system has been evaluated on a continuously spoken digit recognition task with 37 subjects.
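
One common way to integrate streams in a multi-stream hidden Markov model is to weight each stream's emission log-likelihood within a state and sum the results. The sketch below shows that combination only; the abstract does not specify the emission model or the weighting, so the diagonal Gaussian densities, the `audio_weight` parameter, and the toy state parameters are all assumptions made for illustration.

```python
import numpy as np

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian (assumed emission model)."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def multistream_log_emission(audio_obs, visual_obs, state, audio_weight=0.7):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    audio_weight is a hypothetical stream exponent in [0, 1];
    the visual stream receives the remaining weight.
    """
    visual_weight = 1.0 - audio_weight
    log_a = diag_gaussian_logpdf(audio_obs, *state["audio"])
    log_v = diag_gaussian_logpdf(visual_obs, *state["visual"])
    return audio_weight * log_a + visual_weight * log_v

# Toy state: (mean, variance) per stream, values chosen arbitrarily.
state = {
    "audio": (np.zeros(3), np.ones(3)),
    "visual": (np.zeros(2), np.ones(2)),
}
audio_obs = np.array([0.1, -0.2, 0.3])
visual_obs = np.array([1.0, -1.0])

score = multistream_log_emission(audio_obs, visual_obs, state)
print(f"combined log emission score: {score:.3f}")
```

Setting `audio_weight` to 1.0 or 0.0 reduces the score to an audio-only or visual-only model, which makes the weight a convenient knob when one stream is noisier than the other.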
Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation