Real-Time Cleaning and Refinement of Facial Animation Signals
With the increasing demand for real-time animated 3D content in the entertainment industry and beyond, performance-based animation has garnered interest in both academic and industrial communities. While recent solutions for motion-capture animation have achieved impressive results, manual post-processing is often needed because the generated animations contain artifacts. Existing real-time motion capture solutions have opted for standard signal processing methods to strengthen the temporal coherence of the resulting animations and remove inaccuracies. While these methods produce smooth results, they inherently filter out part of the dynamics of facial motion, such as high-frequency transient movements. In this work, we propose a real-time animation refinement system that preserves -- or even restores -- the natural dynamics of facial motions. To do so, we leverage an off-the-shelf recurrent neural network architecture that learns the patterns of natural facial dynamics from clean animation data. We parametrize our system using the temporal derivatives of the signal, enabling our network to process animations at any framerate. Qualitative results show that our system is able to retrieve natural motion signals from noisy or degraded input animations.

Comment: ICGSP 2020: Proceedings of the 2020 4th International Conference on Graphics and Signal Processing
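To make the derivative parametrization concrete, the following is a minimal sketch of such a refiner, assuming a PyTorch setup; the GRU, the layer sizes, and the finite-difference scheme are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class DerivativeRefiner(nn.Module):
        # Illustrative sketch: refines a noisy animation signal in the
        # derivative domain, which makes the model framerate-agnostic.
        def __init__(self, n_channels=50, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_channels)

        def forward(self, x, dt):
            # x: (batch, frames, channels) noisy signal; dt: frame duration in s.
            dx = torch.diff(x, dim=1) / dt        # temporal derivatives
            h, _ = self.rnn(dx)
            dx_clean = self.head(h)               # refined derivatives
            # Re-integrate from the first frame to recover the cleaned signal.
            x_clean = x[:, :1] + torch.cumsum(dx_clean * dt, dim=1)
            return torch.cat([x[:, :1], x_clean], dim=1)

Because the network only ever sees derivatives scaled by dt, the same trained weights can, in principle, be applied to animations captured at different framerates.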
Motion Capture of Hands in Action Using Discriminative Salient Points
Capturing the motion of two hands interacting with an object is a very challenging task due to the large number of degrees of freedom, self-occlusions, and the similarity between fingers, even when multiple cameras observe the scene. In this paper we propose to use discriminatively learned salient points on the fingers and to estimate the finger-to-salient-point associations simultaneously with the hand pose. We introduce a differentiable objective function that also takes edges, optical flow, and collisions into account. Our qualitative and quantitative evaluations show that the proposed approach achieves very accurate results on several challenging sequences containing hands and objects in action.
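The combined objective can be pictured as a weighted sum of differentiable energy terms over the hand pose θ; the decomposition below is a hedged sketch of that idea, with the term names and the weights λ chosen for illustration rather than taken from the paper:

    E(θ) = λ_s·E_salient(θ) + λ_e·E_edge(θ) + λ_f·E_flow(θ) + λ_c·E_collision(θ)

Since each term is differentiable in θ, the hand pose and the finger-to-salient-point associations can be refined jointly with gradient-based optimization.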
Continuous Audio-Visual Speech Recognition
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of jointly modelling the temporal behaviour of the acoustic and visual speech signals by applying multi-stream hidden Markov models. This approach allows different temporal topologies and levels of stream integration and hence enables temporal dependencies to be modelled more accurately. The system has been evaluated on a continuously spoken digit recognition task with 37 subjects.
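A standard way multi-stream HMMs integrate modalities is to weight each stream's state likelihood by a stream exponent. The snippet below is a minimal sketch of that combination in the log domain; the function name and the default weight are illustrative assumptions, not the system's actual code.

    import numpy as np

    def multistream_loglik(log_b_audio, log_b_video, gamma=0.7):
        # Weighted log-domain combination of per-state stream likelihoods:
        #   log b_j(o_t) = gamma * log b_audio + (1 - gamma) * log b_video
        # gamma = 0.7 is an illustrative audio weight, not from the paper.
        return gamma * np.asarray(log_b_audio) \
            + (1.0 - gamma) * np.asarray(log_b_video)

Varying gamma shifts trust between the acoustic and visual streams, which is what lets such systems stay robust when one modality (typically audio) is degraded by noise.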