Head Tracking via Robust Registration in Texture Map Images
A novel method for 3D head tracking in the presence of large head rotations and facial expression changes is described. Tracking is formulated in terms of color image registration in the texture map of a 3D surface model. Model appearance is recursively updated via image mosaicking in the texture map as the head orientation varies. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. Parameters are estimated via a robust minimization procedure; this provides robustness to occlusions, wrinkles, shadows, and specular highlights. The system was tested on a variety of sequences taken with low-quality, uncalibrated video cameras. Experimental results are reported.
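A minimal sketch of the kind of robust registration step this abstract describes, assuming a translation-only warp, a Huber loss, and SciPy's least_squares optimizer; the function names and parameters below are illustrative assumptions, not the paper's implementation:

    # Illustrative sketch, not the paper's code: robust 2D registration of an
    # observed texture-map image against a model texture with a Huber loss,
    # analogous to the robust minimization described above. The translation-only
    # warp and all names are assumptions.
    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates
    from scipy.optimize import least_squares

    def residuals(params, model_tex, observed_tex):
        """Intensity residuals after warping the observed texture by (dx, dy)."""
        dx, dy = params
        h, w = model_tex.shape
        yy, xx = np.mgrid[0:h, 0:w].astype(float)
        warped = map_coordinates(observed_tex, [yy + dy, xx + dx],
                                 order=1, mode="nearest")
        return (warped - model_tex).ravel()

    def register_robust(model_tex, observed_tex):
        """Estimate a translation aligning observed_tex to model_tex.

        The Huber loss down-weights outlier pixels (occlusions, specular
        highlights), which is the role of the robust estimator in the abstract.
        """
        fit = least_squares(residuals, x0=np.zeros(2), loss="huber", f_scale=0.05,
                            args=(model_tex, observed_tex))
        return fit.x  # estimated (dx, dy)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        model = gaussian_filter(rng.random((64, 64)), sigma=4)      # smooth test texture
        observed = np.roll(np.roll(model, -3, axis=0), -2, axis=1)  # shifted copy
        observed[10:20, 10:20] += 0.5  # simulated specular highlight (outlier block)
        print(register_robust(model, observed))  # roughly (-2, -3)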
Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos
We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, tele-medicine, etc., and often needs to be transmitted with limited bandwidth. The proposed CNN method exploits the correlations among three modalities, video, audio, and the emotion state of the speaker, to remove the video compression artifacts caused by spatial downsampling and quantization. The deep learning approach turns out to be ideally suited for the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos, while being fully compatible with all existing video compression standards.
Comment: Accepted by the Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 2020
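A hypothetical sketch of the sort of multi-modality fusion the abstract outlines: a small PyTorch CNN fuses a compressed frame, an aligned audio feature vector, and a discrete emotion label to predict an artifact-removal residual. The layer sizes, concatenation-based fusion, and all names are assumptions, not the authors' architecture:

    # Hypothetical multi-modality restoration sketch; layer sizes, the
    # fusion-by-concatenation design, and all names are assumptions.
    import torch
    import torch.nn as nn

    class MultiModalRestorer(nn.Module):
        def __init__(self, audio_dim=128, num_emotions=8, feat=32):
            super().__init__()
            # Video branch: shallow CNN over a compressed RGB frame.
            self.video_enc = nn.Sequential(
                nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
            )
            # Audio branch: MLP over a per-frame audio feature vector (e.g. MFCCs).
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, feat), nn.ReLU())
            # Emotion branch: embedding of a discrete emotion label.
            self.emotion_emb = nn.Embedding(num_emotions, feat)
            # Decoder: fuse all modalities and predict a residual correction.
            self.decoder = nn.Sequential(
                nn.Conv2d(feat * 3, feat, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat, 3, 3, padding=1),
            )

        def forward(self, frame, audio, emotion):
            b, _, h, w = frame.shape
            v = self.video_enc(frame)                                      # (B, F, H, W)
            a = self.audio_enc(audio)[:, :, None, None].expand(-1, -1, h, w)
            e = self.emotion_emb(emotion)[:, :, None, None].expand(-1, -1, h, w)
            fused = torch.cat([v, a, e], dim=1)
            return frame + self.decoder(fused)  # restored frame = input + residual

    if __name__ == "__main__":
        net = MultiModalRestorer()
        frame = torch.rand(1, 3, 64, 64)         # aggressively compressed frame
        audio = torch.rand(1, 128)               # aligned audio features
        emotion = torch.tensor([2])              # e.g. a "happy" label
        print(net(frame, audio, emotion).shape)  # torch.Size([1, 3, 64, 64])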
High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field
One crucial aspect of 3D head avatar reconstruction lies in the details of facial expressions. Although recent NeRF-based photo-realistic 3D head avatar methods achieve high-quality avatar rendering, they still encounter challenges in retaining intricate facial expression details, because they overlook the potential of specific expression variations at different spatial positions when conditioning the radiance field. Motivated by this observation, we introduce a novel Spatially-Varying Expression (SVE) conditioning. The SVE can be obtained by a simple MLP-based generation network, encompassing both spatial positional features and global expression information. Benefiting from the rich and diverse information of the SVE at different positions, the proposed SVE-conditioned neural radiance field can deal with intricate facial expressions and achieve realistic rendering and geometry details of high-fidelity 3D head avatars. Additionally, to further elevate the geometric and rendering quality, we introduce a new coarse-to-fine training strategy, including a geometry initialization strategy at the coarse stage and an adaptive importance sampling strategy at the fine stage. Extensive experiments indicate that our method outperforms other state-of-the-art (SOTA) methods in rendering and geometry quality on mobile phone-collected and public datasets.
Comment: 9 pages, 5 figures
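An illustrative sketch, under stated assumptions, of how a spatially-varying expression (SVE) conditioning could look in PyTorch: a small MLP maps each sample position together with a global expression code to a per-position expression feature that then conditions the radiance-field MLP. The dimensions, positional encoding, and output parameterization are guesses rather than the paper's design:

    # Illustrative SVE-conditioned radiance field; all dimensions and names
    # are assumptions, not the paper's implementation.
    import torch
    import torch.nn as nn

    def positional_encoding(x, num_freqs=6):
        """NeRF-style sinusoidal encoding of 3D positions."""
        freqs = 2.0 ** torch.arange(num_freqs) * torch.pi
        angles = x[..., None] * freqs                              # (..., 3, num_freqs)
        enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return enc.flatten(-2)                                     # (..., 3 * 2 * num_freqs)

    class SVEConditionedNeRF(nn.Module):
        def __init__(self, expr_dim=64, sve_dim=32, hidden=128, num_freqs=6):
            super().__init__()
            pos_dim = 3 * 2 * num_freqs
            # SVE generator: position + global expression -> local expression feature.
            self.sve_mlp = nn.Sequential(
                nn.Linear(pos_dim + expr_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, sve_dim),
            )
            # Radiance field conditioned on the spatially-varying expression feature.
            self.field = nn.Sequential(
                nn.Linear(pos_dim + sve_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),   # (density, r, g, b)
            )

        def forward(self, points, expression):
            # points: (N, 3) sample positions; expression: (expr_dim,) global code.
            pe = positional_encoding(points)
            expr = expression.expand(points.shape[0], -1)
            sve = self.sve_mlp(torch.cat([pe, expr], dim=-1))   # per-position SVE feature
            out = self.field(torch.cat([pe, sve], dim=-1))
            density, rgb = out[:, :1], torch.sigmoid(out[:, 1:])
            return density, rgb

    if __name__ == "__main__":
        model = SVEConditionedNeRF()
        pts = torch.rand(1024, 3)
        expr = torch.randn(64)
        density, rgb = model(pts, expr)
        print(density.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])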