High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field
One crucial aspect of 3D head avatar reconstruction lies in the details of
facial expressions. Although recent NeRF-based photo-realistic 3D head avatar
methods achieve high-quality avatar rendering, they still encounter challenges
retaining intricate facial expression details because they overlook the
potential of specific expression variations at different spatial positions when
conditioning the radiance field. Motivated by this observation, we introduce a
novel Spatially-Varying Expression (SVE) conditioning. The SVE can be obtained
by a simple MLP-based generation network, encompassing both spatial positional
features and global expression information. Benefiting from rich and diverse
information of the SVE at different positions, the proposed SVE-conditioned
neural radiance field can deal with intricate facial expressions and achieve
realistic rendering and geometry details of high-fidelity 3D head avatars.
Additionally, to further elevate the geometric and rendering quality, we
introduce a new coarse-to-fine training strategy, including a geometry
initialization strategy at the coarse stage and an adaptive importance sampling
strategy at the fine stage. Extensive experiments indicate that our method
outperforms other state-of-the-art (SOTA) methods in rendering and geometry
quality on mobile phone-collected and public datasets.
Comment: 9 pages, 5 figure
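To make the idea of spatially-varying conditioning concrete, here is a minimal sketch in plain Python. It is not the authors' network: the class name `SVEGenerator`, the layer sizes, the sin/cos positional encoding, and the random (untrained) weights are all assumptions chosen for illustration. It only shows how one global expression vector, combined with positional features of a query point, can yield a different conditioning code at each spatial position.

```python
import math
import random

def positional_encoding(point, num_freqs=4):
    """Map a 3D point to sin/cos features at increasing frequencies."""
    feats = []
    for coord in point:
        for k in range(num_freqs):
            feats.append(math.sin((2 ** k) * math.pi * coord))
            feats.append(math.cos((2 ** k) * math.pi * coord))
    return feats

def make_layer(n_in, n_out, rng):
    """Random weights stand in for trained parameters (assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

class SVEGenerator:
    """Hypothetical two-layer MLP: (position features, global expression) -> SVE code."""

    def __init__(self, expr_dim, sve_dim, hidden=16, num_freqs=4, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        in_dim = 3 * 2 * num_freqs + expr_dim
        self.l1 = make_layer(in_dim, hidden, rng)
        self.l2 = make_layer(hidden, sve_dim, rng)

    def __call__(self, point, expression):
        # Concatenate spatial positional features with the global expression.
        features = positional_encoding(point, self.num_freqs) + list(expression)
        return linear(self.l2, relu(linear(self.l1, features)))

generator = SVEGenerator(expr_dim=5, sve_dim=8)
expression = [0.2, -0.1, 0.7, 0.0, 0.3]              # one global expression vector
sve_mouth = generator((0.0, -0.3, 0.1), expression)  # code at a point near the mouth
sve_brow = generator((0.1, 0.4, 0.1), expression)    # code at a point near the brow
```

The same `expression` vector produces two different codes because the position enters the MLP input. A radiance field conditioned on these codes, rather than on the global vector alone, would receive position-specific expression information.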
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Synthesizing high-fidelity head avatars is a central problem for computer
vision and graphics. While head avatar synthesis algorithms have advanced
rapidly, the best ones still face great obstacles in real-world scenarios. One
of the vital causes is inadequate datasets: 1) current public datasets
support research on high-fidelity head avatars in only one or two
task directions; 2) these datasets usually contain digital head assets with
limited data volume and a narrow distribution over different attributes. In this
paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive
advances in head avatar research. It contains massive data assets, with 243+
million complete head frames, and over 800k video sequences from 500 different
identities captured by synchronized multi-view cameras at 30 FPS. It is a
large-scale digital library for head avatars with three key attributes: 1) High
Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K
cameras in 360 degrees. 2) High Diversity: the collected subjects vary in
age, era, ethnicity, and culture, providing abundant materials
with distinctive styles in appearance and geometry. Moreover, each subject is
asked to perform various motions, such as expressions and head rotations, which
further extend the richness of assets. 3) Rich Annotations: we provide
annotations with different granularities: cameras' parameters, matting, scan,
2D/3D facial landmarks, FLAME fitting, and text description.
Based on the dataset, we build a comprehensive benchmark for head avatar
research, evaluating 16 state-of-the-art methods on five main tasks: novel
view synthesis, novel expression synthesis, hair rendering, hair editing, and
talking head generation. Our experiments uncover the strengths and weaknesses
of current methods. RenderMe-360 opens the door for future exploration in head
avatars.
Comment: Technical Report; Project Page: 36; Github Link:
https://github.com/RenderMe-360/RenderMe-36
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
Talking head generation aims to generate faces that maintain the identity
information of the source image and imitate the motion of the driving image.
Most pioneering methods rely primarily on 2D representations and thus will
inevitably suffer from face distortion when large head rotations are
encountered. Recent works instead employ explicit 3D structural representations
or implicit neural rendering to improve performance under large pose changes.
Nevertheless, the fidelity of identity and expression remains unsatisfactory,
especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which
achieves high-fidelity and free-view talking-head synthesis. Drawing on the
recently proposed Deformable Neural Radiance Fields, HiDe-NeRF decomposes the
3D dynamic scene into a canonical appearance field and an implicit deformation
field, where the former comprises the canonical source face and the latter
models the driving pose and expression. In particular, we improve fidelity in
two respects: (i) to enhance identity expressiveness, we design a generalized
appearance module that leverages multi-scale volume features to preserve face
shape and details; (ii) to improve expression precision, we propose a
lightweight deformation module that explicitly decouples the pose and
expression to enable precise expression modeling. Extensive experiments
demonstrate that our proposed approach can generate better results than
previous works. Project page: https://www.waytron.net/hidenerf/
Comment: Accepted by CVPR 202
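The split into a canonical field and a deformation field with decoupled pose and expression can be illustrated with a toy example. The sketch below is not HiDe-NeRF itself: the function names, the single-axis (yaw) head rotation, the hand-written expression offset, and the solid-sphere "canonical field" are all assumptions made for illustration. It shows only the query order: an observed point is first mapped back by undoing the pose, then by undoing the expression displacement, and the canonical field is evaluated at the result.

```python
import math

def undo_pose(point, yaw):
    """Rotate an observed point by -yaw about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(-yaw), math.sin(-yaw)
    return (x * c + z * s, y, -x * s + z * c)

def undo_expression(point, expression_offset):
    """Subtract a (hypothetical) expression-driven displacement."""
    return tuple(p - o for p, o in zip(point, expression_offset(point)))

def canonical_density(point, radius=1.0):
    """Toy canonical field: a solid sphere standing in for the source face."""
    return 1.0 if sum(p * p for p in point) <= radius ** 2 else 0.0

def query(point, yaw, expression_offset):
    """Map an observed point to canonical space, then read the canonical field."""
    canonical_point = undo_expression(undo_pose(point, yaw), expression_offset)
    return canonical_point, canonical_density(canonical_point)

def no_expression(point):
    """Neutral expression: no displacement anywhere."""
    return (0.0, 0.0, 0.0)

# Head turned by 90 degrees, neutral expression.
canonical_point, density = query((0.0, 0.0, -0.5), math.pi / 2, no_expression)
```

Because pose and expression are handled by separate functions, either one can be changed without touching the other, which is the property the abstract attributes to its deformation module.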
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and
multi-view-consistent facial images using only collections of single-view 2D
imagery. Towards fine-grained control over facial attributes, recent efforts
incorporate a 3D Morphable Face Model (3DMM) to describe deformation in
generative radiance fields either explicitly or implicitly. Explicit methods
provide fine-grained expression control but cannot handle topological changes
caused by hair and accessories, while implicit ones can model varied topologies
but have limited generalization caused by the unconstrained deformation fields.
We propose a novel 3D GAN framework for unsupervised learning of generative,
high-quality and 3D-consistent facial avatars from unstructured 2D images. To
achieve both deformation accuracy and topological flexibility, we propose a 3D
representation called Generative Texture-Rasterized Tri-planes. The proposed
representation learns Generative Neural Textures on top of parametric mesh
templates and then projects them into three orthogonal-viewed feature planes
through rasterization, forming a tri-plane feature representation for volume
rendering. In this way, we combine both fine-grained expression control of
mesh-guided explicit deformation and the flexibility of implicit volumetric
representation. We further propose specific modules for modeling the mouth
interior, which the 3DMM does not account for. Our method demonstrates
state-of-the-art 3D-aware synthesis quality and animation ability through
extensive experiments. Furthermore, serving as a 3D prior, our animatable 3D
representation boosts multiple applications including one-shot facial avatars
and 3D-aware stylization.
Comment: Project page: https://mrtornado24.github.io/Next3D
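As a rough illustration of how a tri-plane feature representation is queried during volume rendering, the sketch below projects a 3D point onto three orthogonal planes, samples each plane bilinearly, and sums the results. It covers only the lookup step, not the texture rasterization described in the abstract. The single-channel 2x2 planes, the summation used to combine the three samples, and the function names are assumptions for illustration; real feature planes hold multi-channel vectors at much higher resolution.

```python
def bilinear(plane, u, v):
    """Sample a 2D grid (list of rows) at continuous coords u, v in [0, 1]."""
    h, w = len(plane), len(plane[0])
    x = u * (w - 1)
    y = v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = plane[y0][x0] * (1 - tx) + plane[y0][x1] * tx
    bottom = plane[y1][x0] * (1 - tx) + plane[y1][x1] * tx
    return top * (1 - ty) + bottom * ty

def triplane_feature(planes, point):
    """Project a point onto the XY, XZ, and YZ planes and sum the samples."""
    x, y, z = point  # each coordinate assumed normalized to [0, 1]
    return (bilinear(planes["xy"], x, y)
            + bilinear(planes["xz"], x, z)
            + bilinear(planes["yz"], y, z))

# Tiny single-channel planes (assumed values, for illustration only).
planes = {
    "xy": [[0.0, 1.0], [2.0, 3.0]],
    "xz": [[10.0, 10.0], [10.0, 10.0]],
    "yz": [[0.0, 0.0], [4.0, 4.0]],
}
center_feature = triplane_feature(planes, (0.5, 0.5, 0.5))  # 1.5 + 10.0 + 2.0
```

In a full pipeline, the aggregated feature at each sample point along a ray would be decoded into density and color before compositing.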
Implicit Neural Head Synthesis via Controllable Local Deformation Fields
High-quality reconstruction of controllable 3D head avatars from 2D videos is
highly desirable for virtual human applications in movies, games, and
telepresence. Neural implicit fields provide a powerful representation to model
3D head avatars with personalized shape, expressions, and facial parts, e.g.,
hair and mouth interior, that go beyond the linear 3D morphable model (3DMM).
However, existing methods do not model faces with fine-scale facial features,
or local control of facial parts that extrapolate asymmetric expressions from
monocular videos. Further, most condition only on 3DMM parameters with poor(er)
locality, and resolve local features with a global neural field. We build on
part-based implicit shape models that decompose a global deformation field into
local ones. Our novel formulation models multiple implicit deformation fields
with local semantic rig-like control via 3DMM-based parameters, and
representative facial landmarks. Further, we propose a local control loss and
attention mask mechanism that promote sparsity of each learned deformation
field. Our formulation renders sharper locally controllable nonlinear
deformations than previous implicit monocular approaches, especially mouth
interior, asymmetric expressions, and facial details.
Comment: Accepted at CVPR 202
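The decomposition of a global deformation field into masked local ones can be sketched as a weighted sum. This is not the paper's formulation: the Gaussian falloff around a landmark as the attention mask, the constant per-part offsets, and all names and numbers below are assumptions for illustration. The point is that each local field only influences its own region, so moving the mouth leaves the brow untouched.

```python
import math

def gaussian_mask(point, center, radius):
    """Soft spatial mask: near 1 close to the landmark, near 0 far away."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-d2 / (2 * radius ** 2))

def blended_deformation(point, local_fields):
    """Sum local offsets, each weighted by its own spatial mask."""
    total = [0.0, 0.0, 0.0]
    for field in local_fields:
        weight = gaussian_mask(point, field["center"], field["radius"])
        offset = field["offset_fn"](point)
        total = [t + weight * o for t, o in zip(total, offset)]
    return total

# Two hypothetical local fields anchored at facial landmarks.
local_fields = [
    {"center": (0.0, -0.3, 0.0), "radius": 0.1,
     "offset_fn": lambda p: (0.0, -0.05, 0.0)},   # mouth opens downward
    {"center": (0.1, 0.4, 0.0), "radius": 0.1,
     "offset_fn": lambda p: (0.0, 0.02, 0.0)},    # one brow raises
]

at_mouth = blended_deformation((0.0, -0.3, 0.0), local_fields)
far_away = blended_deformation((0.9, 0.9, 0.9), local_fields)
```

At the mouth landmark the result is dominated by the mouth offset, while a distant point receives almost no displacement, which mirrors the locality that the sparsity-promoting mask is meant to encourage.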