Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 201
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and
multi-view-consistent facial images using only collections of single-view 2D
imagery. Towards fine-grained control over facial attributes, recent efforts
incorporate a 3D Morphable Face Model (3DMM) to describe deformation in
generative radiance fields either explicitly or implicitly. Explicit methods
provide fine-grained expression control but cannot handle topological changes
caused by hair and accessories, while implicit ones can model varied topologies
but have limited generalization caused by the unconstrained deformation fields.
We propose a novel 3D GAN framework for unsupervised learning of generative,
high-quality and 3D-consistent facial avatars from unstructured 2D images. To
achieve both deformation accuracy and topological flexibility, we propose a 3D
representation called Generative Texture-Rasterized Tri-planes. The proposed
representation learns Generative Neural Textures on top of parametric mesh
templates and then projects them into three orthogonal-viewed feature planes
through rasterization, forming a tri-plane feature representation for volume
rendering. In this way, we combine both fine-grained expression control of
mesh-guided explicit deformation and the flexibility of implicit volumetric
representation. We further propose specific modules for modeling the mouth
interior, which is not taken into account by 3DMM. Our method demonstrates
state-of-the-art 3D-aware synthesis quality and animation ability through
extensive experiments. Furthermore, serving as a 3D prior, our animatable 3D
representation boosts multiple applications including one-shot facial avatars
and 3D-aware stylization.
Comment: Project page: https://mrtornado24.github.io/Next3D
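The tri-plane lookup mentioned in the abstract above can be illustrated with a
minimal sketch. It is not the Next3D implementation: the function names, the
nearest-neighbour sampling, the [-1, 1] coordinate range and the summation of
the three per-plane features are all assumptions made for illustration, and the
rasterization of neural textures that fills the planes is omitted entirely.

```python
# Illustrative tri-plane feature query (hypothetical names, not Next3D code).
# Each plane is a square grid: plane[row][col] is a list of feature channels.

def sample_plane(plane, u, v):
    """Nearest-neighbour lookup; u and v are assumed to lie in [-1, 1]."""
    res = len(plane)
    # Map [-1, 1] to a grid index and clamp to the valid range.
    col = min(res - 1, max(0, int((u + 1.0) / 2.0 * res)))
    row = min(res - 1, max(0, int((v + 1.0) / 2.0 * res)))
    return plane[row][col]

def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and sum the features."""
    x, y, z = point
    feats = [
        sample_plane(planes["xy"], x, y),
        sample_plane(planes["xz"], x, z),
        sample_plane(planes["yz"], y, z),
    ]
    # Aggregate channel by channel (summation is an assumption of this sketch).
    return [sum(channel) for channel in zip(*feats)]

# Toy 2x2 planes with a single feature channel each.
planes = {
    "xy": [[[1.0], [2.0]], [[3.0], [4.0]]],
    "xz": [[[10.0], [20.0]], [[30.0], [40.0]]],
    "yz": [[[100.0], [200.0]], [[300.0], [400.0]]],
}
feature = query_triplane(planes, (-0.5, 0.5, -0.5))
print(feature)  # [213.0]
```

In a volume renderer, a feature of this kind would be queried at every sample
along a camera ray and decoded into density and colour; that decoder is left
out of the sketch.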
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
We present OmniAvatar, a novel geometry-guided 3D head synthesis model
trained from in-the-wild unstructured images that is capable of synthesizing
diverse identity-preserved 3D heads with compelling dynamic details under fully
disentangled control over camera poses, facial expressions, head shapes, and
articulated neck and jaw poses. To achieve such a high level of disentangled
control, we first explicitly define a novel semantic signed distance function
(SDF) around a head geometry (FLAME) conditioned on the control parameters.
This semantic SDF allows us to build a differentiable volumetric correspondence
map from the observation space to a disentangled canonical space from all the
control parameters. We then leverage the 3D-aware GAN framework (EG3D) to
synthesize detailed shape and appearance of 3D full heads in the canonical
space, followed by a volume rendering step guided by the volumetric
correspondence map to output into the observation space. To ensure the control
accuracy on the synthesized head shapes and expressions, we introduce a
geometry prior loss to conform to head SDF and a control loss to conform to the
expression code. Further, we enhance the temporal realism with dynamic details
conditioned upon varying expressions and joint poses. Our model can synthesize
preferable identity-preserved 3D heads with compelling dynamic details
compared to the state-of-the-art methods both qualitatively and quantitatively.
We also provide an ablation study to justify many of our system design choices.
SCULPTOR: Skeleton-Consistent Face Creation Using a Learned Parametric Generator
Recent years have seen growing interest in 3D human face modelling due to
its wide applications in digital humans, character generation and animation.
Existing approaches overwhelmingly emphasize modeling the exterior shapes,
textures and skin properties of faces, ignoring the inherent correlation
between inner skeletal structures and appearance. In this paper, we present
SCULPTOR, 3D face creations with Skeleton Consistency Using a Learned
Parametric facial generaTOR, aiming to facilitate easy creation of both
anatomically correct and visually convincing face models via a hybrid
parametric-physical representation. At the core of SCULPTOR is LUCY, the first
large-scale shape-skeleton face dataset, built in collaboration with plastic surgeons.
Named after the fossils of one of the oldest known human ancestors, our LUCY
dataset contains high-quality Computed Tomography (CT) scans of the complete
human head before and after orthognathic surgeries, critical for evaluating
surgery results. LUCY consists of 144 scans of 72 subjects (31 male and 41
female) where each subject has two CT scans taken pre- and post-orthognathic
operations. Based on our LUCY dataset, we learn a novel skeleton consistent
parametric facial generator, SCULPTOR, which can create the unique and nuanced
facial features that help define a character and at the same time maintain
physiological soundness. Our SCULPTOR jointly models the skull, face geometry
and face appearance under a unified data-driven framework, by separating the
depiction of a 3D face into shape blend shapes, pose blend shapes and facial
expression blend shape. SCULPTOR preserves both anatomic correctness and visual
realism in facial generation tasks compared with existing methods. Finally, we
showcase the robustness and effectiveness of SCULPTOR in various fancy
applications unseen before.
Comment: 16 page, 13 fig
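The decomposition into shape, pose and expression blend shapes described in the
abstract above can be sketched as an additive linear model. This is a generic
illustration, not SCULPTOR's formulation: every name and number is
hypothetical, and the pose term is simplified to a plain weighted offset,
whereas parametric models usually derive pose correctives from joint rotations.

```python
# Illustrative additive blend shape model (hypothetical, not SCULPTOR code).
# A mesh is a list of [x, y, z] vertices; a basis has one offset per vertex.

def apply_blend_shapes(vertices, bases, weights):
    """Return vertices + sum_k weights[k] * bases[k] without mutating inputs."""
    if len(bases) != len(weights):
        raise ValueError("need exactly one weight per basis")
    result = [list(v) for v in vertices]
    for basis, w in zip(bases, weights):
        for i, offset in enumerate(basis):
            for axis in range(3):
                result[i][axis] += w * offset[axis]
    return result

def generate_face(template, shape, pose, expression):
    """Each of shape, pose and expression is a (bases, weights) pair."""
    verts = template
    for bases, weights in (shape, pose, expression):
        verts = apply_blend_shapes(verts, bases, weights)
    return verts

# Toy two-vertex "mesh" with one basis per group.
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
shape_bases = [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]]  # moves both vertices in y
pose_bases = [[[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]]   # moves vertex 1 in z
expr_bases = [[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]   # moves vertex 0 in x

face = generate_face(
    template,
    (shape_bases, [0.5]),
    (pose_bases, [0.25]),
    (expr_bases, [1.0]),
)
print(face)  # [[1.0, 0.5, 0.0], [1.0, 0.5, 0.5]]
```

Keeping the three groups of bases separate is what lets each factor be edited
independently: changing an expression weight leaves the shape and pose
contributions untouched.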
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
Speech-driven 3D face animation aims to generate realistic facial expressions
that match the speech content and emotion. However, existing methods often
neglect emotional facial expressions or fail to disentangle them from speech
content. To address this issue, this paper proposes an end-to-end neural
network to disentangle different emotions in speech so as to generate rich 3D
facial expressions. Specifically, we introduce the emotion disentangling
encoder (EDE) to disentangle the emotion and content in the speech by
cross-reconstructing speech signals with different emotion labels. Then an
emotion-guided feature fusion decoder is employed to generate a 3D talking face
with enhanced emotion. The decoder is driven by the disentangled identity,
emotional, and content embeddings so as to generate controllable personal and
emotional styles. Finally, considering the scarcity of the 3D emotional talking
face data, we resort to the supervision of facial blendshapes, which enables
the reconstruction of plausible 3D faces from 2D emotional data, and contribute
a large-scale 3D emotional talking face dataset (3D-ETF) to train the network.
Our experiments and user studies demonstrate that our approach outperforms
state-of-the-art methods and exhibits more diverse facial movements. We
recommend watching the supplementary video:
https://ziqiaopeng.github.io/emotalk
Comment: Accepted by ICCV 202
Three Dimensional Visualization of Fire Spreading Over Forest Landscapes
Previous studies in fire visualization have required high-end computer hardware and specialized technical skills. This study demonstrated that fire visualization is possible using Visual Nature Studio (VNS) and standard computer hardware. Elevation and vegetation data were used to create a representation of the New Jersey pine barren environment and a forest compartment within Hobcaw Barony. Photographic images were edited for use as image object models for forest vegetation. The FARSITE fire behavior model was used to model a fire typical of that area. Output from FARSITE was used to visualize the fire with flame models and tree models edited to simulate burning. Both static and animated views of the fire spread and effects were visualized. The two visualization methods, VNS and ArcScene, were compared for advantages and disadvantages. VNS visualizations were more realistic, including many effects such as ground textures, lighting, user-made models, and atmospheric effects. However, VNS had higher hardware requirements and sometimes rendered images slowly. ArcScene had lower hardware requirements and produced visualizations with real-time movement, but the resulting images lacked many of the effects found in VNS and looked more simplistic.