Search CORE

5,750 research outputs found

Capture, Learning, and Synthesis of 3D Speaking Styles

Author: Black Michael J.
Bolkart Timo
Cudeiro Daniel
Laidlaw Cassidy
Ranjan Anurag
Publication venue
Publication date: 01/01/2019
Field of study

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

Author: Li Xiaoyu
Liu Yebin
Sun Jingxiang
Wang Lizhen
Wang Xuan
Zhang Hongwen
Zhang Yong
Publication venue
Publication date: 21/11/2022
Field of study

3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies but have limited generalization caused by the unconstrained deformation fields. We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images. To achieve both deformation accuracy and topological flexibility, we propose a 3D representation called Generative Texture-Rasterized Tri-planes. The proposed representation learns Generative Neural Textures on top of parametric mesh templates and then projects them into three orthogonal-viewed feature planes through rasterization, forming a tri-plane feature representation for volume rendering. In this way, we combine both fine-grained expression control of mesh-guided explicit deformation and the flexibility of implicit volumetric representation. We further propose specific modules for modeling mouth interior which is not taken into account by 3DMM. Our method demonstrates state-of-the-art 3D-aware synthesis quality and animation ability through extensive experiments. Furthermore, serving as 3D prior, our animatable 3D representation boosts multiple applications including one-shot facial avatars and 3D-aware stylization.Comment: Project page: https://mrtornado24.github.io/Next3D

arXiv.org e-Print Archive

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

Author: Feng Jiashi
Jiang Zihang
Liu Jing
Luo Linjie
Ma Wanchun
Shi Yichun
Song Guoxian
Xu Hongyi
Zhang Jianfeng
Publication venue
Publication date: 27/03/2023
Field of study

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses. To achieve such high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space to a disentangled canonical space from all the control parameters. We then leverage the 3D-aware GAN framework (EG3D) to synthesize detailed shape and appearance of 3D full heads in the canonical space, followed by a volume rendering step guided by the volumetric correspondence map to output into the observation space. To ensure the control accuracy on the synthesized head shapes and expressions, we introduce a geometry prior loss to conform to head SDF and a control loss to conform to the expression code. Further, we enhance the temporal realism with dynamic details conditioned upon varying expressions and joint poses. Our model can synthesize more preferable identity-preserved 3D heads with compelling dynamic details compared to the state-of-the-art methods both qualitatively and quantitatively. We also provide an ablation study to justify many of our system design choices

arXiv.org e-Print Archive

SCULPTOR: Skeleton-Consistent Face Creation Using a Learned Parametric Generator

Author: He Dongming
Li Yuwei
Qiu Zesong
Wang Jingya
Wang Xudong
Xu Lan
Yu Jingyi
Zhang Longwen
Zhang Qixuan
Zhang Yinghao
Zhang Yuyao
Publication venue
Publication date: 02/10/2022
Field of study

Recent years have seen growing interest in 3D human faces modelling due to its wide applications in digital human, character generation and animation. Existing approaches overwhelmingly emphasized on modeling the exterior shapes, textures and skin properties of faces, ignoring the inherent correlation between inner skeletal structures and appearance. In this paper, we present SCULPTOR, 3D face creations with Skeleton Consistency Using a Learned Parametric facial generaTOR, aiming to facilitate easy creation of both anatomically correct and visually convincing face models via a hybrid parametric-physical representation. At the core of SCULPTOR is LUCY, the first large-scale shape-skeleton face dataset in collaboration with plastic surgeons. Named after the fossils of one of the oldest known human ancestors, our LUCY dataset contains high-quality Computed Tomography (CT) scans of the complete human head before and after orthognathic surgeries, critical for evaluating surgery results. LUCY consists of 144 scans of 72 subjects (31 male and 41 female) where each subject has two CT scans taken pre- and post-orthognathic operations. Based on our LUCY dataset, we learn a novel skeleton consistent parametric facial generator, SCULPTOR, which can create the unique and nuanced facial features that help define a character and at the same time maintain physiological soundness. Our SCULPTOR jointly models the skull, face geometry and face appearance under a unified data-driven framework, by separating the depiction of a 3D face into shape blend shape, pose blend shape and facial expression blend shape. SCULPTOR preserves both anatomic correctness and visual realism in facial generation tasks compared with existing methods. Finally, we showcase the robustness and effectiveness of SCULPTOR in various fancy applications unseen before.Comment: 16 page, 13 fig

arXiv.org e-Print Archive

EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation

Author: Fan Zhaoxin
He Jun
Liu Hongyan
Peng Ziqiao
Song Zhenbo
Wu Haoyu
Xu Hao
Zhu Xiangyu
Publication venue
Publication date: 25/08/2023
Field of study

Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in the speech by cross-reconstructed speech signals with different emotion labels. Then an emotion-guided feature fusion decoder is employed to generate a 3D talking face with enhanced emotion. The decoder is driven by the disentangled identity, emotional, and content embeddings so as to generate controllable personal and emotional styles. Finally, considering the scarcity of the 3D emotional talking face data, we resort to the supervision of facial blendshapes, which enables the reconstruction of plausible 3D faces from 2D emotional data, and contribute a large-scale 3D emotional talking face dataset (3D-ETF) to train the network. Our experiments and user studies demonstrate that our approach outperforms state-of-the-art methods and exhibits more diverse facial movements. We recommend watching the supplementary video: https://ziqiaopeng.github.io/emotalkComment: Accepted by ICCV 202

arXiv.org e-Print Archive

Three Dimensional Visualization of Fire Spreading Over Forest Landscapes

Author: Williams Brian
Publication venue: Clemson University Libraries
Publication date: 01/12/2008
Field of study

Previous studies in fire visualization have required high end computer hardware and specialized technical skills. This study demonstrated fire visualization is possible using Visual Nature Studio and standard computer hardware. Elevation and vegetation data were used to create a representation of the New Jersey pine barren environment and a forest compartment within Hobcaw Barony. Photographic images were edited to use as image object models for forest vegetation. The FARSITE fire behavioral model was used to model a fire typical of that area. Output from FARSITE was used to visualize the fire with tree models edited to simulate burning and flame models. Both static and animated views of the fire spread and effects were visualized. The two visualization methods were compared for advantages and disadvantages. VNS visualizations were more realistic, including many effects such as ground textures, lighting, user made models, and atmospheric effects. However the program had higher hardware requirements and sometimes rendered images slowly. ArcScene had lower hardware requirements and produced visualizations with real time movement. The resulting images lacked many of the effects found in VNS and were more simplistic looking

Clemson University: TigerPrints