Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models
The ability to generate diverse 3D articulated head avatars is vital to a
plethora of applications, including augmented reality, cinematography, and
education. Recent work on text-guided 3D object generation has shown great
promise in addressing these needs. These methods directly leverage pre-trained
2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance
fields of generic objects. However, due to the lack of geometry and texture
priors, these methods have limited control over the generated 3D objects,
making it difficult to operate inside a specific domain, e.g., human heads. In
this work, we develop a new approach to text-guided 3D head avatar generation
to address this limitation. Our framework directly operates on the geometry and
texture of an articulable 3D morphable model (3DMM) of a head, and introduces
novel optimization procedures to update the geometry and texture while keeping
the 2D and 3D facial features aligned. The result is a 3D head avatar that is
consistent with the text description and can be readily articulated using the
deformation model of the 3DMM. We show that our diffusion-based articulated
head avatars outperform state-of-the-art approaches for this task. The latter
are typically based on CLIP, which is known to offer limited generation
diversity and accuracy for 3D objects.
Comment: Project website:
http://www.computationalimaging.org/publications/articulated-diffusion
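To make the optimization idea above concrete, the following is a minimal PyTorch sketch of a score-distillation-style update of 3DMM geometry and texture coefficients against a frozen 2D diffusion model. The renderer, noise predictor, noise schedule, and all names are stand-ins for illustration only; the paper's actual components and procedure are not specified here.

# Minimal sketch: score-distillation-style update of 3DMM geometry/texture
# parameters with a frozen 2D diffusion model. The renderer and noise
# predictor below are stand-ins; the paper's actual components differ.
import torch
import torch.nn as nn

class ToyRenderer(nn.Module):
    """Stand-in differentiable renderer: 3DMM coefficients -> RGB image."""
    def __init__(self, n_params=200, res=64):
        super().__init__()
        self.net = nn.Linear(n_params, 3 * res * res)
        self.res = res
    def forward(self, p):
        return self.net(p).view(-1, 3, self.res, self.res).sigmoid()

class ToyEpsModel(nn.Module):
    """Stand-in frozen noise predictor eps(x_t, t) of a 2D diffusion model."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x_t, t):
        return self.conv(x_t)

geom = torch.zeros(1, 100, requires_grad=True)   # 3DMM geometry coefficients
tex  = torch.zeros(1, 100, requires_grad=True)   # 3DMM texture coefficients
renderer, eps_model = ToyRenderer(), ToyEpsModel()
for p in eps_model.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam([geom, tex], lr=1e-2)

for step in range(100):
    img = renderer(torch.cat([geom, tex], dim=-1))   # render current avatar
    t = torch.randint(1, 1000, (1,))
    alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2  # toy schedule
    noise = torch.randn_like(img)
    x_t = alpha_bar.sqrt() * img + (1 - alpha_bar).sqrt() * noise
    eps_pred = eps_model(x_t, t)
    # Score-distillation surrogate: push the rendering toward images the
    # diffusion model finds likely (text conditioning omitted in this toy).
    grad = (eps_pred - noise).detach()
    loss = (grad * img).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()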
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
We present OmniAvatar, a novel geometry-guided 3D head synthesis model
trained from in-the-wild unstructured images that is capable of synthesizing
diverse identity-preserved 3D heads with compelling dynamic details under full
disentangled control over camera poses, facial expressions, head shapes,
articulated neck and jaw poses. To achieve such a high level of disentangled
control, we first explicitly define a novel semantic signed distance function
(SDF) around a head geometry (FLAME) conditioned on the control parameters.
This semantic SDF allows us to build a differentiable volumetric correspondence
map from the observation space to a disentangled canonical space from all the
control parameters. We then leverage the 3D-aware GAN framework (EG3D) to
synthesize detailed shape and appearance of 3D full heads in the canonical
space, followed by a volume rendering step guided by the volumetric
correspondence map to output into the observation space. To ensure the control
accuracy on the synthesized head shapes and expressions, we introduce a
geometry prior loss to conform to head SDF and a control loss to conform to the
expression code. Further, we enhance the temporal realism with dynamic details
conditioned upon varying expressions and joint poses. Both qualitatively and
quantitatively, our model synthesizes identity-preserved 3D heads with more
compelling dynamic details than state-of-the-art methods. We also provide an
ablation study to justify many of our system design choices.
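As a rough illustration of the two auxiliary objectives mentioned above, the sketch below assumes interfaces head_sdf(points, control) for the semantic SDF around the FLAME head, generator_sdf for the generator's geometry, and expression_head for recovering the expression code; these names and loss forms are assumptions, not the paper's exact formulation.

# Minimal sketch of a geometry prior loss and a control loss of the kind
# described above, under assumed (stand-in) interfaces.
import torch
import torch.nn.functional as F

def geometry_prior_loss(generator_sdf, head_sdf, points, control):
    """Encourage the generated geometry to conform to the semantic head SDF."""
    return F.l1_loss(generator_sdf(points), head_sdf(points, control))

def control_loss(expression_head, features, expression_code):
    """Encourage the synthesized head to reproduce the target expression code."""
    return F.mse_loss(expression_head(features), expression_code)

# Toy usage with random stand-ins (shapes only; the real model differs).
points = torch.rand(1024, 3) * 2 - 1            # samples in the volume
control = torch.randn(1, 60)                    # shape/expression/pose params
gen_sdf = lambda p: p.norm(dim=-1, keepdim=True) - 0.5
prior_sdf = lambda p, c: p.norm(dim=-1, keepdim=True) - 0.5
print(geometry_prior_loss(gen_sdf, prior_sdf, points, control))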
GAN2X: Non-Lambertian Inverse Rendering of Image GANs
2D images are observations of the 3D physical world depicted with the geometry, material, and illumination components. Recovering these underlying intrinsic components from 2D images, also known as inverse rendering, usually requires a supervised setting with paired images collected from multiple viewpoints and lighting conditions, which is resource-demanding. In this work, we present GAN2X, a new method for unsupervised inverse rendering that only uses unpaired images for training. Unlike previous Shape-from-GAN approaches that mainly focus on 3D shapes, we make a first attempt to also recover non-Lambertian material properties by exploiting the pseudo paired data generated by a GAN. To achieve precise inverse rendering, we devise a specularity-aware neural surface representation that continuously models the geometry and material properties. A shading-based refinement technique is adopted to further distill information in the target image and recover more fine details. Experiments demonstrate that GAN2X can accurately decompose 2D images into 3D shape, albedo, and specular properties for different object categories, and achieves state-of-the-art performance for unsupervised single-view 3D face reconstruction. We also show its applications in downstream tasks including real image editing and lifting 2D GANs to decomposed 3D GANs.
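For intuition about what a specularity-aware photometric term can look like, the sketch below uses a Blinn-Phong diffuse-plus-specular model in PyTorch; the specific reflectance model and all tensor names are assumptions made for illustration and need not match GAN2X's formulation.

# Minimal sketch of a non-Lambertian (diffuse + specular) shading term used
# inside a photometric reconstruction loss; an illustrative assumption only.
import torch
import torch.nn.functional as F

def shade(normals, albedo, specular, shininess, light_dir, view_dir):
    """Per-point diffuse + Blinn-Phong specular shading; inputs are (N, 3)/(N, 1)."""
    l = F.normalize(light_dir, dim=-1)
    v = F.normalize(view_dir, dim=-1)
    h = F.normalize(l + v, dim=-1)                       # half vector
    diffuse = albedo * (normals * l).sum(-1, keepdim=True).clamp(min=0)
    spec = specular * (normals * h).sum(-1, keepdim=True).clamp(min=0) ** shininess
    return diffuse + spec

# Toy usage: photometric loss against a target image, flattened to pixels.
n = F.normalize(torch.randn(4096, 3), dim=-1)
pred = shade(n, torch.rand(4096, 3), torch.rand(4096, 1), 32.0,
             torch.tensor([0., 0., 1.]), torch.tensor([0., 0., 1.]))
target = torch.rand(4096, 3)
loss = F.l1_loss(pred, target)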
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and
multi-view-consistent facial images using only collections of single-view 2D
imagery. Towards fine-grained control over facial attributes, recent efforts
incorporate 3D Morphable Face Model (3DMM) to describe deformation in
generative radiance fields either explicitly or implicitly. Explicit methods
provide fine-grained expression control but cannot handle topological changes
caused by hair and accessories, while implicit ones can model varied topologies
but have limited generalization due to their unconstrained deformation fields.
We propose a novel 3D GAN framework for unsupervised learning of generative,
high-quality and 3D-consistent facial avatars from unstructured 2D images. To
achieve both deformation accuracy and topological flexibility, we propose a 3D
representation called Generative Texture-Rasterized Tri-planes. The proposed
representation learns Generative Neural Textures on top of parametric mesh
templates and then projects them into three orthogonal-viewed feature planes
through rasterization, forming a tri-plane feature representation for volume
rendering. In this way, we combine both fine-grained expression control of
mesh-guided explicit deformation and the flexibility of implicit volumetric
representation. We further propose dedicated modules for modeling the mouth
interior, which is not accounted for by the 3DMM. Our method demonstrates
state-of-the-art 3D-aware synthesis quality and animation ability through
extensive experiments. Furthermore, serving as 3D prior, our animatable 3D
representation boosts multiple applications including one-shot facial avatars
and 3D-aware stylization.
Comment: Project page: https://mrtornado24.github.io/Next3D
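The tri-plane feature lookup that follows the rasterization step can be sketched as below, assuming the planes already hold rasterized neural-texture features; the shapes and the aggregation rule are illustrative choices, not necessarily those of Next3D.

# Minimal sketch of querying tri-plane features at 3D sample points.
import torch
import torch.nn.functional as F

def sample_triplane(planes, points):
    """planes: (3, C, H, W) for xy, xz, yz; points: (N, 3) in [-1, 1]."""
    coords = torch.stack([points[:, [0, 1]],     # xy plane
                          points[:, [0, 2]],     # xz plane
                          points[:, [1, 2]]])    # yz plane -> (3, N, 2)
    grid = coords.unsqueeze(2)                   # (3, N, 1, 2) for grid_sample
    feats = F.grid_sample(planes, grid, align_corners=False)  # (3, C, N, 1)
    return feats.squeeze(-1).sum(0).t()          # aggregate planes -> (N, C)

planes = torch.randn(3, 32, 256, 256)            # e.g. rasterized neural textures
pts = torch.rand(1024, 3) * 2 - 1
features = sample_triplane(planes, pts)          # (1024, 32), fed to the decoder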
Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
We propose a 3D generation pipeline that uses diffusion models to generate
realistic human digital avatars. Due to the wide variety of human identities,
poses, and stochastic details, the generation of 3D human meshes has been a
challenging problem. To address this, we decompose the problem into 2D normal
map generation and normal map-based 3D reconstruction. Specifically, we first
simultaneously generate realistic normal maps for the front and backside of a
clothed human, dubbed dual normal maps, using a pose-conditional diffusion
model. For 3D reconstruction, we "carve" the prior SMPL-X mesh into a detailed
3D mesh according to the normal maps through mesh optimization. To further
enhance the high-frequency details, we present a diffusion resampling scheme on
both body and facial regions, thus encouraging the generation of realistic
digital avatars. We also seamlessly incorporate a recent text-to-image
diffusion model to support text-based human identity control. Our method,
namely, Chupa, is capable of generating realistic 3D clothed humans with better
perceptual quality and identity variety.
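A minimal sketch of the normal-map-guided carving step might look as follows, assuming a differentiable renderer render_normals(vertices, faces, view) is available; the stand-in renderer and the regularizer here are illustrative only, not the paper's exact objective.

# Minimal sketch: optimize per-vertex offsets so rendered normals match the
# generated front/back normal maps, staying close to the SMPL-X prior mesh.
import torch
import torch.nn.functional as F

def carve(vertices, faces, target_front, target_back, render_normals, steps=200):
    offsets = torch.zeros_like(vertices, requires_grad=True)  # per-vertex offsets
    opt = torch.optim.Adam([offsets], lr=1e-3)
    for _ in range(steps):
        v = vertices + offsets
        loss = (F.l1_loss(render_normals(v, faces, "front"), target_front)
                + F.l1_loss(render_normals(v, faces, "back"), target_back)
                + 1e-2 * offsets.pow(2).mean())               # keep near the prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vertices + offsets.detach()

# Toy usage with a stand-in renderer (real use needs a mesh rasterizer).
verts, faces = torch.rand(100, 3), torch.zeros(10, 3, dtype=torch.long)
toy_render = lambda v, f, view: v.mean(0).expand(64, 64, 3)
carved = carve(verts, faces, torch.rand(64, 64, 3), torch.rand(64, 64, 3), toy_render)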
Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator
3D-aware image synthesis aims at learning a generative model that can render
photo-realistic 2D images while capturing decent underlying 3D shapes. A
popular solution is to adopt the generative adversarial network (GAN) and
replace the generator with a 3D renderer, where volume rendering with neural
radiance field (NeRF) is commonly used. Despite the advancement of synthesis
quality, existing methods still fail to recover reasonable 3D shapes. We argue that,
considering the two-player game in the formulation of GANs, only making the
generator 3D-aware is not enough. In other words, displacing the generative
mechanism only offers the capability, but not the guarantee, of producing
3D-aware images, because the supervision of the generator primarily comes from
the discriminator. To address this issue, we propose GeoD through learning a
geometry-aware discriminator to improve 3D-aware GANs. Concretely, besides
differentiating real and fake samples from the 2D image space, the
discriminator is additionally asked to derive the geometry information from the
inputs, which is then applied as the guidance of the generator. Such a simple
yet effective design facilitates learning substantially more accurate 3D
shapes. Extensive experiments on various generator architectures and training
datasets verify the superiority of GeoD over state-of-the-art alternatives.
Moreover, our approach serves as a general framework such that a more
capable discriminator (i.e., with a third task of novel view synthesis beyond
domain classification and geometry extraction) can further assist the generator
with better multi-view consistency.
Comment: Accepted by NeurIPS 2022. Project page:
https://vivianszf.github.io/geo
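In the spirit of the geometry-aware discriminator described above, the sketch below gives the discriminator an auxiliary branch (here predicting a depth map) whose output is compared with the generator's rendered geometry; the architecture, the choice of depth as the geometry proxy, and the loss weights are assumptions for illustration.

# Minimal sketch: discriminator with a real/fake head plus a geometry branch,
# whose prediction guides the 3D-aware generator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryAwareD(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.logit = nn.Conv2d(64, 1, 4)                 # real/fake score
        self.depth = nn.Conv2d(64, 1, 3, padding=1)      # geometry (depth) branch

    def forward(self, img):
        h = self.backbone(img)
        return self.logit(h).mean((1, 2, 3)), self.depth(h)

# Generator guidance: the geometry derived by D from a fake image should match
# the geometry (here a low-res depth map) the 3D-aware generator rendered.
D = GeometryAwareD()
fake_img, rendered_depth = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 16, 16)
score, d_depth = D(fake_img)
g_loss = F.softplus(-score).mean() + 1.0 * F.l1_loss(d_depth, rendered_depth)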
Deep Generative Models on 3D Representations: A Survey
Generative models, an important family of statistical modeling, aim to learn
the observed data distribution by generating new instances. Along with the
rise of neural networks, deep generative models, such as variational
autoencoders (VAEs) and generative adversarial networks (GANs), have made
tremendous progress in 2D image synthesis. Recently, researchers have shifted
their attention from the 2D space to the 3D space, considering that 3D data
better aligns with our physical world and hence enjoys great potential in
practice. However, unlike a 2D image, which naturally has an efficient
representation (i.e., the pixel grid), representing 3D data poses far more
challenges.
Concretely, an ideal 3D representation should be expressive enough to model
shapes and appearances in detail, and efficient enough to represent
high-resolution data with fast speed and low memory cost. However,
existing 3D representations, such as point clouds, meshes, and recent neural
fields, usually fail to meet the above requirements simultaneously. In this
survey, we make a thorough review of the development of 3D generation,
including 3D shape generation and 3D-aware image synthesis, from the
perspectives of both algorithms and more importantly representations. We hope
that our discussion could help the community track the evolution of this field
and further spark some innovative ideas to advance this challenging task.
State of the Art on Neural Rendering
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
We present a method to efficiently generate 3D-aware high-resolution images
that are view-consistent across multiple target views. The proposed multiplane
neural radiance model, named GMNR, consists of a novel α-guided
view-dependent representation (α-VdR) module for learning view-dependent
information. The α-VdR module, facilitated by an α-guided pixel
sampling technique, computes the view-dependent representation efficiently by
learning viewing direction and position coefficients. Moreover, we propose a
view-consistency loss to enforce photometric similarity across multiple views.
The GMNR model can generate 3D-aware high-resolution images that are
view-consistent across multiple camera poses, while maintaining
computational efficiency in terms of both training and inference time.
Experiments on three datasets demonstrate the effectiveness of the proposed
modules, leading to favorable results in terms of both generation quality and
inference time, compared to existing approaches. Our GMNR model generates
3D-aware images of 1024 × 1024 pixels at 17.6 FPS on a single V100. Code:
https://github.com/VIROBO-15/GMNR
Comment: Technical report
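One way such a photometric view-consistency penalty could be written is sketched below, assuming pixel correspondences between the two views are available (faked as identity here); GMNR's actual loss, correspondences, and weighting may differ.

# Minimal sketch: penalize photometric differences between matching pixels of
# two renderings of the same generated scene.
import torch
import torch.nn.functional as F

def view_consistency_loss(img_a, img_b, corr_a, corr_b):
    """img_*: (3, H, W); corr_*: (N, 2) integer pixel coords of matching points."""
    pa = img_a[:, corr_a[:, 1], corr_a[:, 0]]    # (3, N) colours in view A
    pb = img_b[:, corr_b[:, 1], corr_b[:, 0]]    # (3, N) colours in view B
    return F.l1_loss(pa, pb)

# Toy usage: identical correspondences stand in for the true cross-view match.
img_a, img_b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
corr = torch.randint(0, 64, (512, 2))
loss = view_consistency_loss(img_a, img_b, corr, corr)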
Survey on Controllable Image Synthesis with Deep Learning
Image synthesis has attracted growing research interest in the academic and
industrial communities. Deep learning technologies, especially generative
models, have greatly inspired controllable image synthesis approaches and
applications, which aim to generate particular visual content from latent
prompts. To further investigate the low-level controllable image synthesis
problem, which is crucial for fine image rendering and editing tasks, we present
a survey of some recent works on 3D controllable image synthesis using deep
learning. We first introduce the datasets and evaluation indicators for 3D
controllable image synthesis. Then, we review the state-of-the-art research for
geometrically controllable image synthesis in two aspects: 1)
Viewpoint/pose-controllable image synthesis; 2) Structure/shape-controllable
image synthesis. Furthermore, photometrically controllable image synthesis
approaches are also reviewed for 3D relighting research. While the emphasis
is on 3D controllable image synthesis algorithms, the related applications,
products, and resources are also briefly summarized for practitioners.
Comment: 19 pages, 17 figures