61 research outputs found
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
We present OmniAvatar, a novel geometry-guided 3D head synthesis model
trained on in-the-wild unstructured images that is capable of synthesizing
diverse, identity-preserved 3D heads with compelling dynamic details under
fully disentangled control over camera poses, facial expressions, head shapes,
and articulated neck and jaw poses. To achieve such a high level of
disentangled control, we first explicitly define a novel semantic signed
distance function (SDF) around a head geometry (FLAME) conditioned on the
control parameters. This semantic SDF allows us to build a differentiable
volumetric correspondence map from the observation space to a canonical space
that is disentangled from all the control parameters. We then leverage the
3D-aware GAN framework (EG3D) to synthesize the detailed shape and appearance
of full 3D heads in the canonical space, followed by a volume rendering step,
guided by the volumetric correspondence map, that produces the output in the
observation space. To ensure control accuracy over the synthesized head shapes
and expressions, we introduce a geometry prior loss that conforms to the head
SDF and a control loss that conforms to the expression code. We further
enhance temporal realism with dynamic details conditioned on varying
expressions and joint poses. Both qualitatively and quantitatively, our model
synthesizes identity-preserved 3D heads with more compelling dynamic details
than state-of-the-art methods. We also provide an ablation study to justify
many of our system design choices.
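To make the rendering step above concrete, here is a minimal PyTorch sketch of
correspondence-guided volume rendering. It is an illustration only: warp_net
and canonical_field are hypothetical stand-ins for the paper's semantic-SDF
correspondence map and EG3D-style canonical generator, not the released
implementation.

import torch

def render_ray(ray_pts, control_params, warp_net, canonical_field):
    """Render one camera ray. ray_pts: (N, 3) samples along the ray in
    the observation space; control_params: expression/shape/pose codes."""
    # Volumetric correspondence: observation space -> canonical space.
    canonical_pts = warp_net(ray_pts, control_params)

    # Query color and density in the control-free canonical space.
    rgb, sigma = canonical_field(canonical_pts)   # (N, 3), (N,)

    # Standard volume rendering (alpha compositing along the ray).
    delta = 1.0 / ray_pts.shape[0]                # uniform step size
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * trans                       # contribution per sample
    return (weights[:, None] * rgb).sum(dim=0)    # composited RGB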
Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
Capitalizing on the recent advances in image generation models, existing
controllable face image synthesis methods are able to generate high-fidelity
images with some levels of controllability, e.g., controlling the shapes,
expressions, textures, and poses of the generated face images. However, these
methods focus on 2D image generative models, which are prone to producing
inconsistent face images under large expression and pose changes. In this
paper, we propose a new NeRF-based conditional 3D face synthesis framework,
which enables 3D controllability over the generated face images by imposing
explicit 3D conditions from 3D face priors. At its core is a conditional
Generative Occupancy Field (cGOF) that effectively enforces the shape of the
generated face to commit to a given 3D Morphable Model (3DMM) mesh. To achieve
accurate control over fine-grained 3D face shapes of the synthesized image, we
additionally incorporate a 3D landmark loss as well as a volume warping loss
into our synthesis algorithm. Experiments validate the effectiveness of the
proposed method, which is able to generate high-fidelity face images and shows
more precise 3D controllability than state-of-the-art 2D-based controllable
face synthesis methods. Code and a demo are available at
https://keqiangsun.github.io/projects/cgof
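As a rough illustration of the landmark supervision mentioned above, a 3D
landmark loss can be sketched in a few lines of PyTorch; the landmark
predictor and the 3DMM landmark extraction are assumed to exist and are not
shown, and the exact formulation here is hypothetical rather than the paper's.

import torch.nn.functional as F

def landmark_loss(pred_landmarks, mesh_landmarks):
    """Pull the 3D landmarks of the synthesized face toward the
    corresponding vertices of the conditioning 3DMM mesh.
    Both tensors have shape (K, 3)."""
    return F.mse_loss(pred_landmarks, mesh_landmarks)

In training, such a term would be added with an illustrative weight to the
adversarial objective, alongside the volume warping loss.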
Face editing with GAN -- A Review
In recent years, Generative Adversarial Networks (GANs) have become a hot
topic among researchers and engineers who work with deep learning. GANs are a
ground-breaking technique that can generate new pieces of content in a
consistent way. The topic has exploded in popularity due to its applicability
in fields like image generation and synthesis, and music production and
composition. A GAN consists of two competing neural networks: a generator and
a discriminator. The generator produces new samples or pieces of content,
while the discriminator judges whether a given piece of content is real or
generated. What sets GANs apart from other generative models is their ability
to learn from unlabeled samples. In this review paper, we discuss the
evolution of GANs, several improvements proposed by various authors, and a
brief comparison between the different models.
Index Terms: generative adversarial networks, unsupervised learning, deep
learning
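The generator/discriminator game described above can be summarized in a short
PyTorch sketch of one vanilla GAN training step (a generic illustration, not
taken from any specific reviewed model):

import torch
import torch.nn as nn

def gan_step(G, D, real, opt_g, opt_d, z_dim=128):
    """One training step of a vanilla GAN: the discriminator D learns
    to classify real vs. generated samples, and the generator G learns
    to fool D. No labels on the data are needed, only real samples."""
    bce = nn.BCEWithLogitsLoss()
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    fake = G(torch.randn(n, z_dim)).detach()
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: push D(G(z)) -> 1, i.e., fool the discriminator.
    g_loss = bce(D(G(torch.randn(n, z_dim))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()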
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
To serve the intricate and varied demands of image editing, precise and
flexible manipulation of image content is indispensable. Drag-based editing
methods have recently achieved impressive performance. However, these methods
predominantly center on point dragging, which has two noteworthy drawbacks:
"miss tracking", where it is difficult to accurately track the predetermined
handle points, and "ambiguous tracking", where tracked points may land in
wrong regions that closely resemble the handle points. To address these
issues, we propose FreeDrag, a feature dragging methodology designed to lift
the burden of point tracking. FreeDrag incorporates two key designs:
adaptively updated template features and line search with backtracking. The
former improves stability against drastic content changes by carefully
controlling the feature-updating scale after each drag, while the latter
alleviates misguidance from similar points by restricting the search area to
a line. Together, these two techniques contribute to more stable semantic
dragging with higher efficiency. Comprehensive experimental results
substantiate that our approach significantly outperforms pre-existing
methodologies, offering reliable point-based editing even in various complex
scenarios.
Comment: 13 pages, 14 figures
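A minimal sketch of the two designs, assuming a feature map produced by the
editing network; the quality score, blending schedule, and function names are
hypothetical simplifications of the paper's actual procedure:

import torch

def update_template(template_feat, current_feat, quality, lam_max=0.9):
    """Adaptive template update: blend the stored template feature with
    the current one. The blend weight grows with the quality of the last
    drag step, so drastic content changes (low quality) barely
    contaminate the template. quality in [0, 1] is a hypothetical
    stand-in for the paper's criterion."""
    lam = lam_max * quality
    return lam * current_feat + (1.0 - lam) * template_feat

def line_search_point(feat_map, template_feat, handle, target, steps=10):
    """Search for the next handle location only along the segment from
    handle to target; backtracking falls out naturally because the best
    feature match may lie short of the farthest candidate.
    feat_map: (C, H, W); handle, target: (2,) float pixel coordinates."""
    best, best_dist = handle, float("inf")
    for t in torch.linspace(0.0, 1.0, steps):
        cand = handle + t * (target - handle)   # candidate on the line
        x, y = cand.round().long()
        dist = (feat_map[:, y, x] - template_feat).norm().item()
        if dist < best_dist:
            best, best_dist = cand, dist
    return best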
FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data
Training fingerprint recognition models using synthetic data has recently
gained increased attention in the biometric community as it alleviates the
dependency on sensitive personal data. Existing approaches for fingerprint
generation are limited in their ability to generate diverse impressions of the
same finger, a key property for providing effective data for training
recognition models. To address this gap, we present FPGAN-Control, an
identity-preserving image generation framework that enables control over the
image appearance (e.g., fingerprint type, acquisition device, pressure level)
of the generated fingerprints. We introduce a novel appearance loss
that encourages disentanglement between the fingerprint's identity and
appearance properties. In our experiments, we used the publicly available NIST
SD302 (N2N) dataset for training the FPGAN-Control model. We demonstrate the
merits of FPGAN-Control, both quantitatively and qualitatively, in terms of
identity preservation level, degree of appearance control, and low
synthetic-to-real domain gap. Finally, training recognition models using only
synthetic datasets generated by FPGAN-Control leads to recognition accuracies
that are on par with, or even surpass, those of models trained using real
data. To the best of our knowledge, this is the first work to demonstrate
this.
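A minimal sketch of such an appearance loss, assuming a generator G that takes
separate identity and appearance latents and a fixed fingerprint embedding
network; all names here are hypothetical, not the paper's exact formulation:

import torch.nn.functional as F

def appearance_disentanglement_loss(G, embed, z_id, z_app_a, z_app_b):
    """Two fingerprints that share the identity code but differ in the
    appearance code should map to the same identity embedding.
    G: generator taking (identity, appearance) latents;
    embed: a fixed fingerprint-recognition embedding network."""
    img_a = G(z_id, z_app_a)   # same finger, appearance A
    img_b = G(z_id, z_app_b)   # same finger, appearance B
    # Pull the identity embeddings of the two impressions together.
    return 1.0 - F.cosine_similarity(embed(img_a), embed(img_b)).mean()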
FDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models
The ability to create high-quality 3D faces from a single image has become
increasingly important with wide applications in video conferencing, AR/VR, and
advanced video editing in the movie industry. In this paper, we propose Face
Diffusion NeRF (FDNeRF), a new generative method to reconstruct high-quality
Face NeRFs from single images, complete with semantic editing and relighting
capabilities. FDNeRF utilizes high-resolution 3D GAN inversion and an expertly
trained 2D latent diffusion model, allowing users to manipulate and construct
Face NeRFs in a zero-shot manner without the need for explicit 3D data. With
carefully designed illumination and identity-preserving losses, as well as
multi-modal pre-training, FDNeRF offers users unparalleled control over the
editing process, enabling them to create and edit Face NeRFs using just
single-view images, text prompts, and explicit target lighting. The advanced
features of FDNeRF have been designed to produce more impressive results than
existing 2D editing approaches that rely on 2D segmentation maps for editable
attributes. Experiments show that our FDNeRF achieves exceptionally realistic
results and unprecedented flexibility in editing compared with state-of-the-art
3D face reconstruction and editing methods. Our code will be available at
https://github.com/BillyXYB/FDNeRF
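At a high level, the invert-then-edit pipeline reads as the following Python
pseudocode; every name below is a hypothetical placeholder, not the released
API:

# High-level sketch of a single-image "invert, then edit" pipeline in
# the spirit of FDNeRF. All function names are placeholders.
def edit_face_nerf(image, prompt, target_light,
                   gan_inverter, face_nerf_gan, diffusion_editor):
    # 1. 3D GAN inversion: recover a latent code whose rendering
    #    matches the single input view.
    w = gan_inverter(image)

    # 2. Zero-shot semantic edit: steer the latent with a 2D latent
    #    diffusion model conditioned on the text prompt, while an
    #    identity-preserving loss keeps the subject recognizable.
    w_edit = diffusion_editor(face_nerf_gan, w, prompt)

    # 3. Relight and render the edited Face NeRF from any viewpoint.
    return face_nerf_gan.render(w_edit, lighting=target_light)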
LatentSwap3D: Semantic Edits on 3D Image GANs
3D GANs have the ability to generate latent codes for entire 3D volumes
rather than only 2D images. These models offer desirable features like
high-quality geometry and multi-view consistency, but, unlike for their 2D
counterparts, complex semantic image editing tasks for 3D GANs have only been
partially explored. To address this problem, we propose LatentSwap3D, a
semantic edit approach based on latent space discovery that can be used with
any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on
identifying the latent code dimensions corresponding to specific attributes by
feature ranking using a random forest classifier. It then performs the edit by
swapping the selected dimensions of the image being edited with the ones from
an automatically selected reference image. Compared to other latent space
control-based edit methods, which were mainly designed for 2D GANs, our method
on 3D GANs provides remarkably consistent semantic edits in a disentangled
manner and outperforms others both qualitatively and quantitatively. We show
results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF,
and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).
Comment: The paper has been accepted by ICCV'23 AI3DC
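The two stages, feature ranking and dimension swapping, are simple enough to
sketch with scikit-learn and NumPy; the attribute labels for the sampled
latents are assumed to come from an off-the-shelf classifier, which is not
shown:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def find_attribute_dims(latents, labels, k=20):
    """Rank latent dimensions by their importance for predicting a
    binary attribute (e.g., "smiling") with a random forest, and keep
    the top k. latents: (N, D) codes; labels: (N,) attribute labels."""
    rf = RandomForestClassifier(n_estimators=200).fit(latents, labels)
    return np.argsort(rf.feature_importances_)[::-1][:k]

def latent_swap(w_src, w_ref, dims):
    """Perform the edit by copying the attribute-carrying dimensions
    from a reference code that exhibits the target attribute."""
    w_edit = w_src.copy()
    w_edit[dims] = w_ref[dims]
    return w_edit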
'Tax-free' 3DMM Conditional Face Generation
3DMM conditioned face generation has gained traction due to its well-defined
controllability; however, the trade-off is lower sample quality: Previous works
such as DiscoFaceGAN and 3D-FM GAN show a significant FID gap compared to the
unconditional StyleGAN, suggesting that there is a quality tax to pay for
controllability. In this paper, we challenge the assumption that quality and
controllability cannot coexist. To pinpoint the previous issues, we
mathematically formalize the problem of 3DMM conditioned face generation. Then,
we devise simple solutions to the problem under our proposed framework. This
results in a new model that effectively removes the quality tax between 3DMM
conditioned face GANs and the unconditional StyleGAN.
Comment: Accepted to the AI for Content Creation Workshop at CVPR 202
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and
multi-view-consistent facial images using only collections of single-view 2D
imagery. Towards fine-grained control over facial attributes, recent efforts
incorporate a 3D Morphable Face Model (3DMM) to describe deformation in
generative radiance fields either explicitly or implicitly. Explicit methods
provide fine-grained expression control but cannot handle topological changes
caused by hair and accessories, while implicit ones can model varied topologies
but have limited generalization due to their unconstrained deformation fields.
We propose a novel 3D GAN framework for unsupervised learning of generative,
high-quality and 3D-consistent facial avatars from unstructured 2D images. To
achieve both deformation accuracy and topological flexibility, we propose a 3D
representation called Generative Texture-Rasterized Tri-planes. The proposed
representation learns Generative Neural Textures on top of parametric mesh
templates and then projects them into three orthogonal-viewed feature planes
through rasterization, forming a tri-plane feature representation for volume
rendering. In this way, we combine both fine-grained expression control of
mesh-guided explicit deformation and the flexibility of implicit volumetric
representation. We further propose specific modules for modeling the mouth
interior, which is not taken into account by the 3DMM. Our method demonstrates
state-of-the-art 3D-aware synthesis quality and animation ability through
extensive experiments. Furthermore, serving as a 3D prior, our animatable 3D
representation boosts multiple applications, including one-shot facial avatars
and 3D-aware stylization.
Comment: Project page: https://mrtornado24.github.io/Next3D
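For readers unfamiliar with tri-planes, the final volume-rendering query can
be sketched as follows in PyTorch; this shows only generic tri-plane sampling,
not Next3D's neural-texture rasterization that produces the planes:

import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """Query a tri-plane representation: project each 3D point onto
    three orthogonal feature planes and sum the bilinearly sampled
    features. planes: (3, C, H, W) for the XY, XZ, and YZ planes;
    pts: (N, 3) query points in [-1, 1]^3. Returns (N, C) features
    that a small MLP would decode into color and density."""
    projections = [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]]
    feats = 0.0
    for plane, uv in zip(planes, projections):
        grid = uv.view(1, -1, 1, 2)                      # (1, N, 1, 2)
        f = F.grid_sample(plane[None], grid, align_corners=False)
        feats = feats + f.view(plane.shape[0], -1).T     # (N, C)
    return feats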