Photo-Realistic Facial Details Synthesis from Single Image
We present a single-image 3D face synthesis technique that can handle
challenging facial expressions while recovering fine geometric details. Our
technique employs expression analysis for proxy face geometry generation and
combines supervised and unsupervised learning for facial detail synthesis. On
proxy generation, we conduct emotion prediction to determine a new
expression-informed proxy. On detail synthesis, we present a Deep Facial Detail
Net (DFDN) based on Conditional Generative Adversarial Net (CGAN) that employs
both geometry and appearance loss functions. For geometry, we capture 366
high-quality 3D scans from 122 different subjects under 3 facial expressions.
For appearance, we use an additional 20K in-the-wild face images and apply
image-based rendering to accommodate lighting variations. Comprehensive
experiments demonstrate that our framework can produce high-quality 3D faces
with realistic details under challenging facial expressions.
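As a rough illustration of how a conditional GAN detail generator might combine a supervised geometry loss (against scanned displacement maps) with an unsupervised appearance loss (against re-rendered photos), here is a minimal PyTorch sketch. It is not the authors' implementation: the callables `generator`, `discriminator`, and `renderer`, and the loss weights, are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a CGAN objective mixing geometry and appearance terms.
# Module names and weights are illustrative, not from the paper's code.

def dfdn_losses(generator, discriminator, face_img, proxy_geom,
                gt_displacement=None, renderer=None, w_geo=1.0, w_app=0.1):
    """Generator loss combining adversarial, geometry, and appearance terms."""
    # Predict a per-pixel detail (displacement) map conditioned on the photo
    # and the expression-informed proxy geometry.
    pred_disp = generator(face_img, proxy_geom)

    # Adversarial term: the discriminator sees (condition, detail) pairs, as in a CGAN.
    fake_score = discriminator(face_img, pred_disp)
    loss_adv = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))

    # Supervised geometry term: only available for the scanned subjects.
    loss_geo = torch.tensor(0.0, device=face_img.device)
    if gt_displacement is not None:
        loss_geo = F.l1_loss(pred_disp, gt_displacement)

    # Unsupervised appearance term: re-render the detailed face and compare it
    # to the in-the-wild photo (image-based rendering absorbs lighting variation).
    loss_app = torch.tensor(0.0, device=face_img.device)
    if renderer is not None:
        rerendered = renderer(proxy_geom, pred_disp)
        loss_app = F.l1_loss(rerendered, face_img)

    return loss_adv + w_geo * loss_geo + w_app * loss_app
```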
PREF: Phasorial Embedding Fields for Compact Neural Representations
We present an efficient frequency-based neural representation termed PREF: a
shallow MLP augmented with a phasor volume that covers a significantly broader
spectrum than previous Fourier feature mappings or positional encodings. At the
core is our compact 3D phasor volume where frequencies distribute uniformly
along a 2D plane and dilate along a 1D axis. To this end, we develop a tailored
and efficient Fourier transform that combines the fast Fourier transform with
local interpolation to accelerate naïve Fourier mapping. We also introduce a
Parseval regularizer that stabilizes frequency-based learning. In these ways, our
PREF reduces the costly MLP in the frequency-based representation, thereby
significantly closing the efficiency gap between it and other hybrid
representations, and improving its interpretability. Comprehensive experiments
demonstrate that our PREF is able to capture high-frequency details while
remaining compact and robust, on tasks including 2D image generalization, 3D signed
distance function regression, and 5D neural radiance field reconstruction.
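One way to read a Parseval-style regularizer: by Parseval's theorem, the energy of the field's spatial derivative equals a frequency-weighted energy of its Fourier coefficients, so penalizing frequency-weighted coefficient magnitudes smooths the field without ever leaving the frequency domain. A minimal sketch of that idea follows; the exact weighting and normalization used in PREF may differ.

```python
import torch

def parseval_gradient_penalty(phasor, freqs):
    """
    Hypothetical sketch of a Parseval-style regularizer.

    phasor: complex tensor of Fourier coefficients, shape (..., K)
    freqs:  real tensor of the corresponding frequencies, shape (K,)

    By Parseval's theorem, sum_k |2*pi*f_k * c_k|^2 equals (up to normalization)
    the energy of the signal's spatial derivative, so this term discourages
    high-frequency noise without requiring an inverse transform.
    """
    weighted = (2 * torch.pi * freqs) * phasor
    return weighted.abs().pow(2).mean()
```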
Mip-Splatting: Alias-free 3D Gaussian Splatting
Recently, 3D Gaussian Splatting has demonstrated impressive novel view
synthesis results, reaching high fidelity and efficiency. However, strong
artifacts can be observed when changing the sampling rate, e.g., by changing
the focal length or camera distance. We attribute this phenomenon to the
lack of 3D frequency constraints and the use of a 2D
dilation filter. To address this problem, we introduce a 3D smoothing filter
which constrains the size of the 3D Gaussian primitives based on the maximal
sampling frequency induced by the input views, eliminating high-frequency
artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip
filter, which simulates a 2D box filter, effectively mitigates aliasing and
dilation issues. Our evaluation, including scenarios such as training on
single-scale images and testing on multiple scales, validates the effectiveness
of our approach. Project page: https://niujinshuchong.github.io/mip-splatting
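Conceptually, convolving a 3D Gaussian with an isotropic low-pass Gaussian whose width is tied to the finest sampling interval seen during training simply adds a constant to its covariance, with the amplitude rescaled so the primitive does not gain energy. A rough NumPy sketch of that step, with hypothetical names and an illustrative constant (not taken from the released code):

```python
import numpy as np

def smooth_gaussian_3d(cov, opacity, max_sampling_freq, s=0.2):
    """
    Hypothetical sketch of a 3D smoothing (low-pass) filter for one Gaussian
    primitive. Convolving two Gaussians sums their covariances; the amplitude
    is rescaled by the square root of the determinant ratio so the primitive's
    total contribution stays consistent. The constant `s` is illustrative.
    """
    # Width of the low-pass filter: inversely related to the maximal sampling
    # frequency induced by the training cameras that observe this Gaussian.
    filter_var = s / (max_sampling_freq ** 2)
    cov_smoothed = cov + filter_var * np.eye(3)

    # Rescale so the smoothed primitive does not gain energy.
    scale = np.sqrt(np.linalg.det(cov) / np.linalg.det(cov_smoothed))
    return cov_smoothed, opacity * scale
```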
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
3D-aware image synthesis encompasses a variety of tasks, such as scene
generation and novel view synthesis from images. Despite numerous task-specific
methods, developing a comprehensive model remains challenging. In this paper,
we present SSDNeRF, a unified approach that employs an expressive diffusion
model to learn a generalizable prior of neural radiance fields (NeRF) from
multi-view images of diverse objects. Previous studies have used two-stage
approaches that rely on pretrained NeRFs as real data to train diffusion
models. In contrast, we propose a new single-stage training paradigm with an
end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent
diffusion model, enabling simultaneous 3D reconstruction and prior learning,
even from sparsely available views. At test time, we can directly sample the
diffusion prior for unconditional generation, or combine it with arbitrary
observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates
robust results comparable to or better than leading task-specific methods in
unconditional generation and single/sparse-view 3D reconstruction. Project page: https://lakonik.github.io/ssdner
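The single-stage idea can be summarized as one objective that simultaneously fits per-scene latent codes to the multi-view images through differentiable rendering and fits a diffusion model to those same latents. A schematic PyTorch sketch follows; every module name here (`decoder`, `render`, the `diffusion` object and its `add_noise`/`predict_noise` methods) is a hypothetical stand-in, not the paper's API.

```python
import torch
import torch.nn.functional as F

def ssdnerf_step(latents, decoder, diffusion, render, images, poses,
                 w_render=1.0, w_diff=1.0):
    """
    Hypothetical single-stage objective: the per-scene latent codes feed both
    a NeRF auto-decoder branch (rendering loss) and a latent diffusion branch
    (denoising loss), and everything is optimized end to end.
    """
    # Rendering branch: decode latents to radiance fields and compare renders
    # against the observed multi-view images.
    fields = decoder(latents)
    renders = render(fields, poses)
    loss_render = F.mse_loss(renders, images)

    # Diffusion branch: standard noise-prediction loss on the noised latents,
    # which teaches the diffusion model a prior over the same latent space.
    t = torch.randint(0, diffusion.num_steps, (latents.shape[0],),
                      device=latents.device)
    noise = torch.randn_like(latents)
    noisy = diffusion.add_noise(latents, noise, t)
    loss_diff = F.mse_loss(diffusion.predict_noise(noisy, t), noise)

    return w_render * loss_render + w_diff * loss_diff
```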