Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), that
synthesizes natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. VON not only generates images that are more
realistic than those from state-of-the-art 2D image synthesis methods, but also
enables many 3D operations, such as changing the viewpoint of a generated image,
editing shape and texture, linearly interpolating in texture and shape space, and
transferring appearance across different objects and viewpoints.

Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
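A minimal sketch of the shape, viewpoint, and texture factorization described in the abstract above, assuming a voxel shape generator, a crude orthographic projection for the 2.5D sketch, and a small image network; all module names and sizes are illustrative assumptions, not the authors' released implementation (see the linked code for that).

import torch
import torch.nn as nn

# Illustrative sketch only; module names, sizes, and the orthographic
# projection are assumptions, not the released VON implementation.

class ShapeGenerator(nn.Module):
    """Maps a shape code to a coarse voxel occupancy grid."""
    def __init__(self, z_dim=128, res=32):
        super().__init__()
        self.res = res
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, res ** 3), nn.Sigmoid())

    def forward(self, z_shape):
        return self.net(z_shape).view(-1, 1, self.res, self.res, self.res)

def render_25d(voxels):
    """Project voxels to a 2.5D sketch (silhouette + pseudo depth map).
    A real pipeline would first rotate the voxels by a sampled viewpoint and
    use a differentiable renderer; here we integrate occupancy along depth."""
    occ = voxels.squeeze(1)                               # (B, D, H, W)
    silhouette = occ.max(dim=1).values                    # (B, H, W)
    depth_idx = torch.arange(occ.shape[1], dtype=voxels.dtype)
    depth = (occ * depth_idx.view(1, -1, 1, 1)).sum(1) / (occ.sum(1) + 1e-6)
    return torch.stack([silhouette, depth], dim=1)        # (B, 2, H, W)

class TextureNetwork(nn.Module):
    """Translates a 2.5D sketch plus a texture code into an RGB image."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 + z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, sketch, z_texture):
        z_map = z_texture[:, :, None, None].expand(-1, -1, *sketch.shape[2:])
        return self.net(torch.cat([sketch, z_map], dim=1))

# Sample an image from the three disentangled factors.
shape_gen, tex_net = ShapeGenerator(), TextureNetwork()
z_shape, z_texture = torch.randn(1, 128), torch.randn(1, 128)
image = tex_net(render_25d(shape_gen(z_shape)), z_texture)
print(image.shape)  # torch.Size([1, 3, 32, 32])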
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
We propose SceneTex, a novel method for effectively generating high-quality
and style-consistent textures for indoor scenes using depth-to-image diffusion
priors. Unlike previous methods that either iteratively warp 2D views onto a
mesh surface or distill diffusion latent features without accurate geometric
and style cues, SceneTex formulates the texture synthesis task as an
optimization problem in the RGB space where style and geometry consistency are
properly reflected. At its core, SceneTex proposes a multiresolution texture
field to implicitly encode the mesh appearance. We optimize the target texture
via a score-distillation-based objective function applied to the respective RGB renderings.
To further secure the style consistency across views, we introduce a
cross-attention decoder to predict the RGB values by cross-attending to the
pre-sampled reference locations in each instance. SceneTex enables diverse and
accurate texture synthesis for 3D-FRONT scenes, demonstrating significant
improvements in visual quality and prompt fidelity over prior texture
generation methods.

Comment: Project website: https://daveredrum.github.io/SceneTex
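A condensed sketch of the multiresolution texture field and the score-distillation-style objective in RGB space described above; the grid resolutions, the small MLP head, and the stand-in diffusion_score() function are assumptions for illustration, where a real system would plug in a depth-conditioned diffusion prior.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch; grid resolutions, the MLP head, and the stand-in
# diffusion_score() are assumptions, not the SceneTex implementation.

class MultiResTextureField(nn.Module):
    """Implicit texture: UV in [0,1]^2 -> RGB via multiresolution feature grids."""
    def __init__(self, resolutions=(16, 64, 256), feat_dim=8):
        super().__init__()
        self.grids = nn.ParameterList(
            [nn.Parameter(torch.zeros(1, feat_dim, r, r)) for r in resolutions])
        self.head = nn.Sequential(
            nn.Linear(feat_dim * len(resolutions), 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, uv):                       # uv: (N, 2) in [0, 1]
        grid = uv.view(1, -1, 1, 2) * 2 - 1      # to [-1, 1] for grid_sample
        feats = [F.grid_sample(g, grid, align_corners=True)
                 .squeeze(-1).squeeze(0).t() for g in self.grids]
        return self.head(torch.cat(feats, dim=-1))

def diffusion_score(rendering, prompt):
    """Placeholder for the gradient a depth-to-image diffusion prior would
    supply during score distillation; here it just nudges pixels toward gray."""
    return rendering - 0.5

field = MultiResTextureField()
optimizer = torch.optim.Adam(field.parameters(), lr=1e-2)
for step in range(100):
    uv = torch.rand(4096, 2)                     # UVs visible from a sampled view
    rgb = field(uv)                              # the texture rendered in RGB space
    grad = diffusion_score(rgb, "a cozy living room")
    loss = (rgb * grad.detach()).sum()           # SDS-style surrogate objective
    optimizer.zero_grad(); loss.backward(); optimizer.step()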
TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images
Recent works on Neural Fields (NFs) learn 3D representations from
class-specific single-view image collections. However, they are unable to
reconstruct the input data while preserving high-frequency details. Further, these
methods do not disentangle appearance from geometry and are hence not suitable
for tasks such as texture transfer and editing. In this work, we propose TEGLO
(Textured EG3D-GLO) for learning 3D representations from single-view
in-the-wild image collections for a given class of objects. We accomplish this
by training a conditional Neural Radiance Field (NeRF) without any explicit 3D
supervision. We equip our method with editing capabilities by creating a dense
correspondence mapping to a 2D canonical space. We demonstrate that such
mapping enables texture transfer and texture editing without requiring meshes
with shared topology. Our key insight is that by mapping the input image pixels
onto the texture space we can achieve near-perfect reconstruction (>= 74 dB
PSNR at 1024^2 resolution). Our formulation allows for high-quality,
3D-consistent novel view synthesis with high-frequency details at megapixel image
resolution.
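A toy sketch of the dense-correspondence idea above: surface points are mapped into a shared 2D canonical texture space, colors are baked into that space per object, and swapping baked textures between objects transfers appearance. The tiny MLP mapper and nearest-neighbour splatting are illustrative assumptions, not TEGLO's actual correspondence mechanism.

import torch
import torch.nn as nn

# Illustrative toy; the MLP mapper and nearest-neighbour splatting are
# assumptions, not TEGLO's architecture.

class CanonicalMapper(nn.Module):
    """Maps 3D surface points to 2D canonical coordinates in [0, 1]^2."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2), nn.Sigmoid())

    def forward(self, points):
        return self.net(points)

def bake_texture(mapper, points, colors, res=64):
    """Splat per-point colors into the canonical texture (nearest texel)."""
    uv = (mapper(points) * (res - 1)).long()
    texture = torch.zeros(res, res, 3)
    texture[uv[:, 1], uv[:, 0]] = colors
    return texture

def sample_texture(mapper, points, texture):
    """Look up colors for new surface points from a baked canonical texture."""
    uv = (mapper(points) * (texture.shape[0] - 1)).long()
    return texture[uv[:, 1], uv[:, 0]]

mapper = CanonicalMapper()
pts_a, colors_a = torch.rand(1000, 3), torch.rand(1000, 3)    # object A surface
pts_b = torch.rand(1000, 3)                                   # object B surface
texture_a = bake_texture(mapper, pts_a, colors_a)             # bake A's appearance
transferred = sample_texture(mapper, pts_b, texture_a)        # A's texture on B
print(transferred.shape)  # torch.Size([1000, 3])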
Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control
We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works, which learn representations of geometry and appearance from only 2D images. While existing works have demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, remains difficult. To address this problem, we utilize a coarse body model as a proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high-fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than state-of-the-art methods on playback as well as novel pose synthesis, and can even generalize well to new poses that differ starkly from the training poses. Furthermore, our method also supports body shape control of the synthesized results.
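A rough sketch of the canonicalization step described above, assuming the coarse body proxy is given as per-part rigid transforms: query points are un-posed into canonical space via their nearest body part, and a radiance field is evaluated there together with a pose-dependent latent. The skinning scheme and network sizes are simplified assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

# Simplified sketch; the rigid nearest-part skinning and network sizes are
# assumptions, not the paper's exact formulation.

def unwarp_to_canonical(points, part_centers, part_rotations, part_translations):
    """Rigidly un-pose each query point using its nearest body part."""
    nearest = torch.cdist(points, part_centers).argmin(dim=1)   # (N,)
    R = part_rotations[nearest]                                 # (N, 3, 3)
    t = part_translations[nearest]                              # (N, 3)
    return torch.einsum('nij,nj->ni', R.transpose(1, 2), points - t)

class CanonicalRadianceField(nn.Module):
    """Canonical-space radiance field conditioned on a pose-dependent latent."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 4))                                  # RGB + density

    def forward(self, canon_points, pose_latent):
        z = pose_latent.expand(canon_points.shape[0], -1)
        out = self.net(torch.cat([canon_points, z], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])

# One forward pass for a batch of samples along camera rays.
P = 24                                                          # body parts
points = torch.rand(4096, 3)                                    # posed-space samples
centers = torch.rand(P, 3)
rotations = torch.eye(3).repeat(P, 1, 1)
translations = torch.rand(P, 3)
pose_latent = torch.randn(1, 16)                                # e.g. from texture maps
canon = unwarp_to_canonical(points, centers, rotations, translations)
rgb, sigma = CanonicalRadianceField()(canon, pose_latent)
print(rgb.shape, sigma.shape)  # torch.Size([4096, 3]) torch.Size([4096])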
Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture
This paper addresses the problem of interpolating visual textures. We
formulate this problem by requiring (1) by-example controllability and (2)
realistic and smooth interpolation among an arbitrary number of texture
samples. To solve it, we propose a neural network trained simultaneously on a
reconstruction task and a generation task, which can project texture examples
onto a latent space where they can be linearly interpolated and projected back
onto the image domain, thus ensuring both intuitive control and realistic
results. We show our method outperforms a number of baselines according to a
comprehensive suite of metrics as well as a user study. We further show several
applications based on our technique, which include texture brush, texture
dissolve, and animal hybridization.

Comment: Accepted to CVPR'19
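A bare-bones sketch of the encode, linearly interpolate, decode recipe described above: a shared encoder projects texture crops into a latent space, a decoder maps latents back to images, and convex combinations of latents yield interpolated textures. The architecture and the reconstruction/adversarial training are only hinted at here and are assumptions, not the paper's network.

import torch
import torch.nn as nn

# Illustrative sketch; the architecture and losses (reconstruction +
# adversarial) are assumptions, not the paper's Texture Mixer network.

class TextureEncoder(nn.Module):
    """Projects a texture crop onto a latent code."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, z_dim))

    def forward(self, x):
        return self.net(x)

class TextureDecoder(nn.Module):
    """Maps a latent code back to the image domain."""
    def __init__(self, z_dim=64, size=32):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(nn.Linear(z_dim, 3 * size * size), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 3, self.size, self.size)

enc, dec = TextureEncoder(), TextureDecoder()
tex_a, tex_b = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
z_a, z_b = enc(tex_a), enc(tex_b)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):        # sweep between the two examples
    blended = dec((1 - alpha) * z_a + alpha * z_b)
    print(alpha, blended.shape)                  # torch.Size([1, 3, 32, 32])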
MoFaNeRF: Morphable Facial Neural Radiance Field
We propose a parametric model that maps free-view images into a vector space
of coded facial shape, expression and appearance with a neural radiance field,
namely Morphable Facial NeRF. Specifically, MoFaNeRF takes the coded facial
shape, expression and appearance along with space coordinate and view direction
as input to an MLP, and outputs the radiance of the space point for
photo-realistic image synthesis. Compared with conventional 3D morphable models
(3DMM), MoFaNeRF shows superiority in directly synthesizing photo-realistic
facial details even for eyes, mouths, and beards. Also, continuous face
morphing can be easily achieved by interpolating the input shape, expression
and appearance codes. By introducing identity-specific modulation and a texture
encoder, our model synthesizes accurate photometric details and shows strong
representation ability. Our model performs well on multiple applications,
including image-based fitting, random generation, face rigging, face editing,
and novel view synthesis. Experiments show that our method achieves higher
representation ability than previous parametric models, and achieves
competitive performance in several applications. To the best of our knowledge,
our work is the first facial parametric model built upon a neural radiance
field that can be used in fitting, generation and manipulation. The code and
data are available at https://github.com/zhuhao-nju/mofanerf.

Comment: accepted to ECCV2022; code available at http://github.com/zhuhao-nju/mofanerf
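A minimal sketch of the parametric radiance-field interface described above: an MLP that takes coded shape, expression, and appearance together with a 3D point and a view direction and returns color and density, with continuous morphing via code interpolation. Code dimensions and the single-MLP layout are assumptions; the identity-specific modulation and texture encoder mentioned in the abstract are omitted.

import torch
import torch.nn as nn

# Minimal sketch; code dimensions and the single-MLP layout are assumptions.
# The identity-specific modulation and texture encoder are omitted.

class MorphableFaceField(nn.Module):
    """MLP: (shape, expression, appearance codes, xyz, view dir) -> (RGB, density)."""
    def __init__(self, shape_dim=50, expr_dim=20, app_dim=50):
        super().__init__()
        in_dim = shape_dim + expr_dim + app_dim + 3 + 3
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4))

    def forward(self, shape_c, expr_c, app_c, xyz, view_dir):
        codes = torch.cat([shape_c, expr_c, app_c], dim=-1).expand(xyz.shape[0], -1)
        out = self.net(torch.cat([codes, xyz, view_dir], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])

model = MorphableFaceField()
shape_a, shape_b = torch.randn(1, 50), torch.randn(1, 50)     # two identities
expr, app = torch.randn(1, 20), torch.randn(1, 50)
xyz, view_dir = torch.rand(2048, 3), torch.rand(2048, 3)      # ray samples

# Continuous face morphing by interpolating the shape code between identities.
for alpha in (0.0, 0.5, 1.0):
    shape_c = (1 - alpha) * shape_a + alpha * shape_b
    rgb, sigma = model(shape_c, expr, app, xyz, view_dir)
    print(alpha, rgb.shape, sigma.shape)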