116 research outputs found
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), that
synthesizes natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. The VON not only generates images that are more
realistic than state-of-the-art 2D image synthesis methods, but also enables
many 3D operations such as changing the viewpoint of a generated image, editing
of shape and texture, linear interpolation in texture and shape space, and
transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
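The 2.5D sketch step above (silhouette plus depth map rendered from the sampled shape) can be illustrated with a minimal axis-aligned orthographic projection. This is our own toy sketch, not the paper's differentiable renderer:

```python
def sketch_2p5d(vox):
    """Project a binary voxel grid vox[d][y][x] along the depth axis
    into a silhouette mask and a depth map (index of the nearest
    occupied voxel, or None where no surface is hit)."""
    D, H, W = len(vox), len(vox[0]), len(vox[0][0])
    sil = [[any(vox[d][y][x] for d in range(D)) for x in range(W)]
           for y in range(H)]
    depth = [[next((d for d in range(D) if vox[d][y][x]), None)
              for x in range(W)] for y in range(H)]
    return sil, depth

# toy shape: a 2x2x2 cube starting at depth index 1 inside a 4x4x4 grid
vox = [[[1 if 1 <= d <= 2 and y < 2 and x < 2 else 0
         for x in range(4)] for y in range(4)] for d in range(4)]
sil, depth = sketch_2p5d(vox)
```

A real implementation projects under arbitrary sampled viewpoints and keeps the operation differentiable so the whole pipeline trains end-to-end.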
A Generative Model of People in Clothing
We present the first image-based generative model of people in clothing for
the full body. We sidestep the commonly used complex graphics rendering
pipeline and the need for high-quality 3D scans of dressed people. Instead, we
learn generative models from a large image database. The main challenge is to
cope with the high variance in human pose, shape and appearance. For this
reason, pure image-based approaches have not been considered so far. We show
that this challenge can be overcome by splitting the generation process into two
parts. First, we learn to generate a semantic segmentation of the body and
clothing. Second, we learn a conditional model on the resulting segments that
creates realistic images. The full model is differentiable and can be
conditioned on pose, shape or color. The results are samples of people in
different clothing items and styles. The proposed model can generate entirely
new people with realistic clothing. In several experiments we present
encouraging results that suggest an entirely data-driven approach to people
generation is possible.
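The two-part split described above can be sketched as follows; both stages here are hypothetical stand-ins (a fixed toy label map and a flat colour per label) for what are learned generative and conditional networks in the paper:

```python
# Hypothetical colours standing in for the learned conditional model.
PALETTE = {0: (0, 0, 0),        # background
           1: (224, 172, 105),  # skin
           2: (40, 60, 180)}    # clothing

def stage2_paint(labels, palette=PALETTE):
    """Stage 2 stand-in: turn a semantic segmentation into pixels by
    mapping each label to an RGB colour."""
    return [[palette[l] for l in row] for row in labels]

# Stage 1 stand-in: a fixed 2x2 segmentation of body and clothing.
labels = [[0, 1],
          [2, 2]]
image = stage2_paint(labels)
```

The point of the split is that stage 1 absorbs the high variance in pose and shape, so stage 2 only has to map segments to realistic appearance.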
Leveraging 2D data to learn textured 3D mesh generation
Numerous methods have been proposed for probabilistic generative modelling of
3D objects. However, none of these is able to produce textured objects, which
renders them of limited use for practical tasks. In this work, we present the
first generative model of textured 3D meshes. Training such a model would
traditionally require a large dataset of textured meshes, but unfortunately,
existing datasets of meshes lack detailed textures. We instead propose a new
training methodology that allows learning from collections of 2D images without
any 3D information. To do so, we train our model to explain a distribution of
images by modelling each image as a 3D foreground object placed in front of a
2D background. Thus, it learns to generate meshes that when rendered, produce
images similar to those in its training set.
A well-known problem when generating meshes with deep networks is the
emergence of self-intersections, which are problematic for many use-cases. As a
second contribution we therefore introduce a new generation process for 3D
meshes that guarantees no self-intersections arise, based on the physical
intuition that faces should push one another out of the way as they move.
We conduct extensive experiments on our approach, reporting quantitative and
qualitative results on both synthetic data and natural images. These show our
method successfully learns to generate plausible and diverse textured 3D
samples for five challenging object classes.
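The "faces push one another out of the way" intuition can be illustrated in one dimension, where keeping points in order guarantees their trajectories never cross. This is our own analogy, not the paper's actual 3D mesh formulation:

```python
def push_apart_1d(xs, min_gap=0.5):
    """1D analogue of the intersection-free update: each point is
    pushed right of its left neighbour, so the left-to-right order is
    preserved and trajectories can never cross. The paper applies the
    same physical intuition to mesh faces in 3D."""
    out = []
    for x in xs:
        if out and x < out[-1] + min_gap:
            x = out[-1] + min_gap   # pushed out of the way
        out.append(x)
    return out

positions = push_apart_1d([0.0, 2.0, 1.0, 3.0])
```

The third point would cross the second under a naive update; pushing it to 2.5 keeps the ordering, mirroring how the generation process rules out self-intersections by construction.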
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
We propose a novel generative adversarial network (GAN) for the task of
unsupervised learning of 3D representations from natural images. Most
generative models rely on 2D kernels to generate images and make few
assumptions about the 3D world. These models therefore tend to create blurry
images or artefacts in tasks that require a strong 3D understanding, such as
novel-view synthesis. HoloGAN instead learns a 3D representation of the world
and how to render this representation in a realistic manner. Unlike other GANs,
HoloGAN provides explicit control over the pose of generated objects through
rigid-body transformations of the learnt 3D features. Our experiments show that
using explicit 3D features enables HoloGAN to disentangle 3D pose and identity,
which is further decomposed into shape and appearance, while still being able
to generate images with similar or higher visual quality than other generative
models. HoloGAN can be trained end-to-end from unlabelled 2D images only.
In particular, we do not require pose labels, 3D shapes, or multiple views of
the same objects. This shows that HoloGAN is the first generative model that
learns 3D representations from natural images in an entirely unsupervised
manner.
Comment: International Conference on Computer Vision ICCV 2019. For project page, see https://www.monkeyoverflow.com/#/hologan-unsupervised-learning-of-3d-representations-from-natural-images
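The rigid-body transformation of learnt 3D features can be sketched with an exact 90-degree rotation of one feature slice. HoloGAN itself resamples its feature volume at arbitrary continuous angles, so this is only an illustrative special case:

```python
def rotate_ccw(grid):
    """Rotate a 2D slice of a feature grid 90 degrees counter-clockwise.
    Stands in for the rigid-body transform applied to the learnt 3D
    features before rendering; identity and appearance live in the
    feature values, pose in how the grid is transformed."""
    return [list(row) for row in zip(*grid)][::-1]

feat = [[1, 2],
        [3, 4]]                 # one 2x2 slice of a feature volume
once = rotate_ccw(feat)
full_turn = feat
for _ in range(4):
    full_turn = rotate_ccw(full_turn)  # four quarter-turns = identity
```

Because pose enters only through this transform, it is disentangled from the feature content by construction, which is what gives the explicit pose control described above.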
Learning to Transfer Texture from Clothing Images to 3D Humans
In this paper, we present a simple yet effective method to automatically
transfer textures of clothing images (front and back) to 3D garments worn on
top of SMPL, in real time. We first automatically compute training pairs of images
with aligned 3D garments using a custom non-rigid 3D to 2D registration method,
which is accurate but slow. Using these pairs, we learn a mapping from pixels
to the 3D garment surface. Our idea is to learn dense correspondences from
garment image silhouettes to a 2D-UV map of a 3D garment surface using shape
information alone, completely ignoring texture, which allows us to generalize
to the wide range of web images. Several experiments demonstrate that our model
is more accurate than widely used baselines such as thin-plate-spline warping
and image-to-image translation networks while being orders of magnitude faster.
Our model opens the door for applications such as virtual try-on, and allows
for the generation of 3D humans with varied textures, which is necessary for
learning.
Comment: IEEE Conference on Computer Vision and Pattern Recognition
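The dense pixel-to-UV correspondence described above can be sketched as a nearest-neighbour scatter of image pixels into a UV atlas. The correspondence map here is hand-written, whereas the paper predicts it with a learned network:

```python
def transfer_texture(image, corr_uv, tex_size):
    """Scatter image pixels into a square UV texture atlas using a
    dense per-pixel correspondence map of (u, v) pairs in [0, 1]^2,
    with nearest-neighbour rounding. Unfilled texels stay None."""
    tex = [[None] * tex_size for _ in range(tex_size)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            u, v = corr_uv[y][x]
            tu = min(int(round(u * (tex_size - 1))), tex_size - 1)
            tv = min(int(round(v * (tex_size - 1))), tex_size - 1)
            tex[tv][tu] = pixel
    return tex

img = [["a", "b"],
       ["c", "d"]]                     # toy 2x2 "image"
uv = [[(0.0, 0.0), (1.0, 0.0)],
      [(0.0, 1.0), (1.0, 1.0)]]       # map image corners to atlas corners
tex = transfer_texture(img, uv, 4)
```

Because the mapping depends on silhouette shape alone, the same learned correspondences can be reused for arbitrary garment textures from web images.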