From rule-based to learning-based image-conditional image generation
Visual contents, such as movies, animations, computer games, videos and photos, are
massively produced and consumed nowadays. Most of these contents combine materials
captured from the real world with contents synthesized by computers. In particular,
computer-generated visual contents are increasingly indispensable in modern entertainment
and production. The generation of visual contents by computers is typically conditioned on
real-world materials, driven by the imagination of designers and artists, or a combination
of both. However, creating visual contents manually is both challenging and labor-intensive.
Therefore, enabling computers to automatically or semi-automatically synthesize
the needed visual contents becomes essential. Among these efforts, one stream of research
is to generate novel images based on given image priors, e.g., photos and sketches. This
research direction is known as image-conditional image generation, which covers a wide
range of topics such as image stylization, image completion, image fusion, sketch-to-image
generation, and extracting image label maps. In this thesis, a set of novel approaches for
image-conditional image generation is presented.
The thesis starts with an exemplar-based method for facial image stylization in Chapter
2. This method involves a unified framework for facial image stylization based on a single
style exemplar. A two-phase procedure is employed, where the first phase searches for a dense
and semantic-aware correspondence between the input and the exemplar images, and the
second phase conducts edge-preserving texture transfer. While this algorithm has the merit
of requiring only a single exemplar, it is constrained to face photos. To perform generalized
image-to-image translation, Chapter 3 presents a data-driven and learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a
novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which
enables image translators to be trained from two sets of unlabeled images from two domains.
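As a rough illustration of this dual training signal, the following is a minimal sketch assuming PyTorch and hypothetical generator and critic modules G_AB, G_BA, D_A, D_B; the plain adversarial and reconstruction terms below stand in for whatever losses the thesis method actually uses, so this is not the DualGAN implementation itself.

```python
# Minimal sketch of a dual-GAN generator update on unpaired batches.
# G_AB, G_BA, D_A, D_B are hypothetical modules; lambda_rec is an assumed weight.
import torch
import torch.nn.functional as F

def dual_generator_loss(G_AB, G_BA, D_A, D_B, real_a, real_b, lambda_rec=10.0):
    fake_b = G_AB(real_a)              # primal task: translate domain A -> B
    fake_a = G_BA(real_b)              # dual task:   translate domain B -> A
    rec_a = G_BA(fake_b)               # close the loop A -> B -> A
    rec_b = G_AB(fake_a)               # close the loop B -> A -> B

    # Adversarial terms: translated images should look like the target domain.
    adv = -(D_B(fake_b).mean() + D_A(fake_a).mean())
    # Reconstruction terms: the dual translator should invert the primal one,
    # which is what allows training from two unlabeled, unpaired image sets.
    rec = F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
    return adv + lambda_rec * rec
```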
This is followed by another data-driven method in Chapter 4, which learns multiscale
manifolds from a set of images and then synthesizes novel images that mimic
the appearance of the target image dataset. The method, named Branched Generative
Adversarial Network (BranchGAN), employs a novel training method that enables unconditional
generative adversarial networks (GANs) to learn image manifolds at multiple
scales. As a result, we can directly manipulate and even combine latent manifold codes
that are associated with specific feature scales. Finally, to provide users with more control over
image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD)
that significantly improves the manipulation of high-resolution images by
utilizing the multi-scale manifold learned with BranchGAN.
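The manipulation of per-scale latent codes described above could be illustrated roughly as follows; the generator G and the code dimensions are assumptions made for the sketch, not details of BranchGAN or iGANHD.

```python
# Minimal sketch of combining latent codes at specific feature scales,
# assuming a hypothetical generator G that takes one code per scale.
import torch

def mix_scale_codes(codes_src, codes_ref, swap_scales):
    """Keep the source codes except at the scales listed in swap_scales."""
    return [codes_ref[i] if i in swap_scales else codes_src[i]
            for i in range(len(codes_src))]

# Usage with illustrative shapes: keep the coarse structure of one sample
# while borrowing only the finest-scale appearance of another.
codes_src = [torch.randn(1, 128) for _ in range(3)]   # coarse -> fine
codes_ref = [torch.randn(1, 128) for _ in range(3)]
mixed = mix_scale_codes(codes_src, codes_ref, swap_scales={2})
# image = G(mixed)   # G is the assumed multi-scale generator
```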
FaceShop: Deep Sketch-based Face Image Editing
We present a novel system for sketch-based face image editing, enabling users
to edit images intuitively by sketching a few strokes on a region of interest.
Our interface features tools to express a desired image manipulation by
providing both geometry and color constraints as user-drawn strokes. As an
alternative to direct user input, our proposed system naturally supports a
copy-paste mode, which allows users to edit a given image region by using parts
of another exemplar image without any hand-drawn sketching at all. The
proposed interface runs in real-time and facilitates an interactive and
iterative workflow to quickly express the intended edits. Our system is based
on a novel sketch domain and a convolutional neural network trained end-to-end
to automatically learn to render image regions corresponding to the input
strokes. To achieve high-quality and semantically consistent results, we train
our neural network on two simultaneous tasks, namely image completion and image
translation. To the best of our knowledge, we are the first to combine these
two tasks in a unified framework for interactive image editing. Our results
show that the proposed sketch domain, network architecture, and training
procedure generalize well to real user input and enable high quality synthesis
results without additional post-processing.
Comment: 13 pages, 20 figures
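A rough sketch of how geometry and color constraints might be assembled into the conditional input of such a completion-plus-translation network is given below; the channel layout and tensor shapes are assumptions for illustration, not the authors' exact design.

```python
# Minimal sketch of assembling a sketch-based editing input, assuming PyTorch.
# The edited region is erased and the user strokes are appended as extra channels.
import torch

def build_editing_input(image, mask, sketch, color_strokes):
    """image: (B, 3, H, W); mask: (B, 1, H, W) with 1 inside the region of interest;
    sketch: (B, 1, H, W) geometry strokes; color_strokes: (B, 3, H, W) color hints."""
    erased = image * (1.0 - mask)          # remove the content to be re-synthesized
    return torch.cat([erased, mask, sketch, color_strokes], dim=1)

# In a copy-paste mode, the sketch and color channels could instead be filled
# from the edges and colors of another exemplar image rather than drawn by hand.
```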
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), synthesizing
natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors (shape, viewpoint, and
texture) and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. The VON not only generates images that are more
realistic than state-of-the-art 2D image synthesis methods, but also enables
many 3D operations such as changing the viewpoint of a generated image, editing
of shape and texture, linear interpolation in texture and shape space, and
transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
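The three-factor pipeline could be sketched as follows; shape_gan, project_25d, and texture_net are hypothetical stand-ins for the actual VON components (see the linked repository for the real implementation).

```python
# Minimal sketch of the shape -> 2.5D sketch -> texture sampling pipeline,
# assuming hypothetical modules; this is not the released VON code.
import torch

def sample_von_image(shape_gan, project_25d, texture_net, z_shape, z_texture, viewpoint):
    voxels = shape_gan(z_shape)                          # 1) sample a 3D shape
    silhouette, depth = project_25d(voxels, viewpoint)   # 2) render 2.5D sketches for the viewpoint
    sketch_25d = torch.cat([silhouette, depth], dim=1)
    return texture_net(sketch_25d, z_texture)            # 3) add texture to obtain a natural image

# Because shape, viewpoint, and texture are separate inputs, operations such as
# changing the viewpoint or swapping textures reduce to re-running the later stages.
```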
Contextual-based Image Inpainting: Infer, Match, and Translate
We study the task of image inpainting, which is to fill in the missing region
of an incomplete image with plausible contents. To this end, we propose a
learning-based approach to generate visually coherent completion given a
high-resolution image with missing components. In order to overcome the
difficulty of directly learning the distribution of high-dimensional image data,
we divide the task into inference and translation as two separate steps and
model each step with a deep neural network. We also use simple heuristics to
guide the propagation of local textures from the boundary to the hole. We show
that, by using such techniques, inpainting reduces to the problem of learning
two image-feature translation functions in a much smaller space, which is hence easier
to train. We evaluate our method on several public datasets and show that we
generate results of better visual quality than previous state-of-the-art
methods.
Comment: ECCV 2018 camera-ready
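The infer-match-translate decomposition could be sketched roughly as below; inference_net, translation_net, and propagate_textures are hypothetical placeholders for the paper's two networks and its boundary-texture heuristic.

```python
# Minimal sketch of the two-step inpainting pipeline, assuming PyTorch.
import torch

def inpaint(image, mask, inference_net, translation_net, propagate_textures):
    """image: (B, 3, H, W); mask: (B, 1, H, W) with 1 where content is missing."""
    erased = image * (1.0 - mask)
    coarse_feat = inference_net(erased, mask)             # infer: plausible content in a smaller feature space
    guided_feat = propagate_textures(coarse_feat, mask)   # match: propagate boundary textures into the hole
    completed = translation_net(guided_feat)              # translate: map features back to pixels
    return erased + completed * mask                      # keep the known region untouched
```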