BSD-GAN: Branched Generative Adversarial Network for Scale-Disentangled Representation Learning and Image Synthesis
We introduce BSD-GAN, a novel multi-branch and scale-disentangled training
method which enables unconditional Generative Adversarial Networks (GANs) to
learn image representations at multiple scales, benefiting a wide range of
generation and editing tasks. The key feature of BSD-GAN is that it is trained
in multiple branches, progressively covering both the breadth and depth of the
network, as resolutions of the training images increase to reveal finer-scale
features. Specifically, each noise vector, as input to the generator network of
BSD-GAN, is deliberately split into several sub-vectors, each corresponding to,
and trained to learn, image representations at a particular scale. During
training, we progressively "de-freeze" the sub-vectors, one at a time, as a new
set of higher-resolution images is employed for training and more network
layers are added. A consequence of such an explicit sub-vector designation is
that we can directly manipulate and even combine latent (sub-vector) codes
which model different feature scales. Extensive experiments demonstrate the
effectiveness of our training method in scale-disentangled learning of image
representations and synthesis of novel image contents, without any extra labels
and without compromising quality of the synthesized high-resolution images. We
further demonstrate several image generation and manipulation applications
enabled or improved by BSD-GAN. Source codes are available at
https://github.com/duxingren14/BSD-GAN.
Comment: 12 pages, 20 figures, accepted to IEEE Transactions on Image Processing
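The sub-vector scheme described in the abstract can be sketched in a few lines. The number of scales, the equal split sizes, and the freeze mask below are illustrative assumptions for exposition, not the paper's actual configuration:

```python
import random

def split_latent(z, num_scales):
    # Split one noise vector into equal-length sub-vectors, one per scale
    # (equal sizes are an assumption; other splits are possible).
    k = len(z) // num_scales
    return [z[i * k:(i + 1) * k] for i in range(num_scales)]

def defreeze_mask(num_scales, num_active):
    # 1.0 marks sub-vectors already "de-frozen" (trainable); 0.0 marks
    # sub-vectors still held fixed until higher resolutions are reached.
    return [1.0 if s < num_active else 0.0 for s in range(num_scales)]

z = [random.gauss(0.0, 1.0) for _ in range(12)]
subs = split_latent(z, 3)    # three sub-vectors of length 4
mask = defreeze_mask(3, 2)   # coarse and mid scales active, fine scale frozen
```

Because each sub-vector is tied to one scale, swapping or interpolating a single sub-vector between two latent codes edits only the features at that scale, which is the manipulation the abstract refers to.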
From rule-based to learning-based image-conditional image generation
Visual contents, such as movies, animations, computer games, videos and photos, are
massively produced and consumed nowadays. Most of these contents are the combination
of materials captured from real-world and contents synthesized by computers. Particularly,
computer-generated visual contents are increasingly indispensable in modern entertainment
and production. The generation of visual contents by computers is typically conditioned on
real-world materials, driven by the imagination of designers and artists, or a combination
of both. However, creating visual contents manually is both challenging and labor-intensive.
Therefore, enabling computers to automatically or semi-automatically synthesize
needed visual contents becomes essential. Among all these efforts, a stream of research
is to generate novel images based on given image priors, e.g., photos and sketches. This
research direction is known as image-conditional image generation, which covers a wide
range of topics such as image stylization, image completion, image fusion, sketch-to-image
generation, and extracting image label maps. In this thesis, a set of novel approaches for
image-conditional image generation are presented.
The thesis starts with an exemplar-based method for facial image stylization in Chapter
2. This method involves a unified framework for facial image stylization based on a single
style exemplar. A two-phase procedure is employed, where the first phase searches
for a dense and semantic-aware correspondence between the input and the exemplar images, and the
second phase conducts edge-preserving texture transfer. While this algorithm has the merit
of requiring only a single exemplar, it is constrained to face photos. To perform generalized
image-to-image translation, Chapter 3 presents a data-driven and learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a
novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which
enables image translators to be trained from two sets of unlabeled images from two domains.
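One way to see how training from unlabeled image sets can work is through a reconstruction (cycle) loss: an image translated from domain A to B and back should match the original. A minimal sketch of that loss, where the toy translators are hypothetical stand-ins for the two learned generators:

```python
def cycle_loss(x, g_ab, g_ba):
    # Mean absolute error after a round trip: domain A -> B -> A.
    recon = g_ba(g_ab(x))
    return sum(abs(a - b) for a, b in zip(x, recon)) / len(x)

# Toy, perfectly inverse "translators" standing in for the two generators.
g_ab = lambda v: [2.0 * t for t in v]
g_ba = lambda v: [0.5 * t for t in v]

x = [1.0, -2.0, 3.0]
loss = cycle_loss(x, g_ab, g_ba)  # 0.0: the round trip reproduces x exactly
```

Minimizing this reconstruction error, together with adversarial losses in each domain, is what lets the two translators be trained without paired examples.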
This is followed by another data-driven method in Chapter 4, which learns multiscale
manifolds from a set of images and then enables synthesizing novel images that mimic
the appearance of the target image dataset. The method is named Branched Generative
Adversarial Network (BranchGAN) and employs a novel training method that enables unconditioned
generative adversarial networks (GANs) to learn image manifolds at multiple
scales. As a result, we can directly manipulate and even combine latent manifold codes
that are associated with specific feature scales. Finally, to provide users more control over
image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD)
that significantly improves the manipulation of high-resolution images by
utilizing the multi-scale manifold learned with BranchGAN.
3D Shape Understanding and Generation
In recent years, Machine Learning techniques have revolutionized solutions to longstanding image-based problems, like image classification, generation, semantic segmentation, object detection and many others. However, if we want to be able to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a three-dimensional space. There are two main challenges while handling 3D information in machine learning models. First, it is not clear what the best 3D representation is. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available on the same scale as images – taking pictures is a common procedure in our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis is focused on addressing both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D?
Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. Those models require significantly less memory while generating higher quality shapes than the ones based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high quality reconstructions and better unsupervised feature learning. However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and were specifically designed for high-level tasks and user interaction. Despite their effectiveness, those approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels are available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks -- using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without using any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks.
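Reconstruction quality for point-cloud generators of this kind is commonly scored with the symmetric Chamfer distance between the generated and reference point sets. A minimal pure-Python sketch of the metric itself (not the thesis code, which the abstract does not detail):

```python
def chamfer_distance(p, q):
    # Symmetric Chamfer distance between two 3D point sets: the average
    # squared distance from each point to its nearest neighbor in the
    # other set, accumulated over both directions.
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    d_pq = sum(min(sq_dist(a, b) for b in q) for a in p) / len(p)
    d_qp = sum(min(sq_dist(a, b) for a in p) for b in q) / len(q)
    return d_pq + d_qp

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for (x, y, z) in cloud]
```

Because the metric matches nearest neighbors rather than fixed indices, it is invariant to point ordering, which is what makes it suitable for irregularly structured outputs such as point clouds.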