Visual content, such as movies, animations, computer games, videos, and photos, is massively produced and consumed nowadays. Most of this content combines materials captured from the real world with content synthesized by computers. In particular, computer-generated visual content has become indispensable in modern entertainment and production. The generation of visual content by computers is typically conditioned on real-world materials, driven by the imagination of designers and artists, or a combination of both. However, creating visual content manually is both challenging and labor-intensive. Therefore, enabling computers to automatically or semi-automatically synthesize the needed visual content becomes essential. Among these efforts, one stream of research generates novel images based on given image priors, e.g., photos and sketches. This research direction is known as image-conditional image generation, and it covers a wide range of topics such as image stylization, image completion, image fusion, sketch-to-image generation, and the extraction of image label maps. This thesis presents a set of novel approaches to image-conditional image generation.
The thesis starts with an exemplar-based method for facial image stylization in Chapter 2. This method provides a unified framework for facial image stylization based on a single style exemplar. A two-phase procedure is employed: the first phase searches for a dense, semantics-aware correspondence between the input and exemplar images, and the second phase performs edge-preserving texture transfer. While this algorithm has the merit of requiring only a single exemplar, it is constrained to face photos.
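As a rough illustration of this two-phase structure, the following sketch pairs a brute-force dense patch correspondence with a naive copy-based transfer. It is a toy stand-in, not the Chapter 2 algorithm, which additionally enforces semantic awareness and edge preservation; the image arrays and patch size below are arbitrary placeholders.

```python
# Toy sketch of a two-phase transfer: (1) dense correspondence, (2) texture
# transfer by copying matched pixels. Brute force, for tiny grayscale images.
import numpy as np

def dense_correspondence(src, ref, patch=5):
    """Phase one: for each src pixel, find the best-matching patch centre in ref."""
    r = patch // 2
    H, W = src.shape
    src_p = np.pad(src, r, mode="reflect")
    ref_p = np.pad(ref, r, mode="reflect")
    nnf = np.zeros((H, W, 2), dtype=np.int64)  # nearest-neighbour field
    for y in range(H):
        for x in range(W):
            q = src_p[y:y + patch, x:x + patch]
            best, best_yx = np.inf, (0, 0)
            for yy in range(ref.shape[0]):
                for xx in range(ref.shape[1]):
                    d = np.sum((ref_p[yy:yy + patch, xx:xx + patch] - q) ** 2)
                    if d < best:
                        best, best_yx = d, (yy, xx)
            nnf[y, x] = best_yx
    return nnf

def texture_transfer(nnf, ref):
    """Phase two: copy matched reference pixels into the output."""
    out = np.zeros(nnf.shape[:2], dtype=ref.dtype)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            yy, xx = nnf[y, x]
            out[y, x] = ref[yy, xx]
    return out

input_face = np.random.rand(16, 16)      # stand-in for an aligned face photo
style_exemplar = np.random.rand(16, 16)  # stand-in for the style exemplar
stylized = texture_transfer(dense_correspondence(input_face, style_exemplar),
                            style_exemplar)
```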
To perform generalized image-to-image translation, Chapter 3 presents a data-driven, learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which enables image translators to be trained on two sets of unlabeled images from two different domains.
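The core training idea can be pictured as follows: two generators translate in opposite directions, and each translated image must both fool a discriminator of the target domain and reconstruct its source after a round trip. The sketch below is a simplified, hedged version of this idea; the architectures, the discriminator updates (omitted here), and the loss choices and weighting (the 10.0 below) are illustrative assumptions rather than the formulation developed in Chapter 3.

```python
# Minimal sketch of dual-GAN-style unpaired training (generator step only).
import torch
import torch.nn as nn

def conv_net(in_ch, out_ch):
    # stand-in for the real generator / discriminator architectures
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, out_ch, 3, padding=1))

G_AB, G_BA = conv_net(3, 3), conv_net(3, 3)   # translators A->B and B->A
D_A, D_B = conv_net(3, 1), conv_net(3, 1)     # per-domain discriminators
opt_G = torch.optim.Adam(list(G_AB.parameters()) + list(G_BA.parameters()),
                         lr=2e-4)
l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()

real_a = torch.rand(4, 3, 32, 32)  # unpaired mini-batches, one per domain
real_b = torch.rand(4, 3, 32, 32)

fake_b, fake_a = G_AB(real_a), G_BA(real_b)   # cross-domain translations
rec_a, rec_b = G_BA(fake_b), G_AB(fake_a)     # round-trip reconstructions

# adversarial terms: fool each target-domain discriminator;
# reconstruction terms: A -> B -> A (and B -> A -> B) should return the input
pred_b, pred_a = D_B(fake_b), D_A(fake_a)
adv = bce(pred_b, torch.ones_like(pred_b)) + bce(pred_a, torch.ones_like(pred_a))
rec = l1(rec_a, real_a) + l1(rec_b, real_b)

opt_G.zero_grad()
(adv + 10.0 * rec).backward()  # 10.0 is an assumed weight, not the thesis value
opt_G.step()
```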
This is followed by another data-driven method in Chapter 4, which learns multi-scale manifolds from a set of images and then synthesizes novel images that mimic the appearance of the target image dataset. The method, named Branched Generative Adversarial Network (BranchGAN), employs a novel training scheme that enables unconditional generative adversarial networks (GANs) to learn image manifolds at multiple scales. As a result, we can directly manipulate, and even combine, latent manifold codes that are associated with specific feature scales.
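One way to picture scale-wise manipulation is a generator that consumes a separate latent code at each resolution, so that codes governing coarse structure and fine detail can be recombined across samples. The toy network below is only a hedged sketch of this idea; its dimensions and layers are placeholders, not the BranchGAN architecture.

```python
# Hedged sketch: a generator with one latent code per scale, enabling
# scale-wise mixing of codes from different samples.
import torch
import torch.nn as nn

class MultiScaleGenerator(nn.Module):
    def __init__(self, dims=(8, 8, 8)):
        super().__init__()
        self.start = nn.Linear(dims[0], 16 * 4 * 4)       # coarsest scale
        self.ups = nn.ModuleList(
            nn.Conv2d(16 + 1, 16, 3, padding=1) for _ in dims[1:])
        self.inject = nn.ModuleList(nn.Linear(d, 1) for d in dims[1:])
        self.to_rgb = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, codes):
        x = self.start(codes[0]).view(-1, 16, 4, 4)
        for up, inj, z in zip(self.ups, self.inject, codes[1:]):
            x = nn.functional.interpolate(x, scale_factor=2)  # go up one scale
            s = inj(z).view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
            x = torch.relu(up(torch.cat([x, s], dim=1)))      # inject this scale's code
        return torch.tanh(self.to_rgb(x))

G = MultiScaleGenerator()
codes_a = [torch.randn(1, 8) for _ in range(3)]  # coarse -> fine codes, sample A
codes_b = [torch.randn(1, 8) for _ in range(3)]  # coarse -> fine codes, sample B
# scale-wise mixing: coarse structure from sample A, finest details from B
mixed = G([codes_a[0], codes_a[1], codes_b[2]])
```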
Finally, to provide users with more control over image generation results, Chapter 5 discusses an upgraded version of iGAN [126], named iGAN-HD, which significantly improves the manipulation of high-resolution images by utilizing the multi-scale manifolds learned with BranchGAN.
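A minimal sketch of the editing loop underlying iGAN-style manipulation: the latent code of a fixed generator is optimized to reconstruct a target image (manifold projection), after which edits displace the code so that results stay on the learned manifold. The tiny generator, learning rate, and iteration count below are illustrative assumptions, not the Chapter 5 system.

```python
# Hedged sketch of manifold projection and editing with a fixed generator.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                  nn.Linear(64, 3 * 16 * 16), nn.Tanh())  # placeholder generator
for p in G.parameters():
    p.requires_grad_(False)  # the generator stays fixed; only the code moves

target = torch.rand(1, 3 * 16 * 16) * 2 - 1  # stand-in for the user's photo
z = torch.zeros(1, 8, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)

for _ in range(200):  # projection: gradient descent on the latent code
    opt.zero_grad()
    loss = torch.mean((G(z) - target) ** 2)
    loss.backward()
    opt.step()

# an edit is then a displacement of z; rendering through G keeps the
# edited result on the generator's learned image manifold
edited = G(z + 0.1 * torch.randn_like(z))
```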