Unsupervised Image Super-Resolution with an Indirect Supervised Path
The task of single image super-resolution (SISR) aims at reconstructing a
high-resolution (HR) image from a low-resolution (LR) image. Although
significant progress has been made by deep learning models, they are trained on
synthetic paired data in a supervised way and do not perform well on real data.
Several attempts directly apply unsupervised image translation models to this problem. However, unsupervised low-level vision problems place stricter demands on translation accuracy. In this work, we
propose a novel framework which is composed of two stages: 1) unsupervised
image translation between real LR images and synthetic LR images; 2) supervised
super-resolution from approximated real LR images to HR images. It takes the
synthetic LR images as a bridge and creates an indirect supervised path from
real LR images to HR images. Any existing deep-learning-based image super-resolution model can be integrated into the second stage of the proposed framework for further improvement. In addition, the framework offers great flexibility in balancing distortion against perceptual quality under the unsupervised setting.
The proposed method is evaluated on both NTIRE 2017 and 2018 challenge datasets
and achieves favorable performance against supervised methods.
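To make the pipeline concrete, here is a minimal sketch of the indirect supervised path, assuming a stage-1 translation network and a stage-2 super-resolution network; the class and attribute names below are hypothetical stand-ins, not the paper's code.

import torch
import torch.nn as nn

class IndirectSRPipeline(nn.Module):
    # Stage 1 maps a real LR image into the synthetic-LR domain; stage 2 is
    # any pretrained deep SISR model trained on synthetic LR/HR pairs.
    def __init__(self, translator: nn.Module, sr_net: nn.Module):
        super().__init__()
        self.translator = translator
        self.sr_net = sr_net

    def forward(self, real_lr: torch.Tensor) -> torch.Tensor:
        approx_synthetic_lr = self.translator(real_lr)  # bridge domain
        return self.sr_net(approx_synthetic_lr)         # supervised SR to HR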
Unsupervised Image-to-Image Translation Networks
Unsupervised image-to-image translation aims at learning a joint distribution
of images in different domains by using images from the marginal distributions
in individual domains. Since there exists an infinite set of joint
distributions that can yield the given marginal distributions, one could infer
nothing about the joint distribution from the marginal distributions without
additional assumptions. To address the problem, we make a shared-latent space
assumption and propose an unsupervised image-to-image translation framework
based on Coupled GANs. We compare the proposed framework with competing
approaches and present high quality image translation results on various
challenging unsupervised image translation tasks, including street scene image
translation, animal image translation, and face image translation. We also
apply the proposed framework to domain adaptation and achieve state-of-the-art
performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit. Comment: NIPS 2017, 11 pages, 6 figures.
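In code, the shared-latent-space assumption reduces translation to routing an encoded image through the other domain's generator. The following is an illustrative sketch only, not the official UNIT implementation; all names are assumptions.

import torch.nn as nn

class SharedLatentTranslator(nn.Module):
    def __init__(self, enc_a, enc_b, gen_a, gen_b):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b  # E1, E2 map into one shared latent space
        self.gen_a, self.gen_b = gen_a, gen_b  # G1, G2 decode from that space

    def a_to_b(self, x_a):
        z = self.enc_a(x_a)   # shared code z = E1(x_a)
        return self.gen_b(z)  # cross-domain image G2(z)

    def b_to_a(self, x_b):
        return self.gen_a(self.enc_b(x_b))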
COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder
Unsupervised image-to-image translation intends to learn a mapping of an
image in a given domain to an analogous image in a different domain, without
explicit supervision of the mapping. Few-shot unsupervised image-to-image
translation further attempts to generalize the model to an unseen domain by
leveraging example images of the unseen domain provided at inference time.
While remarkably successful, existing few-shot image-to-image translation
models find it difficult to preserve the structure of the input image while
emulating the appearance of the unseen domain, which we refer to as the content
loss problem. This is particularly severe when the poses of the objects in the
input and example images are very different. To address the issue, we propose a
new few-shot image translation model, COCO-FUNIT, which computes the style
embedding of the example images conditioned on the input image and a new module
called the constant style bias. Through extensive experimental validations with
comparison to the state-of-the-art, our model shows effectiveness in addressing
the content loss problem. For code and pretrained models, please check out
https://nvlabs.github.io/COCO-FUNIT/. Comment: The paper will be presented at the European Conference on Computer Vision (ECCV) 2020.
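A style embedding conditioned on the input image, plus a constant style bias, could look roughly like the sketch below; the real COCO-FUNIT architecture differs, and every name here is an assumption for illustration.

import torch
import torch.nn as nn

class ContentConditionedStyleEncoder(nn.Module):
    def __init__(self, style_enc: nn.Module, content_enc: nn.Module, dim: int):
        super().__init__()
        self.style_enc = style_enc      # encodes the example (style) image
        self.content_enc = content_enc  # encodes the input (content) image
        self.mix = nn.Linear(2 * dim, dim)
        # constant style bias: a learned, input-independent component
        self.style_bias = nn.Parameter(torch.zeros(dim))

    def forward(self, style_img, content_img):
        s = self.style_enc(style_img)     # (B, dim) style code
        c = self.content_enc(content_img) # (B, dim) content code
        return self.mix(torch.cat([s, c], dim=1)) + self.style_bias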
Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Unconditional image generation has recently been dominated by generative
adversarial networks (GANs). GAN methods train a generator which regresses
images from random noise vectors, as well as a discriminator that attempts to
differentiate between the generated images and a training set of real images.
GANs have shown amazing results at generating realistic-looking images. Despite
their success, GANs suffer from critical drawbacks including: unstable training
and mode-dropping. The weaknesses in GANs have motivated research into
alternatives including: variational auto-encoders (VAEs), latent embedding
learning methods (e.g. GLO) and nearest-neighbor based implicit maximum
likelihood estimation (IMLE). Unfortunately, at the moment GANs still
significantly outperform the alternative methods for image generation. In this
work, we present a novel method - Generative Latent Nearest Neighbors (GLANN) -
for training generative models without adversarial training. GLANN combines the
strengths of IMLE and GLO in a way that overcomes the main drawbacks of each
method. Consequently, GLANN generates images that are far better than those of GLO and IMLE. Our method does not suffer from the mode collapse that plagues GAN training
and is much more stable. Qualitative results show that GLANN outperforms a
baseline consisting of 800 GANs and VAEs on commonly used datasets. Our models
are also shown to be effective for training truly non-adversarial unsupervised
image translation.
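Given a GLO stage that has already learned a generator and per-sample latent codes Z, the IMLE stage can be sketched as below; this is a rough illustration under that assumption, with hypothetical names, not the paper's code.

import torch

def imle_step(mapper, Z, optimizer, batch_idx, n_noise=64):
    # Z: (N, d) latent codes from the GLO stage, held fixed here
    z = Z[batch_idx].detach()                 # (B, d) target codes
    e = torch.randn(n_noise, Z.size(1))       # noise, assumed same dim as z
    candidates = mapper(e)                    # (n_noise, d) mapped latents
    dists = torch.cdist(z, candidates)        # (B, n_noise) pairwise distances
    nearest = candidates[dists.argmin(dim=1)] # nearest candidate per target code
    loss = ((z - nearest) ** 2).mean()        # pull nearest candidates toward codes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()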
XOGAN: One-to-Many Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning the relationship
between samples from two image domains without supervised pair information. The
relationship between two domain images can be one-to-one, one-to-many or
many-to-many. In this paper, we study the one-to-many unsupervised image
translation problem in which an input sample from one domain can correspond to
multiple samples in the other domain. To learn the complex relationship between
the two domains, we introduce an additional variable to control the variations
in our one-to-many mapping. A generative model with an XO-structure, called the
XOGAN, is proposed to learn the cross-domain relationship between the two domains and the additional variables. Not only can we learn to translate between the two image domains, we can also control additional variations in the translated images. Experiments are performed on unpaired image generation tasks,
including edges-to-objects translation and facial image translation. We show
that the proposed XOGAN model can generate plausible images and control
variations, such as color and texture, of the generated images. Moreover, while
state-of-the-art unpaired image generation algorithms tend to generate images
with monotonous colors, XOGAN can generate more diverse results.
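The additional variable amounts to conditioning the generator on a sampled variation code alongside the input image. A minimal sketch, with all names assumed for illustration:

import torch
import torch.nn as nn

class OneToManyTranslator(nn.Module):
    def __init__(self, backbone: nn.Module, z_dim: int = 8):
        super().__init__()
        self.backbone = backbone  # must accept z_dim extra input channels
        self.z_dim = z_dim

    def forward(self, x, z=None):
        if z is None:  # sample a variation code; different z, different output
            z = torch.randn(x.size(0), self.z_dim, device=x.device)
        # broadcast z spatially and concatenate as extra input channels
        z_map = z.view(-1, self.z_dim, 1, 1).expand(-1, -1, x.size(2), x.size(3))
        return self.backbone(torch.cat([x, z_map], dim=1))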
Twin-GAN -- Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
We present a framework for translating unlabeled images from one domain into analogous images in another domain. We employ a progressively growing
skip-connected encoder-generator structure and train it with a GAN loss for
realistic output, a cycle consistency loss for maintaining same-domain
translation identity, and a semantic consistency loss that encourages the
network to keep the input semantic features in the output. We apply our
framework on the task of translating face images, and show that it is capable
of learning semantic mappings for face images with no supervised one-to-one
image mapping.
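The three losses combine into a single generator objective. A hedged sketch follows, with assumed loss weights, an assumed pretrained feature extractor standing in for the semantic term, and one common adversarial formulation (the paper's exact GAN loss may differ):

def twin_gan_generator_loss(d_fake, x, x_cycled, x_fake, feat, w_cyc=10.0, w_sem=1.0):
    adv = -d_fake.mean()                        # GAN loss: push outputs toward realistic
    cyc = (x - x_cycled).abs().mean()           # cycle consistency (L1 reconstruction)
    sem = (feat(x) - feat(x_fake)).abs().mean() # keep input semantic features in output
    return adv + w_cyc * cyc + w_sem * sem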
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders
Unsupervised image-to-image translation has advanced spectacularly in recent years. However, existing approaches mainly train one model per pair of domains, which imposes a heavy burden in training time and model parameters when domains must be freely translatable into one another in a general setting. To address this problem, we propose a novel and unified framework named Domain-Bank, which consists of a global shared auto-encoder and domain-specific encoders/decoders, under the assumption that images can be projected into a universal shared-latent space. This yields a large reduction in model parameters along with a huge reduction of the training-time budget. Besides the high efficiency, we show image translation results comparable to (or even better than) the state of the art on
various challenging unsupervised image translation tasks, including face image
translation, fashion-clothes translation and painting style translation. We
also apply the proposed framework to domain adaptation and achieve
state-of-the-art performance on digit benchmark datasets. Further, thanks to
the explicit representation of the domain-specific decoders as well as the
universal shared-latent space, the framework also supports incremental learning: a new domain's encoder/decoder can be added later. A linear combination of different domains' representations can also be obtained by fusing the corresponding decoders.
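Structurally, Domain-Bank amounts to a dictionary of private encoders/decoders around one shared core, which is what makes adding a domain incremental. An illustrative sketch (all names are assumptions):

import torch.nn as nn

class DomainBank(nn.Module):
    def __init__(self, shared_core: nn.Module):
        super().__init__()
        self.shared = shared_core        # global shared auto-encoder core
        self.encoders = nn.ModuleDict()  # one private encoder per domain
        self.decoders = nn.ModuleDict()  # one private decoder per domain

    def add_domain(self, name: str, encoder: nn.Module, decoder: nn.Module):
        # incremental learning: register a new domain without touching the others
        self.encoders[name] = encoder
        self.decoders[name] = decoder

    def translate(self, x, src: str, dst: str):
        z = self.shared(self.encoders[src](x))  # project into the shared latent space
        return self.decoders[dst](z)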
SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
For unsupervised image-to-image translation, we propose a discriminator
architecture which focuses on the statistical features instead of individual
patches. The network is stabilized by distribution matching of key statistical
features at multiple scales. Unlike existing methods, which impose ever more constraints on the generator, our method facilitates shape deformation and enhances fine details with a greatly simplified framework. We show that
the proposed method outperforms the existing state-of-the-art models in various
challenging applications including selfie-to-anime, male-to-female and glasses
removal. The code will be made publicly available.
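A discriminator head built on statistical features rather than per-patch scores could look like the sketch below; the specific statistics and head are assumptions for illustration, not SPatchGAN's exact design.

import torch
import torch.nn as nn

class StatHead(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(3 * channels, 1)  # one realism score per scale

    def forward(self, feat):                  # feat: (B, C, H, W) at one scale
        mean = feat.mean(dim=(2, 3))          # spatially pooled statistics
        std = feat.std(dim=(2, 3))
        amax = feat.amax(dim=(2, 3))
        stats = torch.cat([mean, std, amax], dim=1)
        return self.fc(stats)                 # score the feature distribution, not patches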
Neural Hair Rendering
In this paper, we propose a generic neural-based hair rendering pipeline that
can synthesize photo-realistic images from virtual 3D hair models. Unlike
existing supervised translation methods that require model-level similarity to
preserve consistent structure representation for both real images and fake
renderings, our method adopts an unsupervised solution to work on arbitrary
hair models. The key component of our method is a shared latent space to encode
appearance-invariant structure information of both domains, which generates
realistic renderings conditioned on extra appearance inputs. This is achieved by a domain-specific pre-disentangled structure representation, partially shared domain encoder layers, and a structure discriminator. We also propose a simple
yet effective temporal conditioning method to enforce consistency for video
sequence generation. We demonstrate the superiority of our method by testing it
on a large number of portraits and comparing it with alternative baselines and
state-of-the-art unsupervised image translation methods. Comment: ECCV 2020.
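The shared structure space plus appearance conditioning can be sketched as follows; the branch names and the concatenation-based conditioning are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class HairRenderer(nn.Module):
    def __init__(self, struct_enc_cg: nn.Module, app_enc: nn.Module, decoder: nn.Module):
        super().__init__()
        self.struct_enc_cg = struct_enc_cg  # maps a 3D-model rendering into the
                                            # appearance-invariant structure space
        self.app_enc = app_enc              # encodes the extra appearance input
        self.decoder = decoder              # decodes structure + appearance to a photo

    def render(self, cg_render, appearance_img):
        s = self.struct_enc_cg(cg_render)  # shared structure code
        a = self.app_enc(appearance_img)   # appearance condition
        # assumes both codes are feature maps of matching spatial size
        return self.decoder(torch.cat([s, a], dim=1))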
The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models
A large number of objectives have been proposed to train latent variable
generative models. We show that many of them are Lagrangian dual functions of
the same primal optimization problem. The primal problem optimizes the mutual
information between latent and visible variables, subject to the constraints of
accurately modeling the data distribution and performing correct amortized
inference. By choosing to maximize or minimize mutual information, and choosing
different Lagrange multipliers, we obtain different objectives including
InfoGAN, ALI/BiGAN, ALICE, CycleGAN, beta-VAE, adversarial autoencoders, AVB,
AS-VAE and InfoVAE. Based on this observation, we provide an exhaustive
characterization of the statistical and computational trade-offs made by all
the training objectives in this class of Lagrangian duals. Next, we propose a
dual optimization method where we optimize model parameters as well as the
Lagrange multipliers. This method achieves Pareto optimal solutions in terms of
optimizing information and satisfying the constraints.
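Schematically, the primal problem the abstract describes can be written as follows; this is a hedged paraphrase in common notation, with q the amortized inference distribution and p the model, and the exact divergences D_i depending on which objective is being recovered:

\begin{aligned}
\min_{p,\,q}\quad & \pm\, I_q(x; z) \\
\text{s.t.}\quad  & D_i\big(q(x, z)\,\|\,p(x, z)\big) \le \epsilon_i, \qquad i = 1, \dots, k,
\end{aligned}

whose Lagrangian, \pm I_q(x;z) + \sum_i \lambda_i D_i(q \,\|\, p), recovers objectives such as beta-VAE or InfoVAE for particular signs and multipliers \lambda_i; the proposed dual method then optimizes the \lambda_i jointly with the model parameters.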