Asymmetric GAN for Unpaired Image-to-image Translation
The unpaired image-to-image translation problem aims to model the mapping from one domain to another using unpaired training data. Current works such as the well-known CycleGAN provide a general solution for any two domains by modeling injective mappings with a symmetric structure. However, when the two domains are asymmetric in complexity, i.e., the amount of information they carry differs, these approaches suffer from poor generation quality, mapping ambiguity, and model sensitivity. To address these issues, we propose Asymmetric GAN (AsymGAN), which adapts to asymmetric domains by introducing an auxiliary variable (aux) that learns the extra information needed to transfer from the information-poor domain to the information-rich domain. This improves on state-of-the-art approaches in the following ways. First, aux better balances the information between the two domains, which benefits generation quality. Second, the imbalance of information commonly leads to mapping ambiguity; we can model such one-to-many mappings by tuning aux, and furthermore, our aux is controllable. Third, training CycleGAN can easily make the generator pair sensitive to small disturbances and variations, while our model decouples the ill-conditioned coupling of the generators by injecting aux during training. We verify the effectiveness of the proposed method both qualitatively and quantitatively in an asymmetric setting, the label-photo task, on the Cityscapes and Helen datasets, and show many applications of asymmetric image translation. In conclusion, our AsymGAN provides a better solution for unpaired image-to-image translation between asymmetric domains.
Comment: Accepted by IEEE Transactions on Image Processing (TIP) 201
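To make the aux mechanism concrete, here is a minimal PyTorch sketch of one way an auxiliary variable could be injected into the information-poor-to-information-rich generator; the architecture and all names are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch only: a label -> photo generator that consumes an
# auxiliary code (aux), broadcast spatially and concatenated as channels.
import torch
import torch.nn as nn

class AuxGenerator(nn.Module):
    def __init__(self, in_ch=3, aux_dim=8, out_ch=3):
        super().__init__()
        self.aux_dim = aux_dim
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + aux_dim, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, label_map, aux):
        b, _, h, w = label_map.shape
        # Broadcast the aux vector to a spatial map and concatenate it.
        aux_map = aux.view(b, self.aux_dim, 1, 1).expand(b, self.aux_dim, h, w)
        return self.net(torch.cat([label_map, aux_map], dim=1))

g = AuxGenerator()
labels = torch.randn(2, 3, 64, 64)
# Different aux samples yield different photos for the same label map:
# the controllable one-to-many mapping described in the abstract.
photo_a = g(labels, torch.randn(2, 8))
photo_b = g(labels, torch.randn(2, 8))
```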
Asymmetric Generative Adversarial Networks for Image-to-Image Translation
State-of-the-art models for unpaired image-to-image translation with
Generative Adversarial Networks (GANs) can learn the mapping from the source
domain to the target domain using a cycle-consistency loss. The intuition
behind these models is that if we translate from one domain to the other and
back again we should arrive at where we started. However, existing methods
always adopt a symmetric network architecture to learn both forward and
backward cycles. Because the task complexity and cycle inputs differ between the source and target image domains, the two directions of the forward-backward cycle are markedly unequal, and the amount of information in the two domains differs. In this paper, we analyze the limitations of existing symmetric GAN models on asymmetric translation tasks and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and different parameter-sharing strategies to adapt to the asymmetric needs of both unsupervised and supervised image-to-image translation tasks. Moreover, the training stage of existing methods commonly suffers from mode collapse, which degrades the quality of the generated images, so we explore different optimization losses for better training of AsymmetricGAN, making image-to-image translation more consistent and stable. Extensive experiments on both supervised and
unsupervised generative tasks with several publicly available datasets
demonstrate that the proposed AsymmetricGAN achieves superior model capacity
and better generation performance compared with existing GAN models. To the
best of our knowledge, we are the first to investigate the asymmetric GAN
framework on both unsupervised and supervised image-to-image translation tasks.
The source code, data and trained models are available at
https://github.com/Ha0Tang/AsymmetricGAN.
Comment: An extended version of a paper published in ACCV 2018. arXiv admin note: substantial text overlap with arXiv:1901.0460
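As a rough illustration of the unequal-size design, the following sketch builds a heavier generator for the harder translation direction and a lighter one for reconstruction; the widths and depths are arbitrary placeholders, not the configuration from the repository above.

```python
# Sketch under our own assumptions: unequal generator capacities for the
# two directions of the cycle.
import torch.nn as nn

def make_generator(width: int, depth: int) -> nn.Sequential:
    layers = [nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(depth):
        layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
    layers += [nn.Conv2d(width, 3, 3, padding=1), nn.Tanh()]
    return nn.Sequential(*layers)

# The translation generator (e.g. label -> photo) gets more capacity than
# the reconstruction generator (photo -> label), matching the asymmetry.
g_translation = make_generator(width=128, depth=9)
g_reconstruction = make_generator(width=64, depth=3)
```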
Conditional Image-to-Image Translation
Image-to-image translation tasks have been widely investigated with
Generative Adversarial Networks (GANs) and dual learning. However, existing models lack the ability to control the translated results in the target domain, and their outputs usually lack diversity in the sense that a fixed input image leads to an (almost) deterministic translation result. In this paper, we
study a new problem, conditional image-to-image translation, which is to
translate an image from the source domain to the target domain conditioned on a
given image in the target domain. It requires that the generated image should
inherit some domain-specific features of the conditional image from the target
domain. Therefore, changing the conditional image in the target domain leads to diverse translation results for a fixed input image from the source domain, and thus the conditional input image helps to control the translation results. We tackle this problem with unpaired data based on GANs and dual learning. We intertwine two conditional translation models (one from domain A to domain B, and the other from domain B to domain A) for input combination and reconstruction while preserving domain-independent features. We carry out experiments on translation between men's and women's faces and from edges to shoes and bags. The results demonstrate the effectiveness of our proposed method.
Comment: 9 pages, 9 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR
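A minimal sketch of the feature-swapping intuition, under our own assumptions about the encoder/decoder split: the generated image keeps the source image's domain-independent features and takes domain-specific features from the conditional target-domain image.

```python
# Illustrative only: swap domain-specific features between two domains.
import torch
import torch.nn as nn

class TwoStreamEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.independent = nn.Conv2d(3, 32, 3, padding=1)  # content
        self.specific = nn.Conv2d(3, 32, 3, padding=1)     # domain style

    def forward(self, x):
        return self.independent(x), self.specific(x)

enc_a, enc_b = TwoStreamEncoder(), TwoStreamEncoder()
dec_b = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

x_a = torch.randn(1, 3, 64, 64)  # input image from domain A
x_b = torch.randn(1, 3, 64, 64)  # conditional image from domain B
indep_a, _ = enc_a(x_a)
_, spec_b = enc_b(x_b)
# The output inherits content from x_a and domain-specific traits of x_b;
# changing x_b changes the result, which is the control knob above.
fake_b = dec_b(torch.cat([indep_a, spec_b], dim=1))
```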
An Asymmetric Cycle-Consistency Loss for Dealing with Many-to-One Mappings in Image Translation: A Study on Thigh MR Scans
Generative adversarial networks using a cycle-consistency loss facilitate unpaired training of image-translation models and thereby show very high potential for numerous medical applications. However, the fact that images in one domain may map to more than one image in another domain (e.g., in the case of pathological changes) poses a major challenge for training the networks. In this work, we offer a solution to improve the training process in
case of many-to-one mappings by modifying the cycle-consistency loss. We show
formally and empirically that the proposed method improves the performance
significantly without radically changing the architecture and without
increasing the overall complexity. We evaluate our method on thigh MRI scans
with the final goal of segmenting the muscle in fat-infiltrated patients' data.
Comment: Presented at IEEE ISBI'2
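One way to picture the modified loss (our reading, with an illustrative weighting rather than the paper's exact formulation): keep the full pixel-wise cycle term in the well-posed direction and down-weight it in the many-to-one direction, where exact reconstruction is impossible.

```python
# Sketch: asymmetric weighting of the two cycle-consistency terms.
import torch.nn.functional as F

def asymmetric_cycle_loss(real_a, rec_a, real_b, rec_b, w_lossy=0.1):
    # If A -> B is many-to-one (e.g. pathological variants collapse to
    # one healthy appearance), cycling A -> B -> A cannot recover A
    # exactly, so that reconstruction term is deliberately down-weighted.
    return w_lossy * F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
```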
Implicit Pairs for Boosting Unpaired Image-to-Image Translation
In image-to-image translation the goal is to learn a mapping from one image
domain to another. In the case of supervised approaches the mapping is learned
from paired samples. However, collecting large sets of image pairs is often
either prohibitively expensive or not possible. As a result, in recent years
more attention has been given to techniques that learn the mapping from
unpaired sets.
In our work, we show that injecting implicit pairs into unpaired sets strengthens the mapping between the two domains, improves the compatibility of their distributions, and boosts the performance of unsupervised techniques by over 14% across several metrics.
The effectiveness of implicit pairs is further demonstrated with pseudo-pairs, i.e., paired samples that only approximate a real pair. We demonstrate the effect of these approximated implicit samples on image-to-image translation problems where such pseudo-pairs can be synthesized in one direction, but not in the other. We further show that pseudo-pairs are significantly more effective as implicit pairs in an unpaired setting than when used directly and explicitly in a paired setting.
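A minimal sketch of how pseudo-pairs could be folded into otherwise unpaired training; the one-directional synthesizer and the loss mixing are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: add a supervised term on synthesized pseudo-pairs on top of the
# usual unpaired objective. `synthesize_a` stands for any cheap
# one-directional B -> A approximation (hypothetical placeholder).
import torch.nn.functional as F

def pseudo_pair_loss(generator_ab, real_b, synthesize_a):
    pseudo_a = synthesize_a(real_b)  # approximate partner in domain A
    return F.l1_loss(generator_ab(pseudo_a), real_b)

# total = unpaired_losses + lam * pseudo_pair_loss(G_ab, b, synthesize_a)
```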
Expression Conditional GAN for Facial Expression-to-Expression Translation
In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another based on an additional expression attribute. The proposed ECGAN is a generic framework applicable to different expression generation tasks, in which a specific facial expression can be easily controlled by the conditional attribute label. In addition, we introduce a novel face mask loss to reduce the influence of background changes. Moreover, we propose an entire
framework for facial expression generation and recognition in the wild, which
consists of two modules, i.e., generation and recognition. Finally, we evaluate
our framework on several public face datasets in which the subjects vary in race, illumination, occlusion, pose, color, content, and background conditions. Even though these datasets are very diverse, both the qualitative and quantitative results demonstrate that our approach generates facial expressions accurately and robustly.
Comment: 5 pages, 5 figures, accepted to ICIP 201
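The face mask loss can be pictured as follows; this is a hedged sketch of the idea (penalize changes outside the face region), not the paper's exact term.

```python
# Sketch: keep the background of the generated image close to the input.
import torch.nn.functional as F

def face_mask_loss(real, fake, face_mask):
    # face_mask is 1 inside the face region and 0 on the background, so
    # only background pixels contribute to this term.
    background = 1.0 - face_mask
    return F.l1_loss(fake * background, real * background)
```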
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
We propose a local adversarial disentangling network (LADN) for facial makeup
and de-makeup. Central to our method are multiple and overlapping local
adversarial discriminators in a content-style disentangling network for
achieving local detail transfer between facial images, with the use of
asymmetric loss functions for dramatic makeup styles with high-frequency
details. Existing techniques either do not demonstrate, or fail to achieve, high-frequency detail transfer in a global adversarial setting, or they train only a single local discriminator to ensure image structure consistency and thus work only for relatively simple styles. Unlike others, our proposed local adversarial
discriminators can distinguish whether the generated local image details are
consistent with the corresponding regions in the given reference image in
cross-image style transfer in an unsupervised setting. Incorporating these
technical contributions, we achieve not only state-of-the-art results on
conventional styles but also novel results involving complex and dramatic
styles with high-frequency details covering large areas across multiple facial
features. A carefully designed dataset of unpaired before-and-after makeup images is released.
Comment: Qiao and Guanzhi contributed equally. Accepted to ICCV 2019. Project website: https://georgegu1997.github.io/LADN-project-page
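To make "multiple and overlapping local adversarial discriminators" concrete, here is a rough sketch in which several small critics each judge a fixed, overlapping crop of the face; the crop positions and critic architecture are our assumptions, not the released implementation.

```python
# Illustrative sketch: overlapping local discriminators on face crops.
import torch.nn as nn

class PatchCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Top-left corners of overlapping 64x64 crops on a 128x128 face image.
regions = [(0, 0), (0, 32), (32, 0), (32, 32), (16, 16)]
critics = nn.ModuleList([PatchCritic() for _ in regions])

def local_adv_scores(img, crop=64):
    # Each critic sees only its own (overlapping) local region.
    return [d(img[:, :, y:y + crop, x:x + crop])
            for d, (y, x) in zip(critics, regions)]
```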
TraVeLGAN: Image-to-image Translation by Transformation Vector Learning
Interest in image-to-image translation has grown substantially in recent
years with the success of unsupervised models based on the cycle-consistency
assumption. The achievements of these models have been limited to a particular
subset of domains where this assumption yields good results, namely homogeneous
domains that are characterized by style or texture differences. We tackle the
challenging problem of image-to-image translation where the domains are defined
by high-level shapes and contexts, as well as including significant clutter and
heterogeneity. For this purpose, we introduce a novel GAN based on preserving
intra-domain vector transformations in a latent space learned by a siamese
network. The traditional GAN system introduced a discriminator network to guide
the generator into generating images in the target domain. To this two-network
system we add a third: a siamese network that guides the generator so that each
original image shares semantics with its generated version. With this new
three-network system, we no longer need to constrain the generators with the ubiquitous cycle-consistency constraint. As a result, the generators can learn mappings between more complex domains that differ from each other substantially, not just in style or texture.
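A hedged sketch of the transformation-vector idea: the siamese embedding must assign the same difference vector to a pair of source images as to their translated versions. The distance terms and the toy network below are illustrative, not necessarily the paper's exact choices.

```python
# Sketch: TraVeL-style loss with a toy siamese embedding network.
import torch.nn as nn
import torch.nn.functional as F

siamese = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
)

def travel_loss(x_i, x_j, g_x_i, g_x_j):
    v_src = siamese(x_i) - siamese(x_j)      # vector between originals
    v_gen = siamese(g_x_i) - siamese(g_x_j)  # vector between translations
    # Agreement in both direction and magnitude keeps per-pair semantics
    # intact without any cycle-consistency term.
    return (1 - F.cosine_similarity(v_src, v_gen)).mean() \
        + F.mse_loss(v_src, v_gen)
```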
Label-Noise Robust Multi-Domain Image-to-Image Translation
Multi-domain image-to-image translation is a problem where the goal is to
learn mappings among multiple domains. This problem is challenging in terms of scalability because it requires learning numerous mappings, the number of which grows with the number of domains. However, generative
adversarial networks (GANs) have emerged recently as a powerful framework for
this problem. In particular, label-conditional extensions (e.g., StarGAN) have
become a promising solution owing to their ability to address this problem
using only a single unified model. Nonetheless, a limitation is that they rely
on the availability of large-scale clean-labeled data, which are often
laborious or impractical to collect in a real-world scenario. To overcome this
limitation, we propose a novel model called the label-noise robust image-to-image translation model (RMIT) that can learn a clean-label conditional generator even when only noisy labeled data are available. In particular, we propose a novel loss called the virtual cycle consistency loss, which regularizes cyclic reconstruction independently of the noisy labels, and we introduce advanced techniques to boost performance in practice. Our experimental results demonstrate that RMIT is useful for obtaining label-noise robustness in various settings, including synthetic and real-world noise.
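A loose sketch of how a cycle term can avoid depending on the input's possibly noisy label: route the cycle through sampled virtual labels only. This is our reading of the "virtual" idea with illustrative details, not the paper's exact loss.

```python
# Sketch: cycle regularization that never touches the input's own label.
import torch
import torch.nn.functional as F

def virtual_cycle_loss(g, x, num_domains):
    # g is a label-conditional generator g(image, label) (hypothetical).
    b = x.size(0)
    c_virtual = torch.randint(num_domains, (b,))  # sampled clean label
    c_target = torch.randint(num_domains, (b,))
    x_virt = g(x, c_virtual)        # move x into a virtual domain
    x_trans = g(x_virt, c_target)   # forward translation
    x_back = g(x_trans, c_virtual)  # cycle back to the virtual domain
    # The reconstruction target x_virt carries a sampled, noise-free
    # label, so the term is independent of x's own (noisy) label.
    return F.l1_loss(x_back, x_virt)
```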
Mask-Guided Portrait Editing with Conditional GANs
Portrait editing is a popular subject in photo manipulation. Generative Adversarial Networks (GANs) have advanced the generation of realistic faces and enable richer face editing. In this paper, we examine three issues in existing techniques: diversity, quality, and controllability for portrait synthesis and editing. To address these issues, we propose a novel end-to-end learning
framework that leverages conditional GANs guided by provided face masks for
generating faces. The framework learns a separate feature embedding for every face component (e.g., mouth, hair, eyes), contributing to better correspondences for image translation and local face editing. With the mask, our network supports many applications, such as mask-driven face synthesis, face Swap+ (which includes hair in the swap), and local manipulation. It can also slightly boost the performance of face parsing when used as a form of data augmentation.
Comment: To appear in CVPR201
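A minimal sketch of per-component embeddings under our own assumptions: each face part is isolated by its mask and encoded separately, and the concatenated embeddings then condition the generator. Components and sizes are illustrative.

```python
# Illustrative sketch: separate embeddings per masked face component.
import torch
import torch.nn as nn

COMPONENTS = ["mouth", "hair", "eye"]
encoders = nn.ModuleDict({
    c: nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    for c in COMPONENTS
})

def embed_components(face, masks):
    # masks[c] is a binary 1-channel map selecting component c; swapping a
    # single component's embedding enables local edits such as Swap+.
    return torch.cat([encoders[c](face * masks[c]) for c in COMPONENTS],
                     dim=1)
```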