Asymmetric Generative Adversarial Networks for Image-to-Image Translation
State-of-the-art models for unpaired image-to-image translation with
Generative Adversarial Networks (GANs) learn the mapping from the source
domain to the target domain using a cycle-consistency loss. The intuition
behind these models is that if we translate from one domain to the other and
back again, we should arrive back where we started. However, existing methods
always adopt a symmetric network architecture to learn both the forward and
backward cycles. Because the source and target image domains differ in task
complexity and cycle inputs, the two directions of the translation cycle are
significantly unequal, and the two domains carry different amounts of
information. In this paper, we analyze the limitations of existing symmetric
GAN models on asymmetric translation tasks and propose AsymmetricGAN, a model
whose translation and reconstruction generators have unequal capacities and
distinct parameter-sharing strategies, adapting to the asymmetric needs of
both unsupervised and supervised image-to-image translation. Moreover, the
training stage of existing methods commonly suffers from mode collapse, which
degrades the quality of the generated images; we therefore explore different
optimization losses for better training of AsymmetricGAN, yielding
image-to-image translation with higher consistency and better stability.
Extensive experiments on both supervised and unsupervised generative tasks
with several publicly available datasets demonstrate that the proposed
AsymmetricGAN achieves superior model capacity and better generation
performance than existing GAN models. To the best of our knowledge, we are the
first to investigate the asymmetric GAN framework on both unsupervised and
supervised image-to-image translation tasks. The source code, data and trained
models are available at https://github.com/Ha0Tang/AsymmetricGAN.
Comment: An extended version of a paper published in ACCV 2018. arXiv admin note: substantial text overlap with arXiv:1901.0460
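To make the asymmetry concrete, here is a minimal PyTorch sketch of a translation cycle whose two generators have unequal capacity. The `make_generator` helper, the layer choices, and the widths are illustrative assumptions on our part, not the paper's architecture.

```python
import torch
import torch.nn as nn

def make_generator(width: int) -> nn.Module:
    # Toy image-to-image generator; capacity is controlled by `width`.
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, 3, 3, padding=1), nn.Tanh(),
    )

G = make_generator(64)   # translation generator: the harder direction gets more capacity
F = make_generator(16)   # reconstruction generator: fewer parameters suffice

x = torch.randn(4, 3, 64, 64)                    # a batch of source-domain images
x_cycled = F(G(x))                               # source -> target -> source
cycle_loss = nn.functional.l1_loss(x_cycled, x)  # ||F(G(x)) - x||_1
```

The point of the unequal widths is that the backward (reconstruction) mapping can be much cheaper than the forward translation when the two directions differ in difficulty.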
Label-Noise Robust Multi-Domain Image-to-Image Translation
Multi-domain image-to-image translation is the problem of learning mappings
among multiple domains. The problem is challenging in terms of scalability
because it requires learning numerous mappings, the number of which increases
in proportion to the number of domains. Generative adversarial networks (GANs)
have recently emerged as a powerful framework for this problem. In particular,
label-conditional extensions (e.g., StarGAN) have become a promising solution
owing to their ability to address it with only a single unified model.
Nonetheless, a limitation is that they rely on the availability of
large-scale, cleanly labeled data, which are often laborious or impractical to
collect in a real-world scenario. To overcome this limitation, we propose a
novel model called the label-noise robust image-to-image translation model
(RMIT), which can learn a clean-label conditional generator even when only
noisily labeled data are available. In particular, we propose a novel loss
called the virtual cycle consistency loss, which regularizes cyclic
reconstruction independently of noisy labels, and we introduce advanced
techniques to boost performance in practice. Our experimental results
demonstrate that RMIT is useful for obtaining label-noise robustness in
various settings, including synthetic and real-world noise.
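For context, the sketch below shows the standard StarGAN-style label-conditional cycle whose reconstruction depends on the annotated label; that dependence is exactly what makes it fragile under label noise and what the paper's virtual cycle consistency loss is designed to avoid. The code assumes PyTorch and a conditional generator `G(image, label)`; it is background, not the paper's implementation.

```python
import torch.nn.functional as F

def stargan_cycle_loss(G, x, c_orig, c_target):
    """Reconstruct x through the target domain and back.

    c_orig comes from the annotation, so a wrong (noisy) label corrupts
    the cycle -- the weakness that RMIT's virtual cycle consistency loss
    avoids by regularizing reconstruction independently of noisy labels.
    """
    x_fake = G(x, c_target)    # translate into the target domain
    x_rec = G(x_fake, c_orig)  # translate back using the annotated label
    return F.l1_loss(x_rec, x)
```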
Cross-Entropy Adversarial View Adaptation for Person Re-identification
Person re-identification (re-ID) is the task of matching pedestrians across
disjoint camera views. To recognise paired snapshots, it has to cope with
large cross-view variations caused by the camera view shift. Supervised deep
neural networks are effective in producing a set of non-linear projections
that can transform cross-view images into a common feature space. However,
they typically impose a symmetric architecture, leaving the network
ill-conditioned in its optimisation. In this paper, we learn a view-invariant
subspace for person re-ID, and its corresponding similarity metric, using an
adversarial view adaptation approach. The main contribution is to learn
coupled asymmetric mappings over view characteristics, adversarially trained
to address the view discrepancy by optimising a cross-entropy view confusion
objective. To determine the similarity value, the network is equipped with a
similarity discriminator that promotes features which are highly discriminant
in distinguishing positive and negative pairs. A further contribution is an
adaptive weighting of the most difficult samples, addressing the imbalance
between within-identity and between-identity pairs. Our approach achieves
notably improved performance compared with the state of the art on benchmark
datasets.
Comment: Appearing in IEEE Transactions on Circuits and Systems for Video Technology
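One common way to realize a cross-entropy view confusion objective is to train the feature extractor so that a view classifier's posterior approaches the uniform distribution over camera views. The PyTorch sketch below follows that generic recipe under our own assumptions; it is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def view_confusion_loss(view_logits: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against the uniform target over K views:
    # CE(u, p) = -(1/K) * sum_k log p_k, averaged over the batch.
    # Minimized by the feature extractor, while the view classifier
    # itself is trained with ordinary cross-entropy on view labels.
    log_probs = F.log_softmax(view_logits, dim=1)
    return -log_probs.mean(dim=1).mean()
```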
Label-Noise Robust Generative Adversarial Networks
Generative adversarial networks (GANs) are a framework that learns a
generative distribution through adversarial training. Recently, their
class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary
classifier GAN (AC-GAN)) have attracted much attention owing to their ability
to learn disentangled representations and to improve training stability.
However, their training requires large-scale, accurately class-labeled data,
which are often laborious or impractical to collect in a real-world scenario.
To remedy this, we propose a novel family of GANs called label-noise robust
GANs (rGANs) which, by incorporating a noise transition model, can learn a
clean-label conditional generative distribution even when training labels are
noisy. In particular, we propose two variants: rAC-GAN, a bridging model
between AC-GAN and the label-noise robust classification model, and rcGAN, an
extension of cGAN that solves the problem without relying on any classifier.
In addition to providing the theoretical background, we demonstrate the
effectiveness of our models through extensive experiments using diverse GAN
configurations, various noise settings, and multiple evaluation metrics (402
tested conditions in total). Our code is available at
https://github.com/takuhirok/rGAN/.
Comment: Accepted to CVPR 2019 (Oral). Project page: https://takuhirok.github.io/rGAN
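The noise transition idea admits a compact sketch: the classifier keeps modelling clean labels, and its posterior is pushed through a transition matrix before being matched against the observed noisy labels. The PyTorch code below is our illustration of that mechanism; the variable names and the smoothing constant are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def noisy_posterior(clean_logits: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
    """T[i, j] = p(noisy label j | clean label i), shape (K, K).

    Mapping the clean-label posterior through T lets the loss be taken
    against noisy labels while the classifier still models clean labels.
    """
    clean_probs = torch.softmax(clean_logits, dim=1)  # p(y | x), shape (N, K)
    return clean_probs @ T                            # p(noisy y | x)

# usage sketch:
# loss = F.nll_loss(torch.log(noisy_posterior(logits, T) + 1e-8), noisy_targets)
```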
Conditional Image-to-Image Translation
Image-to-image translation tasks have been widely investigated with
Generative Adversarial Networks (GANs) and dual learning. However, existing
models cannot control the translated results in the target domain, and their
results usually lack diversity, in the sense that a fixed input image leads to
an (almost) deterministic translation result. In this paper, we study a new
problem, conditional image-to-image translation: translating an image from the
source domain to the target domain conditioned on a given image in the target
domain. The generated image is required to inherit some domain-specific
features of the conditional image from the target domain. Therefore, changing
the conditional image in the target domain yields diverse translation results
for a fixed input image from the source domain, so the conditional input image
helps control the translation result. We tackle this problem with unpaired
data based on GANs and dual learning. We twist together two conditional
translation models (one from domain A to domain B, and the other from B to A)
for input combination and reconstruction while preserving domain-independent
features. We carry out experiments on translation between men's and women's
faces and from edges to shoes and bags. The results demonstrate the
effectiveness of our proposed method.
Comment: 9 pages, 9 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)
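The "twisting" can be pictured as recombining two feature streams: the input's domain-independent content and the conditional image's domain-specific style. A minimal sketch, assuming placeholder encoder and decoder callables of our own naming (not the paper's architecture):

```python
def conditional_translate(enc_A, enc_B, dec_B, x_A, x_B):
    content_A, _ = enc_A(x_A)  # domain-independent features of the source input
    _, style_B = enc_B(x_B)    # domain-specific features of the conditional image
    # The output inherits x_B's domain-specific traits, so varying x_B
    # yields diverse translations of the same x_A.
    return dec_B(content_A, style_B)
```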
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
We propose a local adversarial disentangling network (LADN) for facial makeup
and de-makeup. Central to our method are multiple overlapping local
adversarial discriminators in a content-style disentangling network, which
achieve local detail transfer between facial images, with asymmetric loss
functions for dramatic makeup styles with high-frequency details. Existing
techniques either do not demonstrate, or fail to achieve, high-frequency
detail transfer in a global adversarial setting, or train only a single local
discriminator to ensure image structure consistency, and thus work only for
relatively simple styles. Unlike others, our local adversarial discriminators
can distinguish whether the generated local image details are consistent with
the corresponding regions of the given reference image in cross-image style
transfer, in an unsupervised setting. With these technical contributions, we
achieve state-of-the-art results not only on conventional styles but also on
complex and dramatic styles with high-frequency details covering large areas
across multiple facial features. A carefully designed dataset of unpaired
before- and after-makeup images is released.
Comment: Qiao and Guanzhi contributed equally. Accepted to ICCV 2019. Project website: https://georgegu1997.github.io/LADN-project-page
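A minimal sketch of the overlapping local discriminator idea, under our own assumptions (PyTorch, fixed crop boxes, fake and reference crops stacked along the channel axis); the actual region placement and discriminator design are the paper's:

```python
import torch

def local_adversarial_scores(discriminators, boxes, fake, reference):
    """boxes: list of (top, left, height, width) overlapping face regions,
    one local discriminator per region. Each discriminator judges whether
    the generated local details match the reference region's style."""
    scores = []
    for D, (t, l, h, w) in zip(discriminators, boxes):
        fake_crop = fake[:, :, t:t + h, l:l + w]
        ref_crop = reference[:, :, t:t + h, l:l + w]
        scores.append(D(torch.cat([fake_crop, ref_crop], dim=1)))
    return scores
```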
Compatibility Family Learning for Item Recommendation and Generation
Compatibility between items, such as clothes and shoes, is a major factor in
customers' purchasing decisions. However, learning "compatibility" is
challenging because (1) compatibility is a broader notion than similarity, (2)
compatibility is asymmetric in nature, and (3) only a small set of compatible
and incompatible items is observed. We propose an end-to-end trainable system
that embeds each item into a latent vector and projects a query item onto K
compatible prototypes in the same space. These prototypes reflect the broad
notions of compatibility. We refer to the embedding and prototypes together as
a "Compatibility Family". In the learned space, we introduce a novel Projected
Compatibility Distance (PCD) function which is differentiable and ensures
diversity by requiring at least one prototype to be close to a compatible
item, while no prototype is close to an incompatible item. We evaluate our
system on a toy dataset, two Amazon product datasets, and the Polyvore outfit
dataset. Our method consistently achieves state-of-the-art performance.
Finally, we show that the candidate compatible prototypes can be visualized
using a Metric-regularized Conditional Generative Adversarial Network
(MrCGAN), whose input is a projected prototype and whose output is a generated
image of a compatible item. We asked human evaluators to judge the relative
compatibility between our generated images and images generated by CGANs
conditioned directly on query items. Our generated images are significantly
preferred, receiving roughly twice as many votes as the alternatives.
Comment: 9 pages, accepted to AAAI 2018
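A plausible formalization of PCD (our reading, not necessarily the paper's exact definition) measures distance through the closest of the K prototypes projected from the query, which is why only one prototype needs to approach each compatible item and the rest remain free to stay diverse:

```python
import torch

def pcd(prototypes: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    """prototypes: (N, K, D) compatible prototypes projected from each query;
    item_emb: (N, D) embedding of the candidate item."""
    sq_dists = ((prototypes - item_emb.unsqueeze(1)) ** 2).sum(dim=2)  # (N, K)
    return sq_dists.min(dim=1).values  # distance via the closest prototype
```

Pushing this distance down for compatible pairs pulls one prototype toward the item, while pushing it up for incompatible pairs keeps every prototype away, matching the diversity property the abstract describes.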
Mask-Guided Portrait Editing with Conditional GANs
Portrait editing is a popular subject in photo manipulation. Generative
adversarial networks (GANs) have advanced the generation of realistic faces
and enable richer face editing. In this paper, we address three issues in
existing techniques: diversity, quality, and controllability of portrait
synthesis and editing. To address these issues, we propose a novel end-to-end
learning framework that leverages conditional GANs, guided by provided face
masks, for generating faces. The framework learns a separate feature embedding
for every face component (e.g., mouth, hair, eyes), contributing to better
correspondences for image translation and local face editing. With the mask,
our network supports many applications, such as mask-driven face synthesis,
face swap+ (which includes hair in the swap), and local manipulation. It can
also modestly boost face parsing performance when used as a form of data
augmentation.
Comment: To appear in CVPR 2019
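The per-component embedding can be sketched as routing each masked region through its own encoder; the dict-of-encoders structure and names below are our illustrative assumptions, not the paper's implementation:

```python
def component_embeddings(encoders, image, masks):
    """encoders, masks: dicts keyed by component name ('mouth', 'hair', ...).
    Each component is masked out of the image and encoded independently,
    giving a separate, locally editable embedding per face component."""
    return {name: enc(image * masks[name]) for name, enc in encoders.items()}
```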
Similarity-preserving Image-image Domain Adaptation for Person Re-identification
This article studies the domain adaptation problem in person re-identification
(re-ID) under a "learning via translation" framework consisting of two
components: 1) translating the labeled images from the source to the target
domain in an unsupervised manner, and 2) learning a re-ID model using the
translated images. The objective is to preserve the underlying human identity
information after image translation, so that the translated images with labels
are effective for feature learning on the target domain. To this end, we
propose a similarity-preserving generative adversarial network (SPGAN) and its
end-to-end trainable version, eSPGAN. Both aim at similarity preservation:
SPGAN enforces it through heuristic constraints, while eSPGAN does so by
optimally facilitating re-ID model learning. More specifically, SPGAN
undertakes the two components of the "learning via translation" framework
separately. It first preserves two types of unsupervised similarity, namely,
the self-similarity of an image before and after translation, and the domain
dissimilarity of a translated source image and a target image. It then learns
a re-ID model using existing networks. In comparison, eSPGAN seamlessly
integrates image translation and re-ID model learning. During the end-to-end
training of eSPGAN, re-ID learning guides image translation to preserve the
underlying identity information of an image. Meanwhile, image translation
improves re-ID learning by providing identity-preserving training samples in
the target-domain style. Experimentally, we show that the identities of the
fake images generated by SPGAN and eSPGAN are well preserved. Based on this,
we report new state-of-the-art domain adaptation results on two large-scale
person re-ID datasets.
Comment: 14 pages, 7 tables, 14 figures; this version is not fully edited and will be updated soon. arXiv admin note: text overlap with arXiv:1711.0702
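The two unsupervised similarity constraints fit naturally into a contrastive-style loss, sketched below in PyTorch under our own assumptions (the feature network `phi`, the margin value, and the squared penalties are illustrative, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def similarity_losses(phi, x_src, x_trans, x_tgt, margin=2.0):
    f_src, f_trans, f_tgt = phi(x_src), phi(x_trans), phi(x_tgt)
    # self-similarity: an image and its translation should stay close
    self_sim = F.mse_loss(f_trans, f_src)
    # domain dissimilarity: a translated source image should stay at least
    # `margin` away from an arbitrary target-domain image
    gap = margin - torch.norm(f_trans - f_tgt, p=2, dim=1)
    domain_dissim = torch.clamp(gap, min=0).pow(2).mean()
    return self_sim + domain_dissim
```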
Expression Conditional GAN for Facial Expression-to-Expression Translation
In this paper, we focus on the facial expression translation task and propose
a novel Expression Conditional GAN (ECGAN) which can learn the mapping from
one image domain to another based on an additional expression attribute. The
proposed ECGAN is a generic framework applicable to different expression
generation tasks, in which a specific facial expression can be easily
controlled by the conditional attribute label. In addition, we introduce a
novel face mask loss to reduce the influence of background changes. Moreover,
we propose an entire framework for facial expression generation and
recognition in the wild, consisting of two modules: generation and
recognition. Finally, we evaluate our framework on several public face
datasets in which the subjects differ in race, illumination, occlusion, pose,
color, content and background conditions. Even though these datasets are very
diverse, both the qualitative and quantitative results demonstrate that our
approach is able to generate facial expressions accurately and robustly.
Comment: 5 pages, 5 figures, accepted to ICIP 2019
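One natural reading of a face mask loss is a penalty on any change the generator makes outside the face region. The PyTorch sketch below follows that reading; the exact weighting is our assumption, not the paper's definition:

```python
import torch

def face_mask_loss(generated: torch.Tensor, source: torch.Tensor,
                   face_mask: torch.Tensor) -> torch.Tensor:
    """face_mask: 1 inside the face region, 0 on the background."""
    background = 1.0 - face_mask
    # penalize changes outside the face, suppressing background drift
    return (background * (generated - source).abs()).mean()
```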