Toward Learning a Unified Many-to-Many Mapping for Diverse Image Translation
Image-to-image translation, which translates input images to a different
domain with a learned one-to-one mapping, has achieved impressive success in
recent years. The success of translation mainly relies on the network
architecture to preserve the structural information while modifying the
appearance slightly at the pixel level through adversarial training. Although
these networks are able to learn the mapping, the translated images are
predictable without exception. It is more desirable to diversify the
translations by introducing uncertainty, i.e., the generated images retain a
general similarity to the input images while allowing variations in colors and
textures, and this happens in both the target and source domains. To this end,
we propose a novel generative adversarial network (GAN) based model,
InjectionGAN, to learn a many-to-many mapping. In this model, the input image
is combined with latent variables, which comprise a domain-specific attribute
and unspecific random variations. The domain-specific
attribute indicates the target domain of the translation, while the unspecific
random variations introduce uncertainty into the model. A unified framework is
proposed to regroup these two parts and obtain diverse generations in each
domain. Extensive experiments demonstrate that the diverse generations have
high quality for the challenging image-to-image translation tasks where no
pairing information exists in the training dataset. Both quantitative and
qualitative results prove the superior performance of InjectionGAN over the
state-of-the-art approaches.
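As a rough sketch of the injection idea described above (PyTorch-style; module names and dimensions are illustrative assumptions, not the paper's implementation), the generator can receive the image concatenated with a domain-attribute code and a random noise vector broadcast to the spatial resolution:

```python
# Illustrative sketch, not the authors' InjectionGAN code.
import torch
import torch.nn as nn

class InjectionGenerator(nn.Module):
    def __init__(self, img_channels=3, attr_dim=5, noise_dim=8, base=64):
        super().__init__()
        in_ch = img_channels + attr_dim + noise_dim
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, padding=3),
            nn.InstanceNorm2d(base),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, img_channels, 7, padding=3),
            nn.Tanh(),
        )

    def forward(self, img, attr, noise):
        b, _, h, w = img.shape
        # Broadcast the per-image codes to feature maps of the input's spatial size.
        attr_map = attr.view(b, -1, 1, 1).expand(b, attr.size(1), h, w)
        noise_map = noise.view(b, -1, 1, 1).expand(b, noise.size(1), h, w)
        return self.net(torch.cat([img, attr_map, noise_map], dim=1))

# Sampling different noise vectors for a fixed input and attribute yields
# diverse translations within the chosen target domain.
g = InjectionGenerator()
x = torch.randn(2, 3, 64, 64)
attr = torch.zeros(2, 5); attr[:, 1] = 1.0   # one-hot target-domain attribute
print(g(x, attr, torch.randn(2, 8)).shape)   # torch.Size([2, 3, 64, 64])
```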
Diverse Image-to-Image Translation via Disentangled Representations
Image-to-image translation aims to learn the mapping between two visual
domains. There are two main challenges for many applications: 1) the lack of
aligned training pairs and 2) multiple possible outputs from a single input
image. In this work, we present an approach based on disentangled
representation for producing diverse outputs without paired training images. To
achieve diversity, we propose to embed images onto two spaces: a
domain-invariant content space capturing shared information across domains and
a domain-specific attribute space. Our model takes the encoded content features
extracted from a given input and the attribute vectors sampled from the
attribute space to produce diverse outputs at test time. To handle unpaired
training data, we introduce a novel cross-cycle consistency loss based on
disentangled representations. Qualitative results show that our model can
generate diverse and realistic images on a wide range of tasks without paired
training data. For quantitative comparisons, we measure realism with a user study
and diversity with a perceptual distance metric. We apply the proposed model to
domain adaptation and show competitive performance when compared to the
state-of-the-art on the MNIST-M and the LineMod datasets.
Comment: ECCV 2018 (Oral). Project page: http://vllab.ucmerced.edu/hylee/DRIT/
Code: https://github.com/HsinYingLee/DRIT
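A minimal sketch of the cross-cycle consistency idea under simplifying assumptions: shared content and attribute encoders and L1 reconstruction only (the released DRIT code uses domain-specific encoders and further losses); all function names here are placeholders:

```python
# Simplified sketch of cross-cycle consistency, not the released DRIT code.
import torch.nn.functional as F

def cross_cycle_loss(x_a, x_b, enc_c, enc_a, gen_a, gen_b):
    # Disentangle both inputs into content (shared) and attribute (domain-specific).
    c_a, c_b = enc_c(x_a), enc_c(x_b)
    a_a, a_b = enc_a(x_a), enc_a(x_b)
    # First translation: swap attribute codes across domains.
    u = gen_b(c_a, a_b)   # x_a's content rendered in domain B
    v = gen_a(c_b, a_a)   # x_b's content rendered in domain A
    # Second translation: swap again, which should recover the originals.
    x_a_rec = gen_a(enc_c(u), enc_a(v))
    x_b_rec = gen_b(enc_c(v), enc_a(u))
    return F.l1_loss(x_a_rec, x_a) + F.l1_loss(x_b_rec, x_b)
```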
Network-to-Network Translation with Conditional Invertible Neural Networks
Given the ever-increasing computational costs of modern machine learning
models, we need to find new ways to reuse such expert models and thus tap into
the resources that have been invested in their creation. Recent work suggests
that the power of these massive models is captured by the representations they
learn. Therefore, we seek a model that can relate between different existing
representations and propose to solve this task with a conditionally invertible
network. This network demonstrates its capability by (i) providing generic
transfer between diverse domains, (ii) enabling controlled content synthesis by
allowing modification in other domains, and (iii) facilitating diagnosis of
existing representations by translating them into interpretable domains such as
images. Our domain transfer network can translate between fixed representations
without having to learn or finetune them. This allows users to utilize various
existing domain-specific expert models from the literature that had been
trained with extensive computational resources. Experiments on diverse
conditional image synthesis tasks, competitive image modification results and
experiments on image-to-image and text-to-image generation demonstrate the
generic applicability of our approach. For example, we translate between BERT
and BigGAN, state-of-the-art text and image models, to provide text-to-image
generation, which neither expert can perform on its own.
Comment: NeurIPS 2020 (oral). Code at https://github.com/CompVis/net2ne
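One common way to build a conditionally invertible network is from conditional affine coupling blocks. The sketch below is an assumed, simplified form (not the CompVis implementation): a conditioning code, e.g. from a frozen text model, drives the scale and shift applied to half of the representation, so the mapping stays exactly invertible:

```python
# Assumed structure of a conditional affine coupling block; placeholder sizes.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, cond):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # bounded log-scales for stability
        return torch.cat([z1, z2 * torch.exp(s) + t], dim=1)

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

# Usage with placeholder dimensions: transform a latent vector conditioned on a
# fixed embedding and recover it exactly via the inverse.
block = ConditionalCoupling(dim=128, cond_dim=768)
z, cond = torch.randn(4, 128), torch.randn(4, 768)
y = block(z, cond)
print(torch.allclose(z, block.inverse(y, cond), atol=1e-5))  # True
```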
A Novel BiLevel Paradigm for Image-to-Image Translation
Image-to-image (I2I) translation is a pixel-level mapping that requires a
large amount of paired training data and often suffers from the problems of
high diversity and strong category bias in image scenes. In order to tackle
these problems, we propose a novel BiLevel (BiL) learning paradigm that
alternates the learning of two models, respectively at an instance-specific
(IS) and a general-purpose (GP) level. In each scene, the IS model learns to
maintain the specific scene attributes. It is initialized by the GP model that
learns from all the scenes to obtain the generalizable translation knowledge.
This GP initialization gives the IS model an efficient starting point, thus
enabling its fast adaptation to the new scene with scarce training data. We
conduct extensive I2I translation experiments on human face and street view
datasets. Quantitative results validate that our approach can significantly
boost the performance of classical I2I translation models, such as PG2 and
Pix2Pix. Our visualization results show both higher image quality and more
appropriate instance-specific details, e.g., the translated image of a person
looks more like that person in terms of identity.
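A minimal sketch of the alternation described above, with all training details assumed rather than taken from the paper: the general-purpose (GP) model supplies the initialization, and the instance-specific (IS) model adapts from that starting point on a single scene's scarce data:

```python
# Illustrative sketch of GP-to-IS adaptation; hyperparameters are assumptions.
import copy
import torch

def adapt_to_scene(gp_model, scene_batches, loss_fn, steps=100, lr=1e-4):
    # Initialize the IS model from the GP model, then fine-tune on one scene.
    is_model = copy.deepcopy(gp_model)
    opt = torch.optim.Adam(is_model.parameters(), lr=lr)
    for step in range(steps):
        src, tgt = scene_batches[step % len(scene_batches)]
        opt.zero_grad()
        loss = loss_fn(is_model(src), tgt)
        loss.backward()
        opt.step()
    return is_model
```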
Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning
Unpaired Image-to-Image Translation (UIT) focuses on translating images among
different domains by using unpaired data, which has received increasing
research focus due to its practical usage. However, existing UIT schemes are
limited by their need for supervised training and their lack of encoded domain
information. In this paper, we propose an Attribute Guided UIT model termed
AGUIT to tackle these two challenges. AGUIT considers the multi-modal and
multi-domain tasks of UIT jointly under a novel semi-supervised setting, which
also benefits representation disentanglement and fine control of outputs.
Specifically, AGUIT benefits in two ways: (1) It adopts a novel semi-supervised
learning process by translating attributes of labeled data to unlabeled data,
and then reconstructing the unlabeled data by a cycle consistency operation.
(2) It decomposes image representation into domain-invariant content code and
domain-specific style code. The redesigned style code embeds image style into
two variables drawn from a standard Gaussian distribution and the distribution
of the domain label, which facilitates fine control of translation due to the
continuity of both variables. Finally, we introduce a new challenge, i.e.,
disentangled transfer, for UIT models, which adopts the disentangled
representation to translate data less related to the training set. Extensive
experiments demonstrate the superiority of AGUIT over existing state-of-the-art
models.
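A small sketch of how the redesigned style code could be assembled (shapes, noise level, and the label relaxation are assumptions, not the paper's exact definition): a continuous random style variable drawn from a standard Gaussian is concatenated with a continuous domain-label variable:

```python
# Assumed construction of the two-part style code; not the AGUIT code.
import torch

def build_style_code(batch, style_dim, domain_label, label_noise=0.1):
    z_style = torch.randn(batch, style_dim)            # standard-Gaussian style part
    # Perturbed domain label: keeping it continuous allows smooth interpolation
    # between domains at test time.
    z_domain = domain_label + label_noise * torch.randn_like(domain_label)
    return torch.cat([z_style, z_domain], dim=1)

labels = torch.zeros(4, 3); labels[:, 2] = 1.0          # three hypothetical domains
print(build_style_code(4, style_dim=8, domain_label=labels).shape)  # (4, 11)
```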
One-Shot Unsupervised Cross Domain Translation
Given a single image x from domain A and a set of images from domain B, our
task is to generate the analog of x in B. We argue that this task could be a
key AI capability that underlies the ability of cognitive agents to act in the
world and present empirical evidence that the existing unsupervised domain
translation methods fail on this task. Our method follows a two step process.
First, a variational autoencoder for domain B is trained. Then, given the new
sample x, we create a variational autoencoder for domain A by adapting the
layers that are close to the image in order to directly fit x, and only
indirectly adapt the other layers. Our experiments indicate that the new method
does as well, when trained on one sample x, as the existing domain transfer
methods, when these enjoy a multitude of training samples from domain A. Our
code is made publicly available at
https://github.com/sagiebenaim/OneShotTranslation
Comment: Published at NIPS 201
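A minimal sketch of the one-shot adaptation step (the choice of which layers count as "close to the image", and their names, are placeholders): clone the VAE trained on domain B, freeze the deeper layers, and fit only the image-adjacent layers to the single sample x:

```python
# Assumed adaptation loop; layer names and hyperparameters are placeholders.
import copy
import torch

def one_shot_adapt(vae_b, x, recon_loss, shallow_prefixes=("enc_in", "dec_out"),
                   steps=200, lr=1e-4):
    vae_a = copy.deepcopy(vae_b)
    for name, p in vae_a.named_parameters():
        # Only parameters in the image-adjacent (shallow) blocks stay trainable.
        p.requires_grad = name.startswith(shallow_prefixes)
    trainable = [p for p in vae_a.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = recon_loss(vae_a(x), x)
        loss.backward()
        opt.step()
    return vae_a
```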
Multi-mapping Image-to-Image Translation via Learning Disentanglement
Recent advances in image-to-image translation focus on learning the
one-to-many mapping from two aspects: multi-modal translation and multi-domain
translation. However, the existing methods only consider one of the two
perspectives, which leaves each unable to solve the other's problem. To address
this issue, we propose a novel unified model, which bridges these two
objectives. First, we disentangle the input images into the latent
representations by an encoder-decoder architecture with a conditional
adversarial training in the feature space. Then, we encourage the generator to
learn multi-mappings by a random cross-domain translation. As a result, we can
manipulate different parts of the latent representations to perform multi-modal
and multi-domain translations simultaneously. Experiments demonstrate that our
method outperforms state-of-the-art methods.
Comment: Accepted by NeurIPS 2019. Code will be available at
https://github.com/Xiaoming-Yu/DMI
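A rough sketch of test-time manipulation of the disentangled latent parts (function signatures are assumptions, not the released code): the content code is kept, the style part is resampled for multi-modal diversity, and the domain code selects the target domain:

```python
# Illustrative latent manipulation; not the authors' implementation.
import torch

def diverse_translate(content_enc, generator, x, target_domain, style_dim, n=3):
    c = content_enc(x)                                   # shared content code
    outputs = []
    for _ in range(n):
        s = torch.randn(x.size(0), style_dim)            # multi-modal: new style
        outputs.append(generator(c, s, target_domain))   # multi-domain: chosen domain
    return outputs
```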
Mix and match networks: cross-modal alignment for zero-pair image-to-image translation
This paper addresses the problem of inferring unseen cross-modal
image-to-image translations between multiple modalities. We assume that only
some of the pairwise translations have been seen (i.e. trained) and infer the
remaining unseen translations (where training pairs are not available). We
propose mix and match networks, an approach where multiple encoders and
decoders are aligned in such a way that the desired translation can be obtained
by simply cascading the source encoder and the target decoder, even when they
have not interacted during the training stage (i.e. unseen). The main challenge
lies in the alignment of the latent representations at the bottlenecks of
encoder-decoder pairs. We propose an architecture with several tools to
encourage alignment, including autoencoders, robust side information, and
latent consistency losses. We show the benefits of our approach in terms of
effectiveness and scalability compared with other pairwise image-to-image
translation approaches. We also propose zero-pair cross-modal image
translation, a challenging setting where the objective is inferring semantic
segmentation from depth (and vice-versa) without explicit segmentation-depth
pairs, and only from two (disjoint) segmentation-RGB and depth-RGB training
sets. We observe that a certain part of the shared information between unseen
modalities might not be reachable, so we further propose a variant that
leverages pseudo-pairs which allows us to exploit this shared information
between the unseen modalities.
Comment: Accepted by IJC
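The zero-pair inference step reduces to cascading modules through the aligned latent space; a minimal sketch, with dictionary keys and module behavior assumed:

```python
# Assumed inference helper; encoders/decoders are hypothetical trained modules.
def zero_pair_translate(encoders, decoders, x, src, tgt):
    # encoders/decoders are dicts keyed by modality, e.g. "rgb", "depth", "seg".
    latent = encoders[src](x)
    return decoders[tgt](latent)

# For example, depth-to-segmentation without any depth-segmentation pairs:
#   y = zero_pair_translate(encoders, decoders, depth_image, "depth", "seg")
```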
PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain
We propose a universal image reconstruction method to represent detailed
images purely from a binary sparse edge map and a flat color domain. Inspired
by the procedures of painting, our framework, based on a generative adversarial
network, consists of three phases: the Imitation Phase initializes the
networks, the Generating Phase reconstructs preliminary images, and the
Refinement Phase fine-tunes the preliminary images into detailed final outputs.
This framework allows our model to generate abundant high-frequency details
from sparse input information. We also explore the defects of implicitly
disentangling a style latent space from images, and demonstrate that the
explicit color domain in our model performs better in controllability and
interpretability. In our experiments, we achieve outstanding results on
reconstructing realistic images and translating hand drawn drafts into
satisfactory paintings. Moreover, within the domain of edge-to-image
translation, our model PI-REC outperforms existing state-of-the-art methods on
evaluations of realism and accuracy, both quantitatively and qualitatively.
Comment: 15 pages, 13 figures
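A small sketch of how the sparse conditions could be assembled as network inputs (the channel layout is an assumption, not the released PI-REC code): the binary edge map and the flat color domain are stacked for the early phases, and the preliminary output is appended for the Refinement Phase:

```python
# Assumed input construction for the three phases; not the released PI-REC code.
import torch

def generation_input(edge, color):
    # edge: (B, 1, H, W) binary sparse edges; color: (B, 3, H, W) flat color blocks
    return torch.cat([edge, color], dim=1)                   # (B, 4, H, W)

def refinement_input(preliminary, edge, color):
    # preliminary: (B, 3, H, W) output of the Generating Phase
    return torch.cat([preliminary, edge, color], dim=1)      # (B, 7, H, W)
```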
Multi-Mapping Image-to-Image Translation with Central Biasing Normalization
Recent advances in image-to-image translation have seen a rise in approaches
generating diverse images through a single network. To indicate the target
domain for a one-to-many mapping, the latent code is injected into the
generator network. However, we found that the injection method leads to mode
collapse because of normalization strategies. Existing normalization strategies
might either cause the inconsistency of feature distribution or eliminate the
effect of the latent code. To solve these problems, we propose the consistency
within diversity criteria for designing the multi-mapping model. Based on the
criteria, we propose central biasing normalization to inject the latent code
information. Experiments show that our method can improve the quality and
diversity of existing image-to-image translation models, such as StarGAN,
BicycleGAN, and pix2pix.
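A sketch of one plausible form of central biasing normalization (the exact formulation in the paper may differ): features are normalized without a learned affine transform, and a bounded per-channel bias computed from the latent code is added, so the code steers the translation without disturbing the feature statistics:

```python
# Assumed form of central biasing normalization; not the paper's exact definition.
import torch
import torch.nn as nn

class CentralBiasingNorm2d(nn.Module):
    def __init__(self, num_features, latent_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.bias = nn.Linear(latent_dim, num_features)

    def forward(self, feat, latent):
        b = torch.tanh(self.bias(latent))                  # bounded per-channel bias
        return self.norm(feat) + b.view(feat.size(0), -1, 1, 1)

layer = CentralBiasingNorm2d(num_features=64, latent_dim=8)
print(layer(torch.randn(2, 64, 32, 32), torch.randn(2, 8)).shape)
# torch.Size([2, 64, 32, 32])
```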