Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound
Unsupervised image-to-image translation is a class of computer vision
problems which aims at modeling the conditional distribution of images in the
target domain, given a set of unpaired images in the source and target domains.
An image in the source domain might have multiple representations in the target
domain. Therefore, ambiguity arises in modeling the conditional distribution,
especially when the images in the source and target domains come from
different modalities. Current approaches mostly rely on simplifying assumptions
to map both domains into a shared-latent space. Consequently, they are only
able to model the domain-invariant information between the two modalities.
These approaches usually fail to model domain-specific information which has no
representation in the target domain. In this work, we propose an unsupervised
image-to-image translation framework which maximizes a domain-specific
variational information bound and learns the target domain-invariant
representation of the two domains. The proposed framework makes it possible to
map a single source image into multiple images in the target domain, utilizing
several target domain-specific codes sampled randomly from the prior
distribution, or extracted from reference images. Comment: NIPS 201
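A minimal sketch of that sampling procedure, assuming hypothetical content_encoder/decoder interfaces rather than the paper's actual code:

```python
import torch

# Hypothetical interfaces: content_encoder maps an image to a domain-invariant
# code; decoder renders that code under a target domain-specific style code.
def translate_diverse(content_encoder, decoder, x_src, n_samples=5, style_dim=8):
    c = content_encoder(x_src)                     # encode the source image once
    outputs = []
    for _ in range(n_samples):
        z = torch.randn(x_src.size(0), style_dim)  # style code from the prior
        outputs.append(decoder(c, z))              # one plausible translation
    return outputs
```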
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders
Unsupervised image-to-image translation has advanced rapidly in recent years.
However, recent approaches mostly train a separate model for each pair of
domains, which incurs heavy costs in training time and model parameters once
domains must be freely translated into one another in a general setting. To
address this
problem, we propose a novel and unified framework named Domain-Bank, which
consists of a global shared auto-encoder and domain-specific
encoders/decoders, assuming that a universal shared-latent space can be
projected. This design reduces the number of model parameters and greatly cuts
the training-time budget. Besides the high efficiency, we show image
translation results comparable to (or even better than) the state of the art on
various challenging unsupervised image translation tasks, including face image
translation, fashion-clothes translation and painting style translation. We
also apply the proposed framework to domain adaptation and achieve
state-of-the-art performance on digit benchmark datasets. Further, thanks to
the explicit representation of the domain-specific decoders as well as the
universal shared-latent space, it also enables us to conduct incremental
learning to add a new domain encoder/decoder. A linear combination of different
domains' representations can also be obtained by fusing the corresponding decoders.
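As a rough sketch of that layout (assumed interfaces, not the authors' implementation), the bank keeps one encoder/decoder pair per domain around a single shared latent space:

```python
import torch.nn as nn

class DomainBankSketch(nn.Module):
    def __init__(self, make_enc, make_dec, domains):
        super().__init__()
        self.enc = nn.ModuleDict({d: make_enc() for d in domains})
        self.dec = nn.ModuleDict({d: make_dec() for d in domains})

    def translate(self, x, src, tgt):
        z = self.enc[src](x)      # project into the universal shared-latent space
        return self.dec[tgt](z)   # render in the target domain

    def add_domain(self, name, make_enc, make_dec):
        # incremental learning: a new domain adds one pair, not a new model
        self.enc[name] = make_enc()
        self.dec[name] = make_dec()
```

With N domains this needs only N encoder/decoder pairs, instead of a separate model for every pair of domains.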
Toward Learning a Unified Many-to-Many Mapping for Diverse Image Translation
Image-to-image translation, which translates input images to a different
domain with a learned one-to-one mapping, has achieved impressive success in
recent years. The success of translation mainly relies on the network
architecture preserving the structural information while modifying the
appearance slightly at the pixel level through adversarial training. Although
these networks are able to learn the mapping, the translated images are
entirely predictable, with no room for variation. It is more desirable to
diversify them using image-to-image
translation by introducing uncertainties, i.e., the generated images hold
potential for variations in colors and textures in addition to the general
similarity to the input images, and this happens in both the target and source
domains. To this end, we propose a novel generative adversarial network (GAN)
based model, InjectionGAN, to learn a many-to-many mapping. In this model, the
input image is combined with latent variables, which comprise a
domain-specific attribute and unspecific random variations. The domain-specific
attribute indicates the target domain of the translation, while the unspecific
random variations introduce uncertainty into the model. A unified framework is
proposed to regroup these two parts and obtain diverse generations in each
domain. Extensive experiments demonstrate that the diverse generations have
high quality for the challenging image-to-image translation tasks where no
pairing information of the training dataset exists. Both quantitative and
qualitative results prove the superior performance of InjectionGAN over the
state-of-the-art approaches.
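A sketch of the injection step under assumed tensor shapes (hypothetical names, not InjectionGAN's released code):

```python
import torch

def inject_latents(x, domain_onehot, noise_dim=16):
    # x: (B, C, H, W) input image; domain_onehot: (B, D) target-domain attribute
    b, _, h, w = x.shape
    attr = domain_onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)  # target domain
    noise = torch.randn(b, noise_dim, 1, 1, device=x.device)
    noise = noise.expand(-1, -1, h, w)            # unspecific random variation
    return torch.cat([x, attr, noise], dim=1)     # generator input
```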
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GAN) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distributions. Specifically, they do not rely on any assumptions about
the distribution and can generate real-like samples from latent space in a
simple manner. This powerful property leads GAN to be applied to various
applications such as image synthesis, image attribute editing, image
translation, domain adaptation and other academic fields. In this paper, we aim
to discuss the details of GAN for readers who are familiar with it but do not
comprehend it deeply, or who wish to view GAN from various perspectives. In
addition, we explain how GAN operates and the fundamental meaning of various
objective functions that have been suggested recently. We then focus on how the
GAN can be combined with an autoencoder framework. Finally, we enumerate the
GAN variants that are applied to various tasks and other fields for those who
are interested in exploiting GAN for their research. Comment: 41 pages, 16 figures, published in ACM Computing Surveys (CSUR).
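For reference, the original minimax objective that these variants build on is

\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

where the discriminator D is trained to separate real from generated samples and the generator G is trained to fool it.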
Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach
Unsupervised domain adaptation (uDA) models focus on pairwise adaptation
settings where there is a single labeled source domain and a single target domain.
However, in many real-world settings one seeks to adapt to multiple, but
somewhat similar, target domains. Applying pairwise adaptation approaches to
this setting may be suboptimal, as they fail to leverage shared information
among multiple domains. In this work we propose an information theoretic
approach for domain adaptation in the novel context of multiple target domains
with unlabeled instances and one source domain with labeled instances. Our
model aims to find a shared latent space common to all domains, while
simultaneously accounting for the remaining private, domain-specific factors.
Disentanglement of shared and private information is accomplished using a
unified information-theoretic approach, which also serves to establish a
stronger link between the latent representations and the observed data. The
resulting model, accompanied by an efficient optimization algorithm, allows
simultaneous adaptation from a single source to multiple target domains. We
test our approach on three challenging publicly-available datasets, showing
that it outperforms several popular domain adaptation methods. Comment: 19 pages, 5 figures, 5 tables.
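A minimal sketch of the shared/private split (hypothetical encoder names; the paper couples these representations with an information-theoretic disentanglement objective):

```python
def encode_domains(shared_enc, private_encs, batches):
    # batches: dict mapping a domain name to a tensor of images from that domain
    feats = {}
    for name, x in batches.items():
        feats[name] = (shared_enc(x),           # latent common to all domains
                       private_encs[name](x))   # remaining domain-specific factors
    return feats
```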
Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach
In unsupervised domain adaptation, it is widely known that the target domain
error can be provably reduced by having a shared input representation that
makes the source and target domains indistinguishable from each other. Very
recently it has been shown that matching the marginal input distributions alone
is not enough; the alignment of output (class) distributions is also critical.
The latter can be achieved by minimizing the maximum discrepancy of
predictors (classifiers). In this paper, we adopt this principle, but propose a
more systematic and effective way to achieve hypothesis consistency via
Gaussian processes (GP). The GP allows us to define/induce a hypothesis space
of the classifiers from the posterior distribution of the latent random
functions, turning the learning into a simple large-margin posterior separation
problem, far easier to solve than previous approaches based on adversarial
minimax optimization. We formulate a learning objective that effectively pushes
the posterior to minimize the maximum discrepancy. This is further shown to be
equivalent to maximizing margins and minimizing uncertainty of the class
predictions in the target domain, a well-established principle in classical
(semi-)supervised learning. Empirical results demonstrate that our approach is
comparable or superior to existing methods on several benchmark domain
adaptation datasets.
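A crude sketch of the discrepancy term, assuming a hypothetical sample_classifier that draws one predictor from the GP posterior per call:

```python
import torch

def max_discrepancy(sample_classifier, x_tgt, n_draws=4):
    # Worst pairwise disagreement among posterior-sampled classifiers on
    # unlabeled target inputs; training pushes the posterior to shrink it.
    preds = [sample_classifier(x_tgt).softmax(dim=1) for _ in range(n_draws)]
    worst = torch.zeros(())
    for i in range(n_draws):
        for j in range(i + 1, n_draws):
            worst = torch.maximum(worst, (preds[i] - preds[j]).abs().mean())
    return worst
```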
Unsupervised Representation Adversarial Learning Network: from Reconstruction to Generation
A good representation for arbitrarily complicated data should have the
capability of semantic generation, clustering, and reconstruction. Previous
research has achieved impressive performance on each of these individually. This paper
aims at learning a disentangled representation effective for all of them in an
unsupervised way. To achieve all three tasks together, we learn the forward
and inverse mapping between data and representation on the basis of a symmetric
adversarial process. In theory, we minimize the upper bound of the two
conditional entropy loss between the latent variables and the observations
together to achieve the cycle consistency. The newly proposed RepGAN is tested
on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised
classification, generation and reconstruction tasks. The result demonstrates
that RepGAN is able to learn a useful and competitive representation. To the
authors' knowledge, ours is the first work to achieve both high
unsupervised classification accuracy and low reconstruction error on MNIST.
Code is available at https://github.com/yzhouas/RepGAN-tensorflow.
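A schematic of the symmetric cycles (hypothetical names; RepGAN's actual losses are adversarially estimated conditional-entropy bounds):

```python
def cycle_losses(enc, dec, x_real, z_prior, recon):
    loss_x = recon(dec(enc(x_real)), x_real)    # data -> latent -> data
    loss_z = recon(enc(dec(z_prior)), z_prior)  # latent -> data -> latent
    return loss_x + loss_z
```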
Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations
We would like to learn a representation of the data which decomposes an
observation into factors of variation which we can independently control.
Specifically, we want to use minimal supervision to learn a latent
representation that reflects the semantics behind a specific grouping of the
data, where within a group the samples share a common factor of variation. For
example, consider a collection of face images grouped by identity. We wish to
anchor the semantics of the grouping into a relevant and disentangled
representation that we can easily exploit. However, existing deep probabilistic
models often assume that the observations are independent and identically
distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new
deep probabilistic model for learning a disentangled representation of a set of
grouped observations. The ML-VAE separates the latent representation into
semantically meaningful parts by working both at the group level and the
observation level, while retaining efficient test-time inference. Quantitative
and qualitative evaluations show that the ML-VAE model (i) learns a
semantically meaningful disentanglement of grouped data, (ii) enables
manipulation of the latent representation, and (iii) generalises to unseen
groups.
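One way to picture the group-level latent (a simplification: the ML-VAE accumulates group evidence in a principled probabilistic way, and mean-pooling the codes is only a stand-in):

```python
import torch

def group_latent(encoder, group_images):
    # every observation in the group contributes to one shared content code
    codes = torch.stack([encoder(x) for x in group_images])
    return codes.mean(dim=0)
```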
MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation
Unpaired multimodal image-to-image translation is a task of translating a
given image in a source domain into diverse images in the target domain,
overcoming the limitation of one-to-one mapping. Existing multimodal
translation models are mainly based on the disentangled representations with an
image reconstruction loss. We propose two approaches to improve multimodal
translation quality. First, we use a content representation from the source
domain conditioned on a style representation from the target domain. Second,
rather than using a typical image reconstruction loss, we design MILO (Mutual
Information LOss), a new stochastically-defined loss function based on
information theory. This loss function directly reflects the interpretation of
latent variables as random variables. We show that our proposed model, Mutual
Information with StOchastic Style Representation (MISO), achieves
state-of-the-art performance through extensive experiments on various
real-world datasets.
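For orientation, the mutual information between a style code s and a translated image y, together with the standard variational (Barber-Agakov) lower bound that makes such terms tractable via an auxiliary decoder q, is

I(s; y) \;=\; \mathbb{E}_{p(s, y)}\!\left[\log \frac{p(s \mid y)}{p(s)}\right] \;\ge\; \mathbb{E}_{p(s, y)}\big[\log q(s \mid y)\big] + H(s).

MILO's exact stochastic formulation is given in the paper; this is only the generic quantity it is built around.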
Recent Advances in Autoencoder-Based Representation Learning
Learning useful representations with little or no supervision is a key
challenge in artificial intelligence. We provide an in-depth review of recent
advances in representation learning with a focus on autoencoder-based models.
To organize these results we make use of meta-priors believed useful for
downstream tasks, such as disentanglement and hierarchical organization of
features. In particular, we uncover three main mechanisms to enforce such
properties, namely (i) regularizing the (approximate or aggregate) posterior
distribution, (ii) factorizing the encoding and decoding distribution, or (iii)
introducing a structured prior distribution. While there are some promising
results, implicit or explicit supervision remains a key enabler and all current
methods use strong inductive biases and modeling assumptions. Finally, we
provide an analysis of autoencoder-based representation learning through the
lens of rate-distortion theory and identify a clear tradeoff between the amount
of prior knowledge available about the downstream task and how useful the
representation is for that task. Comment: Presented at the third workshop on Bayesian Deep Learning (NeurIPS 2018).
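The rate-distortion view mentioned above can be summarized by the beta-weighted VAE objective, where beta = 1 recovers the usual (negative) ELBO:

\mathcal{L} \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big]}_{\text{distortion}} \;+\; \beta\, \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)}_{\text{rate}}

Increasing beta trades reconstruction fidelity (distortion) for a more compressed latent code (rate).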