19,132 research outputs found
Unsupervised Latent Space Translation Network
One task that is often discussed in computer vision is the mapping of an
image from one domain to a corresponding image in another domain, known as
image-to-image translation. Several approaches currently address this task. In
this paper, we present an enhancement of the UNIT framework that removes its
main drawbacks. More specifically, instead of a VAE, we introduce an additional
adversarial discriminator on the latent representation,
which enforces the latent space distributions of both domains to be similar. On
MNIST and USPS domain adaptation tasks, this approach greatly outperforms
competing approaches.
Comment: To be published in conference proceedings of ESANN 202
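As a rough PyTorch sketch of the core idea, the snippet below replaces a VAE-style prior with an adversarial discriminator that cannot tell which domain a latent code came from; the tiny encoders, the 64-dimensional code, and the flattened digit-image inputs are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Illustrative encoders for domains A and B, plus a latent discriminator.
enc_a = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))   # e.g. MNIST images
enc_b = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))   # e.g. USPS images (resized)
disc_z = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def latent_adversarial_losses(x_a, x_b):
    """The discriminator tries to identify the domain of each latent code;
    the encoders are trained to make the two latent distributions match."""
    z_a, z_b = enc_a(x_a), enc_b(x_b)
    ones, zeros = torch.ones(len(x_a), 1), torch.zeros(len(x_b), 1)
    d_loss = bce(disc_z(z_a.detach()), ones) + bce(disc_z(z_b.detach()), zeros)
    g_loss = bce(disc_z(z_a), zeros) + bce(disc_z(z_b), ones)  # fool the discriminator
    return d_loss, g_loss
```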
Unsupervised Deformable Registration for Multi-Modal Images via Disentangled Representations
We propose a fully unsupervised multi-modal deformable image registration
method (UMDIR), which does not require any ground truth deformation fields or
any aligned multi-modal image pairs during training. Multi-modal registration
is a key problem in many medical image analysis applications. It is very
challenging due to complicated and unknown relationships between different
modalities. In this paper, we propose an unsupervised learning approach to
reduce the multi-modal registration problem to a mono-modal one through image
disentangling. In particular, we decompose images of both modalities into a
common latent shape space and separate latent appearance spaces via an
unsupervised multi-modal image-to-image translation approach. The proposed
registration approach is then built on the factorized latent shape code, with
the assumption that the intrinsic shape deformation existing in original image
domain is preserved in this latent space. Specifically, two metrics have been
proposed for training the proposed network: a latent similarity metric defined
in the common shape space and a learning-based image similarity metric based on
an adversarial loss. We examined different variations of our proposed approach
and compared them with conventional state-of-the-art multi-modal registration
methods. Results show that our proposed methods achieve competitive performance
against other methods at substantially reduced computation time.
Comment: Accepted as an oral presentation in IPMI 201
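A minimal sketch of the latent similarity metric mentioned above, assuming a shared shape encoder and a spatial warping function are available as callables; the L1 comparison is an illustrative choice, not necessarily the exact form used in the paper.

```python
import torch.nn.functional as F

def latent_shape_similarity(shape_enc, warp, moving, fixed, flow):
    """Illustrative registration loss: warp the moving image with the predicted
    deformation field and compare shape codes in the shared latent shape space."""
    warped = warp(moving, flow)          # apply the predicted deformation field
    z_warped = shape_enc(warped)         # shape code of the warped moving image
    z_fixed = shape_enc(fixed)           # shape code of the fixed image
    return F.l1_loss(z_warped, z_fixed)  # similarity measured in the latent space
```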
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometry
variations typically ends in failure. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation
between complex objects. Instead of learning the mapping in the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on appearance and geometry space separately. Extensive experiments
demonstrate the superior performance of our method to other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid objects
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.htm
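The disentangle-and-translate idea could be organized roughly as below, where appearance and geometry are encoded separately and only the geometry code is mapped across domains; all module names are hypothetical placeholders, and the real method's geometry prior and conditional VAE losses are omitted here.

```python
import torch.nn as nn

class DisentangledTranslator(nn.Module):
    """Illustrative only: appearance and geometry are encoded separately and the
    translation happens in each latent space rather than directly in image space."""
    def __init__(self, enc_app, enc_geo, geo_translator, decoder):
        super().__init__()
        self.enc_app, self.enc_geo = enc_app, enc_geo
        self.geo_translator = geo_translator   # maps geometry codes from domain X to domain Y
        self.decoder = decoder                 # renders (appearance code, geometry code) -> image

    def forward(self, x, exemplar_y):
        a_y = self.enc_app(exemplar_y)         # appearance taken from a target-domain exemplar
        g_x = self.enc_geo(x)                  # geometry of the source image
        g_y = self.geo_translator(g_x)         # translate geometry across domains
        return self.decoder(a_y, g_y)          # vary exemplar_y for multimodal outputs
```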
TraVeLGAN: Image-to-image Translation by Transformation Vector Learning
Interest in image-to-image translation has grown substantially in recent
years with the success of unsupervised models based on the cycle-consistency
assumption. The achievements of these models have been limited to a particular
subset of domains where this assumption yields good results, namely homogeneous
domains that are characterized by style or texture differences. We tackle the
challenging problem of image-to-image translation where the domains are defined
by high-level shapes and contexts, as well as including significant clutter and
heterogeneity. For this purpose, we introduce a novel GAN based on preserving
intra-domain vector transformations in a latent space learned by a siamese
network. The traditional GAN system introduced a discriminator network to guide
the generator into generating images in the target domain. To this two-network
system we add a third: a siamese network that guides the generator so that each
original image shares semantics with its generated version. With this new
three-network system, we no longer need to constrain the generators with the
ubiquitous cycle-consistency constraint. As a result, the generators can learn
mappings between more complex domains that differ from each other in far more
than style or texture.
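A hedged sketch of the transformation-vector constraint added by the siamese network: the vector between two images in the source domain should be preserved between their translations. Cosine agreement is used here for concreteness; the paper's exact combination of angle and magnitude terms is not reproduced.

```python
import torch.nn.functional as F

def travel_loss(siamese, gen, x_i, x_j):
    """Illustrative transformation-vector constraint: the vector between two
    source images in the siamese embedding should match the vector between
    their translations, so semantics survive translation without a cycle loss."""
    v_src = siamese(x_i) - siamese(x_j)            # intra-domain transformation vector
    v_gen = siamese(gen(x_i)) - siamese(gen(x_j))  # the same vector after translation
    return 1.0 - F.cosine_similarity(v_src, v_gen, dim=1).mean()
```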
Unsupervised Image-to-Image Translation Networks
Unsupervised image-to-image translation aims at learning a joint distribution
of images in different domains by using images from the marginal distributions
in individual domains. Since there exists an infinite set of joint
distributions that can yield the given marginal distributions, one could infer
nothing about the joint distribution from the marginal distributions without
additional assumptions. To address the problem, we make a shared-latent space
assumption and propose an unsupervised image-to-image translation framework
based on Coupled GANs. We compare the proposed framework with competing
approaches and present high quality image translation results on various
challenging unsupervised image translation tasks, including street scene image
translation, animal image translation, and face image translation. We also
apply the proposed framework to domain adaptation and achieve state-of-the-art
performance on benchmark datasets. Code and additional results are available in
https://github.com/mingyuliutw/unit.
Comment: NIPS 2017, 11 pages, 6 figure
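Under the shared-latent space assumption, translation amounts to encoding with one domain's encoder and decoding with the other's, roughly as sketched below; the unit-variance reparameterization and the opaque encoder/decoder callables are simplifications of the VAE-GAN components used in the framework.

```python
import torch

def translate_a_to_b(enc_a, dec_b, x_a):
    """Illustrative shared-latent translation: encode with domain A's encoder,
    decode with domain B's decoder. In the actual framework these are VAE-GANs
    with shared high-level weights; here they are opaque callables."""
    mu = enc_a(x_a)                      # posterior mean in the shared latent space
    z = mu + torch.randn_like(mu)        # reparameterized sample (unit-variance simplification)
    return dec_b(z)                      # cross-domain output in domain B
```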
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders
Unsupervised image-to-image translation has seen spectacular progress in
recent years. However, recent approaches mainly train one model per pair of
domains, which incurs a heavy burden in training time and model parameters
when, in a general setting, every domain must be freely translatable into
every other. To address this problem, we propose a novel and unified framework
named Domain-Bank, which consists of a global shared auto-encoder and
domain-specific encoders/decoders, under the assumption that all domains can
be projected into a universal shared-latent space. This greatly reduces the
model-parameter complexity along with the training-time budget. Besides the
high efficiency, we show comparable (or even better) image translation results
over state-of-the-art methods on
various challenging unsupervised image translation tasks, including face image
translation, fashion-clothes translation and painting style translation. We
also apply the proposed framework to domain adaptation and achieve
state-of-the-art performance on digit benchmark datasets. Further, thanks to
the explicit representation of the domain-specific decoders as well as the
universal shared-latent space, it also enables us to conduct incremental
learning to add a new domain encoder/decoder. A linear combination of different
domains' representations can also be obtained by fusing the corresponding
decoders.
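A small sketch of how such a bank of domain-specific encoders/decoders around one shared core might be organized in PyTorch; the module layout and the `add_domain` hook are illustrative assumptions rather than the paper's implementation.

```python
import torch.nn as nn

class DomainBank(nn.Module):
    """Illustrative layout: one shared auto-encoder core plus a bank of
    domain-specific encoders/decoders, so a new domain only adds one pair."""
    def __init__(self, shared_core, domain_encoders, domain_decoders):
        super().__init__()
        self.core = shared_core                         # projects into the universal shared-latent space
        self.encoders = nn.ModuleDict(domain_encoders)  # e.g. {"face": ..., "fashion": ..., "paint": ...}
        self.decoders = nn.ModuleDict(domain_decoders)

    def translate(self, x, src, tgt):
        z = self.core(self.encoders[src](x))            # domain-specific encode, then shared projection
        return self.decoders[tgt](z)                    # decode into any registered target domain

    def add_domain(self, name, enc, dec):
        # Incremental learning: register a new domain's encoder/decoder pair.
        self.encoders[name], self.decoders[name] = enc, dec
```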
LOGAN: Unpaired Shape Transform in Latent Overcomplete Space
We introduce LOGAN, a deep neural network aimed at learning general-purpose
shape transforms from unpaired domains. The network is trained on two sets of
shapes, e.g., tables and chairs, while there is neither a pairing between
shapes from the domains as supervision nor any point-wise correspondence
between any shapes. Once trained, LOGAN takes a shape from one domain and
transforms it into the other. Our network consists of an autoencoder to encode
shapes from the two input domains into a common latent space, where the latent
codes concatenate multi-scale shape features, resulting in an overcomplete
representation. The translator is based on a generative adversarial network
(GAN), operating in the latent space, where an adversarial loss enforces
cross-domain translation while a feature preservation loss ensures that the
right shape features are preserved for a natural shape transform. We conduct
ablation studies to validate each of our key network designs and demonstrate
superior capabilities in unpaired shape transforms on a variety of examples
over baselines and state-of-the-art approaches. We show that LOGAN is able to
learn what shape features to preserve during shape translation, either local or
non-local, whether content or style, depending solely on the input domains for
training.
Comment: Download supplementary material here ->
https://kangxue.org/papers/logan_supp.pd
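A rough sketch of the two latent-space objectives described above: an adversarial term pushing translated codes toward the target domain and a feature-preservation term keeping the overcomplete code intact. Applying the L1 term to the whole code, and the particular GAN loss, are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def latent_translation_losses(translator, disc, z_src):
    """Illustrative losses for a latent-space translator (shape codes in, shape codes out)."""
    z_trans = translator(z_src)                   # translate source codes toward the target domain
    logits = disc(z_trans)
    adv = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))          # adversarial loss: look like target-domain codes
    feat_preserve = F.l1_loss(z_trans, z_src)     # keep the shared multi-scale shape features
    return adv, feat_preserve
```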
A Style Transfer Approach to Source Separation
Training neural networks for source separation involves presenting a mixture
recording at the input of the network and updating network parameters in order
to produce an output that resembles the clean source. Consequently, supervised
source separation depends on the availability of paired mixture-clean training
examples. In this paper, we interpret source separation as a style transfer
problem. We present a variational auto-encoder network that exploits the
commonality across the domain of mixtures and the domain of clean sounds and
learns a shared latent representation across the two domains. Using these
cycle-consistent variational auto-encoders, we learn a mapping from the mixture
domain to the domain of clean sounds and perform source separation without
explicitly supervising with paired training examples.
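Once the cycle-consistent variational auto-encoders are trained, separation can be read as decoding a mixture's shared latent code with the clean-domain decoder, roughly as below; the Gaussian-posterior encoder interface is an assumption.

```python
import torch

def separate(enc_mix, dec_clean, mixture):
    """Illustrative mapping from the mixture domain to the clean-sound domain
    through the shared latent code learned by the cycle-consistent VAEs."""
    mu, logvar = enc_mix(mixture)                             # Gaussian posterior over the shared code
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
    return dec_clean(z)                                       # decode an estimate of the clean source
```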
Bilingual-GAN: A Step Towards Parallel Text Generation
Latent space based GAN methods and attention based sequence to sequence
models have achieved impressive results in text generation and unsupervised
machine translation respectively. Leveraging the two domains, we propose an
adversarial latent space based model capable of generating parallel sentences
in two languages concurrently and translating bidirectionally. The bilingual
generation goal is achieved by sampling from the latent space that is shared
between both languages. First two denoising autoencoders are trained, with
shared encoders and back-translation to enforce a shared latent state between
the two languages. The decoder is shared for the two translation directions.
Next, a GAN is trained to generate synthetic "code" mimicking the languages'
shared latent space. This code is then fed into the decoder to generate text in
either language. We perform our experiments on Europarl and Multi30k datasets,
on the English-French language pair, and document our performance using both
supervised and unsupervised machine translation.
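The bilingual generation step could look roughly like this: a GAN generator produces a synthetic code in the shared latent space, and the shared decoder emits text in either language from that same code. The `decode(code, lang=...)` interface and all names are illustrative placeholders.

```python
def generate_parallel(latent_gen, decode, noise):
    """Illustrative bilingual generation: one synthetic latent code, decoded
    into each language by the shared decoder."""
    code = latent_gen(noise)        # GAN generator mimics the shared latent space
    en = decode(code, lang="en")    # decode the code as English text
    fr = decode(code, lang="fr")    # decode the same code as French text
    return en, fr
```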
Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound
Unsupervised image-to-image translation is a class of computer vision
problems which aims at modeling conditional distribution of images in the
target domain, given a set of unpaired images in the source and target domains.
An image in the source domain might have multiple representations in the target
domain. Therefore, ambiguity arises in modeling the conditional distribution,
especially when the images in the source and target domains come from
different modalities. Current approaches mostly rely on simplifying assumptions
to map both domains into a shared-latent space. Consequently, they are only
able to model the domain-invariant information between the two modalities.
These approaches usually fail to model domain-specific information which has no
representation in the target domain. In this work, we propose an unsupervised
image-to-image translation framework which maximizes a domain-specific
variational information bound and learns the target domain-invariant
representation of the two domains. The proposed framework makes it possible to
map a single source image into multiple images in the target domain, utilizing
several target domain-specific codes sampled randomly from the prior
distribution, or extracted from reference images.
Comment: NIPS 201
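A sketch of the resulting one-to-many translation: the source image supplies the domain-invariant content code, while the target domain-specific code is either sampled from the prior or extracted from a reference image. The 8-dimensional Gaussian prior and the module names are assumptions for illustration.

```python
import torch

def multimodal_translate(enc_content, enc_style_tgt, dec_tgt, x_src, reference=None, n_samples=3):
    """Illustrative one-to-many translation: fixed content code from the source
    image, target-specific code sampled from the prior or taken from a reference."""
    c = enc_content(x_src)                     # domain-invariant content code
    outputs = []
    for _ in range(n_samples):
        if reference is not None:
            s = enc_style_tgt(reference)       # domain-specific code from a reference image
        else:
            s = torch.randn(c.size(0), 8)      # sample from the prior (8 dims is illustrative)
        outputs.append(dec_tgt(c, s))          # each sample gives a different plausible output
    return outputs
```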