Self-Attentive Spatial Adaptive Normalization for Cross-Modality Domain Adaptation
Despite the successes of deep neural networks on many challenging vision
tasks, they often fail to generalize to new test domains that are not
distributed identically to the training data. Domain adaptation becomes
even more challenging for cross-modality medical data with a notable domain shift,
given that specific annotated imaging modalities may not be accessible or
complete. Our proposed solution is based on the cross-modality synthesis of
medical images to reduce the costly annotation burden by radiologists and
bridge the domain gap in radiological images. We present a novel approach for
image-to-image translation in medical images, applicable in both supervised and
unsupervised (unpaired image data) setups. Built upon adversarial training, we
propose a learnable self-attentive spatial normalization of the deep
convolutional generator network's intermediate activations. Unlike previous
attention-based image-to-image translation approaches, which are either
domain-specific or require distortion of the source domain's structures, we
unearth the importance of the auxiliary semantic information to handle the
geometric changes and preserve anatomical structures during image translation.
We achieve superior results for cross-modality segmentation between unpaired
MRI and CT data for multi-modality whole heart and multi-modal brain tumor MRI
(T1/T2) datasets compared to the state-of-the-art methods. We also observe
encouraging results in cross-modality conversion for paired MRI and CT images
on a brain dataset. Furthermore, a detailed analysis of the cross-modality
image translation and thorough ablation studies confirm our proposed method's
efficacy.
Comment: Accepted for publication in IEEE Transactions on Medical Imaging (IEEE TMI)
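As a rough illustration of the kind of layer described above, the following is a minimal PyTorch sketch of a spatially adaptive normalization block that modulates generator activations with scale/shift maps predicted from a semantic map, refined by a lightweight self-attention step. The channel sizes, the placement of the attention, and the SPADE-style modulation form are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentiveSPADE(nn.Module):
    """Sketch: normalize features, then modulate them with attention-refined
    scale/shift maps predicted from an auxiliary semantic map (assumed design)."""
    def __init__(self, feat_channels, seg_channels, hidden=128):
        super().__init__()
        self.param_free_norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(seg_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        # Lightweight self-attention over the modulation features.
        self.query = nn.Conv2d(hidden, hidden // 8, 1)
        self.key = nn.Conv2d(hidden, hidden // 8, 1)
        self.value = nn.Conv2d(hidden, hidden, 1)

    def forward(self, x, segmap):
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        h = self.shared(segmap)
        b, c, H, W = h.shape
        q = self.query(h).flatten(2).transpose(1, 2)   # B x HW x c'
        k = self.key(h).flatten(2)                      # B x c' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)   # B x HW x HW
        v = self.value(h).flatten(2).transpose(1, 2)    # B x HW x c
        h = h + torch.bmm(attn, v).transpose(1, 2).reshape(b, c, H, W)
        # Denormalize with spatially varying, semantics-driven scale and shift.
        return self.param_free_norm(x) * (1 + self.gamma(h)) + self.beta(h)
```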
DLOW: Domain Flow for Adaptation and Generalization
In this work, we present a domain flow generation (DLOW) model to bridge two
different domains by generating a continuous sequence of intermediate domains
flowing from one domain to the other. The benefits of our DLOW model are
two-fold. First, it is able to transfer source images into different styles in
the intermediate domains. The transferred images smoothly bridge the gap
between source and target domains, thus easing the domain adaptation task.
Second, when multiple target domains are provided for training, our DLOW model
is also able to generate new styles of images that are unseen in the training
data. We implement our DLOW model based on CycleGAN. A domainness variable is
introduced to guide the model to generate the desired intermediate domain
images. In the inference phase, a flow of various styles of images can be
obtained by varying the domainness variable. We demonstrate the effectiveness
of our model for both cross-domain semantic segmentation and the style
generalization tasks on benchmark datasets. Our implementation is available at
https://github.com/ETHRuiGong/DLOW.
Comment: Accepted to CVPR 2019 (oral)
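The central mechanism is the scalar domainness variable that steers generation toward intermediate domains. Below is a minimal sketch of how such a variable could blend two adversarial objectives; the generator/discriminator interfaces and the way z is injected are assumptions, not the released implementation.

```python
import torch

def dlow_generator_loss(G, D_src, D_tgt, x_src, z):
    """Sketch: z = 0 keeps the output near the source domain, z = 1 pushes it
    toward the target domain; intermediate z yields intermediate-domain images."""
    x_fake = G(x_src, z)  # generator conditioned on the domainness variable
    loss_src = -torch.log(torch.sigmoid(D_src(x_fake)) + 1e-8).mean()
    loss_tgt = -torch.log(torch.sigmoid(D_tgt(x_fake)) + 1e-8).mean()
    return (1.0 - z) * loss_src + z * loss_tgt

# At inference, sweeping z produces a flow of styles:
# outputs = [G(x_src, z) for z in torch.linspace(0, 1, steps=5).tolist()]
```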
XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Style transfer usually refers to the task of applying color and texture
information from a specific style image to a given content image while
preserving the structure of the latter. Here we tackle the more generic problem
of semantic style transfer: given two unpaired collections of images, we aim to
learn a mapping between the corpus-level style of each collection, while
preserving semantic content shared across the two domains. We introduce XGAN
("Cross-GAN"), a dual adversarial autoencoder, which captures a shared
representation of the common domain semantic content in an unsupervised way,
while jointly learning the domain-to-domain image translations in both
directions. We exploit ideas from the domain adaptation literature and define a
semantic consistency loss which encourages the model to preserve semantics in
the learned embedding space. We report promising qualitative results for the
task of face-to-cartoon translation. CartoonSet, the cartoon dataset we
collected for this purpose, is publicly available at
google.github.io/cartoonset/ as a new benchmark for semantic style transfer.
Comment: Domain Adaptation for Visual Understanding at ICML'1
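The semantic consistency loss mentioned above can be pictured as re-embedding a translated image and requiring it to stay near the original embedding in the shared latent space. The sketch below assumes encoder/decoder modules for the two domains; the cosine-distance choice is illustrative.

```python
import torch.nn.functional as F

def semantic_consistency_loss(enc_a, enc_b, dec_b, x_a):
    """Sketch: E_B(G_{A->B}(x_A)) should match E_A(x_A) in the shared embedding space."""
    z_a = enc_a(x_a)        # shared-space embedding of the source image
    x_ab = dec_b(z_a)       # translation A -> B
    z_ab = enc_b(x_ab)      # re-embed the translated image
    # Cosine distance between embeddings; an L2 penalty would be another option.
    return 1.0 - F.cosine_similarity(z_a.flatten(1), z_ab.flatten(1), dim=1).mean()
```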
Mix and match networks: cross-modal alignment for zero-pair image-to-image translation
This paper addresses the problem of inferring unseen cross-modal
image-to-image translations between multiple modalities. We assume that only
some of the pairwise translations have been seen (i.e. trained) and infer the
remaining unseen translations (where training pairs are not available). We
propose mix and match networks, an approach where multiple encoders and
decoders are aligned in such a way that the desired translation can be obtained
by simply cascading the source encoder and the target decoder, even when they
have not interacted during the training stage (i.e. unseen). The main challenge
lies in the alignment of the latent representations at the bottlenecks of
encoder-decoder pairs. We propose an architecture with several tools to
encourage alignment, including autoencoders and robust side information and
latent consistency losses. We show the benefits of our approach in terms of
effectiveness and scalability compared with other pairwise image-to-image
translation approaches. We also propose zero-pair cross-modal image
translation, a challenging setting where the objective is inferring semantic
segmentation from depth (and vice-versa) without explicit segmentation-depth
pairs, and only from two (disjoint) segmentation-RGB and depth-RGB training
sets. We observe that a certain part of the shared information between unseen
modalities might not be reachable, so we further propose a variant that
leverages pseudo-pairs which allows us to exploit this shared information
between the unseen modalities.
Comment: Accepted by IJC
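The core idea of cascading an arbitrary source encoder with an arbitrary target decoder over an aligned latent space can be sketched as follows; the module names and dictionary-based interface are assumptions for illustration.

```python
import torch.nn as nn

class MixAndMatch(nn.Module):
    """Sketch: any encoder can be cascaded with any decoder at inference time,
    including pairs that never interacted during training (zero-pair translation)."""
    def __init__(self, encoders, decoders):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)  # e.g. {"rgb": ..., "depth": ..., "seg": ...}
        self.decoders = nn.ModuleDict(decoders)

    def translate(self, x, src, tgt):
        z = self.encoders[src](x)    # aligned latent representation
        return self.decoders[tgt](z)

# Zero-pair example: segmentation from depth, trained only with depth-RGB and seg-RGB pairs.
# seg_pred = model.translate(depth_image, src="depth", tgt="seg")
```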
Semantics-Aware Image to Image Translation and Domain Transfer
Image to image translation is the problem of transferring an image from a
source domain to a target domain. We present a new method to transfer the
underlying semantics of an image even when there are geometric changes across
the two domains. Specifically, we present a Generative Adversarial Network
(GAN) that can transfer semantic information presented as segmentation masks.
Our main technical contribution is an encoder-decoder based generator
architecture that jointly encodes the image and its underlying semantics and
translates both simultaneously to the target domain. Additionally, we propose
object transfiguration and cross-domain semantic consistency losses that
preserve the underlying semantic labels maps. We demonstrate the effectiveness
of our approach in multiple object transfiguration and domain transfer tasks
through qualitative and quantitative experiments. The results show that our
method is better at transferring image semantics than state-of-the-art
image-to-image translation methods.
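One way to picture the cross-domain semantic consistency term is to require that a segmentation network applied to the translated image agree with the jointly translated label map. The sketch below assumes a generator that returns both outputs and a target-domain segmentation network; both interfaces are illustrative.

```python
import torch.nn.functional as F

def cross_domain_semantic_loss(G, segnet_tgt, x_src, mask_src):
    """Sketch: semantics predicted on the translated image should match the
    label map translated alongside it (assumed interfaces)."""
    x_tgt, mask_tgt = G(x_src, mask_src)   # jointly translate image and semantics
    logits = segnet_tgt(x_tgt)             # per-pixel class scores on the fake image
    return F.cross_entropy(logits, mask_tgt.argmax(dim=1))
```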
AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
Supervised deep learning methods have shown promising results for the task of
monocular depth estimation, but acquiring ground truth is costly and prone to
noise and inaccuracies. While synthetic datasets have been used to
circumvent these problems, the resulting models do not generalize well to
natural scenes due to the inherent domain shift. Recent adversarial approaches
for domain adaptation have performed well in mitigating the differences between
the source and target domains. But these methods are mostly limited to a
classification setup and do not scale well for fully-convolutional
architectures. In this work, we propose AdaDepth - an unsupervised domain
adaptation strategy for the pixel-wise regression task of monocular depth
estimation. The proposed approach avoids the above limitations through a)
adversarial learning and b) explicit imposition of content consistency on the
adapted target representation. Our unsupervised approach performs competitively
with other established approaches on depth estimation tasks and achieves
state-of-the-art results in a semi-supervised setting.
Comment: CVPR 201
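The content-consistency idea can be sketched as keeping the adapted target-domain encoder close to the frozen source-trained encoder on the same target image, so adaptation does not destroy scene content. The function names and L1 penalty below are illustrative assumptions.

```python
import torch.nn.functional as F

def content_consistency_loss(enc_adapted, enc_frozen, x_target):
    """Sketch: penalize drift of the adapted representation away from the
    source-trained representation of the same target image."""
    f_adapted = enc_adapted(x_target)
    f_frozen = enc_frozen(x_target).detach()  # source-trained encoder kept fixed
    return F.l1_loss(f_adapted, f_frozen)
```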
Similarity-preserving Image-image Domain Adaptation for Person Re-identification
This article studies the domain adaptation problem in person
re-identification (re-ID) under a "learning via translation" framework,
consisting of two components: 1) translating the labeled images from the source
to the target domain in an unsupervised manner, and 2) learning a re-ID model using
the translated images. The objective is to preserve the underlying human
identity information after image translation, so that translated images with
labels are effective for feature learning on the target domain. To this end, we
propose a similarity preserving generative adversarial network (SPGAN) and its
end-to-end trainable version, eSPGAN. Both aim at similarity preservation:
SPGAN enforces this property through heuristic constraints, while eSPGAN does so by
optimally facilitating the re-ID model learning. More specifically, SPGAN
separately undertakes the two components in the "learning via translation"
framework. It first preserves two types of unsupervised similarity, namely,
self-similarity of an image before and after translation, and
domain-dissimilarity of a translated source image and a target image. It then
learns a re-ID model using existing networks. In comparison, eSPGAN seamlessly
integrates image translation and re-ID model learning. During the end-to-end
training of eSPGAN, re-ID learning guides image translation to preserve the
underlying identity information of an image. Meanwhile, image translation
improves re-ID learning by providing identity-preserving training samples of
the target domain style. In the experiment, we show that identities of the fake
images generated by SPGAN and eSPGAN are well preserved. Based on this, we
report the new state-of-the-art domain adaptation results on two large-scale
person re-ID datasets.
Comment: 14 pages, 7 tables, 14 figures, this version is not fully edited and
will be updated soon. arXiv admin note: text overlap with arXiv:1711.0702
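The two unsupervised similarity constraints (self-similarity and domain-dissimilarity) can be sketched as a contrastive objective on an auxiliary embedding network; the embedding interface and margin value are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_losses(embed, x_src, x_src2tgt, x_tgt, margin=2.0):
    """Sketch: keep an image close to its translation (same identity) while pushing
    the translation away from real target images (different identities)."""
    e_src = F.normalize(embed(x_src).flatten(1), dim=1)
    e_fake = F.normalize(embed(x_src2tgt).flatten(1), dim=1)
    e_tgt = F.normalize(embed(x_tgt).flatten(1), dim=1)
    self_similarity = (e_src - e_fake).pow(2).sum(dim=1).mean()
    dist_to_target = (e_fake - e_tgt).pow(2).sum(dim=1)
    domain_dissimilarity = F.relu(margin - dist_to_target).mean()
    return self_similarity + domain_dissimilarity
```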
Sensor Transfer: Learning Optimal Sensor Effect Image Augmentation for Sim-to-Real Domain Adaptation
Performance on benchmark datasets has drastically improved with advances in
deep learning. Still, cross-dataset generalization performance remains
relatively low due to the domain shift that can occur between two different
datasets. This domain shift is especially exaggerated between synthetic and
real datasets. Significant research has been done to reduce this gap,
specifically via modeling variation in the spatial layout of a scene, such as
occlusions, and scene environmental factors, such as time of day and weather
effects. However, few works have addressed modeling the variation in the sensor
domain as a means of reducing the synthetic to real domain gap. The camera or
sensor used to capture a dataset introduces artifacts into the image data that
are unique to the sensor model, suggesting that sensor effects may also
contribute to domain shift. To address this, we propose a learned augmentation
network composed of physically-based augmentation functions. Our proposed
augmentation pipeline transfers specific effects of the sensor model --
chromatic aberration, blur, exposure, noise, and color temperature -- from a
real dataset to a synthetic dataset. We provide experiments that demonstrate
that augmenting synthetic training datasets with the proposed learned
augmentation framework reduces the domain gap between synthetic and real
domains for object detection in urban driving scenes.
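A simplified sketch of such a learnable sensor-effect chain is given below: blur, exposure, color temperature, and noise are applied with parameters that could be trained (e.g., adversarially) to match a real dataset. The specific parameterizations are simplified assumptions, not the authors' exact augmentation functions; chromatic aberration is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorAugment(nn.Module):
    """Sketch: a differentiable chain of sensor effects with learnable parameters."""
    def __init__(self):
        super().__init__()
        self.log_blur_sigma = nn.Parameter(torch.tensor(0.0))
        self.log_exposure = nn.Parameter(torch.tensor(0.0))    # multiplicative gain, log space
        self.log_noise_std = nn.Parameter(torch.tensor(-4.0))
        self.color_gain = nn.Parameter(torch.zeros(3))          # per-channel temperature shift

    def forward(self, x):  # x: B x 3 x H x W, values in [0, 1]
        sigma = self.log_blur_sigma.exp()
        k = torch.arange(-2, 3, device=x.device, dtype=x.dtype)
        g = torch.exp(-k ** 2 / (2 * sigma ** 2))
        g = g / g.sum()
        kernel = (g[:, None] * g[None, :]).expand(3, 1, 5, 5).contiguous()
        x = F.conv2d(x, kernel, padding=2, groups=3)             # Gaussian blur
        x = x * self.log_exposure.exp()                          # exposure
        x = x * (1.0 + self.color_gain).view(1, 3, 1, 1)         # color temperature
        x = x + torch.randn_like(x) * self.log_noise_std.exp()   # sensor noise
        return x.clamp(0.0, 1.0)
```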
Domain Adaptive Person Re-Identification via Camera Style Generation and Label Propagation
Unsupervised domain adaptation in person re-identification resorts to labeled
source data to aid model training on the target domain, facing difficulties
caused by the large domain shift and large camera variations. The non-overlapping
labels challenge, namely that the source and target domains contain entirely
different persons, further increases the re-identification difficulty. In this paper, we
propose a novel algorithm to narrow such domain gaps. We derive a camera style
adaptation framework to learn the style-based mappings between different camera
views, from the target domain to the source domain, and then we can transfer
the identity-based distribution from the source domain to the target domain on
the camera level. To overcome the non-overlapping labels challenge and guide
the person re-identification model to narrow the gap further, an efficient and
effective soft-labeling method is proposed to mine the intrinsic local
structure of the target domain through building the connection between
GAN-translated source domain and the target domain. Experimental results
on real benchmark datasets indicate that our method achieves
state-of-the-art results.
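The soft-labeling step can be pictured as converting similarities between unlabeled target features and GAN-translated source features (whose identities are known) into soft identity targets for the re-ID classifier. The feature interfaces, temperature, and cross-entropy form below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits_target, feat_target, feat_trans_src, src_ids, num_ids, tau=0.1):
    """Sketch: derive soft identity labels for target images from their similarity
    to translated source images with known identities (assumed formulation)."""
    sim = feat_target @ feat_trans_src.t() / tau    # B x N similarities
    w = torch.softmax(sim, dim=1)                   # soft assignment over translated anchors
    onehot = F.one_hot(src_ids, num_ids).float()    # N x num_ids
    soft_targets = (w @ onehot).detach()            # B x num_ids soft labels
    return -(soft_targets * F.log_softmax(logits_target, dim=1)).sum(dim=1).mean()
```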
One-Shot Unsupervised Cross Domain Translation
Given a single image x from domain A and a set of images from domain B, our
task is to generate the analog of x in B. We argue that this task could be a
key AI capability that underlies the ability of cognitive agents to act in the
world and present empirical evidence that the existing unsupervised domain
translation methods fail on this task. Our method follows a two step process.
First, a variational autoencoder for domain B is trained. Then, given the new
sample x, we create a variational autoencoder for domain A by adapting the
layers that are close to the image in order to directly fit x, and only
indirectly adapt the other layers. Our experiments indicate that the new method
does as well, when trained on one sample x, as the existing domain transfer
methods, when these enjoy a multitude of training samples from domain A. Our
code is made publicly available at
https://github.com/sagiebenaim/OneShotTranslation
Comment: Published at NIPS 201
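The selective adaptation step can be sketched as fine-tuning only the image-facing layers of the domain-B autoencoder on the single sample x, keeping the deeper layers fixed here for simplicity (the paper adapts them only indirectly). Which parameters count as "shallow" and the optimizer settings are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def adapt_to_one_sample(autoencoder, x, shallow_params, steps=200, lr=1e-4):
    """Sketch: fit only the layers closest to the image to the single sample x."""
    for p in autoencoder.parameters():
        p.requires_grad_(False)
    for p in shallow_params:
        p.requires_grad_(True)
    opt = torch.optim.Adam(shallow_params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(autoencoder(x), x)  # reconstruct the single domain-A sample
        loss.backward()
        opt.step()
    return autoencoder
```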