Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
Most conditional generation tasks expect diverse outputs given a single
conditional context. However, conditional generative adversarial networks
(cGANs) often focus on the prior conditional information and ignore the input
noise vectors, which contribute to the output variations. Recent attempts to
resolve the mode collapse issue for cGANs are usually task-specific and
computationally expensive. In this work, we propose a simple yet effective
regularization term to address the mode collapse issue for cGANs. The proposed
method explicitly maximizes the ratio of the distance between generated images
with respect to the corresponding latent codes, thus encouraging the generators
to explore more minor modes during training. This mode seeking regularization
term is readily applicable to various conditional generation tasks without
imposing training overhead or modifying the original network structures. We
validate the proposed algorithm on three conditional image synthesis tasks
including categorical generation, image-to-image translation, and text-to-image
synthesis with different baseline models. Both qualitative and quantitative
results demonstrate the effectiveness of the proposed regularization method for
improving diversity without loss of quality.
Comment: CVPR 2019. Code: https://github.com/HelenMao/MSGA
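A minimal sketch of such a regularization term (PyTorch); the pairing with the usual cGAN loss and the weight lambda_ms are assumptions, not the authors' exact code:

```python
import torch

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Ratio of image distance to latent-code distance; minimizing its
    inverse maximizes the ratio, pushing the generator to map distinct
    noise codes to visibly distinct images."""
    d_img = torch.mean(torch.abs(img1 - img2))
    d_z = torch.mean(torch.abs(z1 - z2))
    return 1.0 / (d_img / (d_z + eps) + eps)

# Hypothetical usage in a cGAN step: sample two codes z1, z2 for one
# condition c, then add lambda_ms * mode_seeking_loss(G(c, z1), G(c, z2), z1, z2).
```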
KG-GAN: Knowledge-Guided Generative Adversarial Networks
Can generative adversarial networks (GANs) generate roses of various colors
given only roses of red petals as input? The answer is negative, since GANs'
discriminator would reject all roses of unseen petal colors. In this study, we
propose knowledge-guided GAN (KG-GAN) to fuse domain knowledge with the GAN
framework. KG-GAN trains two generators; one learns from data whereas the other
learns from knowledge with a constraint function. Experimental results
demonstrate the effectiveness of KG-GAN in generating unseen flower categories
from seen categories, given textual descriptions of the unseen ones.
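A rough sketch of the two-generator objective described above (PyTorch); the constraint function and all module names are placeholders, not the authors' code:

```python
import torch

def kg_gan_generator_loss(g_data, g_know, disc, constraint_fn,
                          z, cond_seen, cond_unseen, lam=1.0):
    """Sketch: the data generator receives a standard adversarial signal
    on seen categories; the knowledge generator is trained only through a
    differentiable constraint, since the discriminator would reject
    images with unseen attributes."""
    x_seen = g_data(z, cond_seen)
    loss_data = -torch.mean(disc(x_seen, cond_seen))  # adversarial term
    x_unseen = g_know(z, cond_unseen)
    loss_know = constraint_fn(x_unseen, cond_unseen)  # knowledge term
    return loss_data + lam * loss_know
```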
Dual Variational Generation for Low-Shot Heterogeneous Face Recognition
Heterogeneous Face Recognition (HFR) is a challenging issue because of the
large domain discrepancy and a lack of heterogeneous data. This paper considers
HFR as a dual generation problem, and proposes a novel Dual Variational
Generation (DVG) framework. It generates large-scale new paired heterogeneous
images with the same identity from noise, in order to reduce the domain
gap in HFR. Specifically, we first introduce a dual variational autoencoder to
represent a joint distribution of paired heterogeneous images. Then, in order
to ensure the identity consistency of the generated paired heterogeneous
images, we impose distribution alignment in the latent space and pairwise
identity preservation in the image space. Moreover, the HFR network reduces the
domain discrepancy by constraining the pairwise feature distances between the
generated paired heterogeneous images. Extensive experiments on four HFR
databases show that our method can significantly improve state-of-the-art
results. The related code is available at https://github.com/BradyFU/DVG.
Comment: Accepted by NeurIPS 201
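A sketch of how the loss terms named in the abstract might combine (PyTorch); the weights, the L1/MSE/cosine choices, and the identity features feat_* from a pretrained face network are assumptions:

```python
import torch
import torch.nn.functional as F

def dvg_objective(rec_a, rec_b, x_a, x_b, mu_a, logvar_a, mu_b, logvar_b,
                  feat_a, feat_b, lam_align=1.0, lam_id=1.0):
    # Dual VAE: reconstruct both modalities, regularize both posteriors.
    rec = F.l1_loss(rec_a, x_a) + F.l1_loss(rec_b, x_b)
    kl = -0.5 * torch.mean(1 + logvar_a - mu_a.pow(2) - logvar_a.exp()) \
         - 0.5 * torch.mean(1 + logvar_b - mu_b.pow(2) - logvar_b.exp())
    # Distribution alignment in the latent space (moment matching here).
    align = F.mse_loss(mu_a, mu_b) + F.mse_loss(logvar_a, logvar_b)
    # Pairwise identity preserving in the image space, via identity features.
    ident = 1.0 - F.cosine_similarity(feat_a, feat_b).mean()
    return rec + kl + lam_align * align + lam_id * ident
```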
PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain
We propose a universal image reconstruction method that recovers detailed
images purely from a binary sparse edge map and a flat color domain. Inspired
by the procedure of painting, our framework, based on a generative adversarial
network, consists of three phases: the Imitation Phase initializes the
networks, the Generating Phase reconstructs preliminary images, and the
Refinement Phase fine-tunes the preliminary images into detailed final
outputs. This framework allows our model to generate abundant high-frequency
details from sparse input information. We also examine the drawbacks of
implicitly disentangling a style latent space from images, and demonstrate
that the explicit color domain in our model offers better controllability and
interpretability. In our experiments, we achieve outstanding results on
reconstructing realistic images and translating hand-drawn drafts into
satisfactory paintings. Moreover, within the domain of edge-to-image
translation, our model PI-REC outperforms existing state-of-the-art methods on
evaluations of realism and accuracy, both quantitatively and qualitatively.
Comment: 15 pages, 13 figures
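One way to read the pipeline as code (PyTorch); the channel counts and phase boundaries are illustrative assumptions:

```python
import torch

# The generator consumes the two sparse inputs stacked channel-wise.
edge = torch.zeros(1, 1, 256, 256)         # binary sparse edge map
color = torch.zeros(1, 3, 256, 256)        # flat color domain
g_input = torch.cat([edge, color], dim=1)  # 4-channel generator input

def phase(step, imitation_end=10_000, generating_end=60_000):
    """Illustrative three-phase schedule matching the abstract."""
    if step < imitation_end:
        return "imitation"    # initialize the networks
    if step < generating_end:
        return "generating"   # reconstruct preliminary images
    return "refinement"       # fine-tune preliminary images, adding detail
```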
DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Image-to-image translation aims to learn the mapping between two visual
domains. There are two main challenges for this task: 1) lack of aligned
training pairs and 2) multiple possible outputs from a single input image. In
this work, we present an approach based on disentangled representation for
generating diverse outputs without paired training images. To synthesize
diverse outputs, we propose to embed images onto two spaces: a domain-invariant
content space capturing shared information across domains and a domain-specific
attribute space. Our model takes the encoded content features extracted from a
given input and attribute vectors sampled from the attribute space to
synthesize diverse outputs at test time. To handle unpaired training data, we
introduce a cross-cycle consistency loss based on disentangled representations.
Qualitative results show that our model can generate diverse and realistic
images on a wide range of tasks without paired training data. For quantitative
evaluations, we measure realism with a user study and the Fréchet inception
distance, and measure diversity with the perceptual distance metric,
Jensen-Shannon divergence, and the number of statistically-different bins.
Comment: IJCV journal extension of the ECCV 2018 paper "Diverse Image-to-Image
Translation via Disentangled Representations" arXiv:1808.00948. Project Page:
http://vllab.ucmerced.edu/hylee/DRIT_pp/ Code:
https://github.com/HsinYingLee/DRI
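A sketch of the cross-cycle consistency described above (PyTorch); the per-domain encoders (Ec_*, Ea_*) and generators (G_*) are placeholder modules:

```python
import torch.nn.functional as F

def cross_cycle_loss(x_a, x_b, Ec_a, Ec_b, Ea_a, Ea_b, G_a, G_b):
    """After a first translation with swapped attributes, a second
    translation that swaps them back should recover both inputs."""
    c_a, c_b = Ec_a(x_a), Ec_b(x_b)    # domain-invariant content
    a_a, a_b = Ea_a(x_a), Ea_b(x_b)    # domain-specific attributes
    u = G_b(c_a, a_b)                  # x_a's content rendered in domain B
    v = G_a(c_b, a_a)                  # x_b's content rendered in domain A
    x_a_rec = G_a(Ec_b(u), Ea_a(v))    # swap back toward domain A
    x_b_rec = G_b(Ec_a(v), Ea_b(u))    # swap back toward domain B
    return F.l1_loss(x_a_rec, x_a) + F.l1_loss(x_b_rec, x_b)
```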
Unified cross-modality feature disentangler for unsupervised multi-domain MRI abdomen organs segmentation
Our contribution is a unified cross-modality feature disentangling approach
for multi-domain image translation and multiple organ segmentation. Using CT as
the labeled source domain, our approach learns to segment multi-modal
(T1-weighted and T2-weighted) MRI, for which no labeled data are available. Our approach uses a
variational auto-encoder (VAE) to disentangle the image content from style. The
VAE constrains the style feature encoding to match a universal prior (Gaussian)
that is assumed to span the styles of all the source and target modalities. The
extracted image style is converted into a latent style scaling code, which
modulates the generator to produce images in the modality specified by the
target domain code, starting from the image content features. Finally, we
introduce a
joint distribution matching discriminator that combines the translated images
with task-relevant segmentation probability maps to further constrain and
regularize image-to-image (I2I) translations. We performed extensive
comparisons to multiple state-of-the-art I2I translation and segmentation
methods. Our approach resulted in the lowest average multi-domain image
reconstruction error of 1.34±0.04. Our approach produced an average Dice
similarity coefficient (DSC) of 0.85 for T1w and 0.90 for T2w MRI for
multi-organ segmentation, which was highly comparable to a fully supervised MRI
multi-organ segmentation network (DSC of 0.86 for T1w and 0.90 for T2w MRI).
Comment: This paper has been accepted by MICCAI202
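A sketch of the style branch described above (PyTorch); the exact modulation mechanism is an assumption:

```python
import torch

def sample_style(mu, logvar):
    """Reparameterized style code; the KL term pulls the style posterior
    toward the universal Gaussian prior assumed to span all modalities."""
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return z, kl

def modulate(content_feat, scale, shift):
    """Latent style scaling code applied to generator features,
    AdaIN-style (an assumption about the exact operation)."""
    return content_feat * (1 + scale) + shift
```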
Annealing Genetic GAN for Minority Oversampling
The key to overcoming class imbalance problems is to capture the distribution
of the minority class accurately. Generative Adversarial Networks (GANs) have
shown potential for tackling class imbalance problems due to their capability of
reproducing data distributions given ample training data samples. However, the
scarce samples of one or more classes still pose a great challenge for GANs to
learn accurate distributions for the minority classes. In this work, we propose
an Annealing Genetic GAN (AGGAN) method, which aims to reproduce the
distributions closest to the ones of the minority classes using only limited
data samples. Our AGGAN recasts the training of GANs as an evolutionary
process that incorporates the mechanism of simulated annealing. In particular,
the generator uses different training strategies to generate multiple offspring
and retain the best. Then, we use the Metropolis criterion in the simulated
annealing to decide whether to update the generator with the best offspring.
As the Metropolis criterion allows a certain chance of accepting worse
solutions, it enables our AGGAN to steer away from local optima.
According to both theoretical analysis and experimental studies on multiple
imbalanced image datasets, we show that the proposed training strategy enables
our AGGAN to reproduce the distributions of minority classes from scarce
samples and provides an effective and robust solution to the class imbalance
problem.
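The Metropolis rule itself is compact; a minimal sketch under assumed names (the fitness measure and cooling schedule are not specified in the abstract):

```python
import math
import random

def metropolis_accept(loss_new, loss_best, temperature):
    """Always accept a better (lower-loss) offspring; accept a worse one
    with probability exp(-delta / T), which lets training leave local
    optima instead of greedily keeping only improvements."""
    if loss_new <= loss_best:
        return True
    delta = loss_new - loss_best
    return random.random() < math.exp(-delta / max(temperature, 1e-12))

# Annealing: decay the temperature across generations, e.g. T_k = T0 * alpha**k.
```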
Unsupervised Eyeglasses Removal in the Wild
Eyeglasses removal is challenging: a model must handle different kinds of
eyeglasses, e.g., rimless glasses, full-rim glasses, and sunglasses, and
recover plausible eyes. Due to the large visual variation, conventional
methods lack scalability. Most existing works focus on frontal face images in the
controlled environment, such as the laboratory, and need to design specific
systems for different eyeglass types. To address the limitation, we propose a
unified eyeglass removal model called Eyeglasses Removal Generative Adversarial
Network (ERGAN), which could handle different types of glasses in the wild. The
proposed method does not depend on the dense annotation of eyeglasses location
but benefits from the large-scale face images with weak annotations.
Specifically, we study the two relevant tasks simultaneously, i.e., removing
and wearing eyeglasses. Given two facial images with and without eyeglasses,
the proposed model learns to swap the eye area between the two faces. The
generation mechanism focuses on the eye area and avoids the difficulty of
generating an entire new face. In the experiments, we show that the proposed
method achieves competitive removal quality in terms of realism and diversity.
Furthermore, we evaluate
ERGAN on several subsequent tasks, such as face verification and facial
expression recognition. The experiments show that our method can serve as a
pre-processing step for these tasks.
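One plausible reading of the swap, as a sketch; the factorization into an eye-area code and a face-appearance code, and all module names, are assumptions:

```python
def swap_eye_area(x_glasses, x_no_glasses, enc_eye, enc_face, gen):
    """Exchange eye-area codes between a face with glasses and one
    without, so the same model both removes and wears eyeglasses."""
    eye_g, face_g = enc_eye(x_glasses), enc_face(x_glasses)
    eye_n, face_n = enc_eye(x_no_glasses), enc_face(x_no_glasses)
    removed = gen(face_g, eye_n)  # glasses face, re-rendered with bare eyes
    worn = gen(face_n, eye_g)     # bare face, re-rendered wearing glasses
    return removed, worn
```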
Multi-Domain Image-to-Image Translation with Adaptive Inference Graph
In this work, we address the problem of multi-domain image-to-image
translation with particular attention paid to computational cost. In
particular, current state-of-the-art models require a large and deep model in
order to handle the visual diversity of multiple domains. In a context of
limited computational resources, increasing the network size may not be
possible. Therefore, we propose to increase the network capacity by using an
adaptive graph structure. At inference time, the network estimates its own
graph by selecting specific sub-networks. Sub-network selection is implemented
using Gumbel-Softmax in order to allow end-to-end training. This approach leads
to an adjustable increase in the number of parameters while preserving an almost
constant computational cost. Our evaluation on two publicly available datasets
of facial and painting images shows that our adaptive strategy generates better
images with fewer artifacts than methods from the literature.
Comment: Accepted at ICPR 202
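A minimal sketch of Gumbel-Softmax sub-network selection (PyTorch); the branch contents and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBranch(nn.Module):
    """Holds candidate sub-networks and learns logits over them; a
    straight-through Gumbel-Softmax sample picks one per forward pass,
    keeping the selection differentiable for end-to-end training."""
    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        self.logits = nn.Parameter(torch.zeros(len(branches)))

    def forward(self, x, tau=1.0):
        w = F.gumbel_softmax(self.logits, tau=tau, hard=True)  # one-hot
        # Training runs all branches to keep gradients flowing; at
        # inference one could run only the argmax branch, keeping the
        # computational cost nearly constant as capacity grows.
        return sum(wi * b(x) for wi, b in zip(w, self.branches))
```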
Impressions2Font: Generating Fonts by Specifying Impressions
Various fonts give us various impressions, which are often represented by
words. This paper proposes Impressions2Font (Imp2Font) that generates font
images with specific impressions. Imp2Font is an extended version of
conditional generative adversarial networks (GANs). More precisely, Imp2Font
accepts an arbitrary number of impression words as the condition to generate
the font images. These impression words are converted into a soft-constraint
vector by an impression embedding module built on a word embedding technique.
Qualitative and quantitative evaluations show that Imp2Font generates font
images of higher quality than comparable methods when given multiple
impression words or even unlearned words.
Comment: Submitted to ICDAR 202
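A sketch of the impression-embedding idea (PyTorch); the mean pooling and dimensions are assumptions, and unlearned words are handled only insofar as the pretrained word-embedding vocabulary covers them:

```python
import torch
import torch.nn as nn

class ImpressionEmbedding(nn.Module):
    """Maps an arbitrary number of impression words to one
    soft-constraint condition vector by pooling word embeddings."""
    def __init__(self, word_vectors, dim=300, out_dim=128):
        super().__init__()
        self.word_vectors = word_vectors   # e.g., a word2vec lookup table
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, words):
        vecs = torch.stack([self.word_vectors[w] for w in words])
        return self.proj(vecs.mean(dim=0))  # one vector, any word count
```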