Multi-mapping Image-to-Image Translation via Learning Disentanglement
Recent advances in image-to-image translation approach the one-to-many mapping
from two aspects: multi-modal translation and multi-domain translation.
However, existing methods consider only one of the two perspectives, so
neither can solve the other's problem. To address
this issue, we propose a novel unified model, which bridges these two
objectives. First, we disentangle the input images into the latent
representations by an encoder-decoder architecture with a conditional
adversarial training in the feature space. Then, we encourage the generator to
learn multi-mappings by a random cross-domain translation. As a result, we can
manipulate different parts of the latent representations to perform multi-modal
and multi-domain translations simultaneously. Experiments demonstrate that our
method outperforms state-of-the-art methods.
Comment: Accepted by NeurIPS 2019. Code will be available at
https://github.com/Xiaoming-Yu/DMI
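The core idea of the abstract above, manipulating different parts of a disentangled latent representation to change modality or domain, can be sketched in a toy form. This is not the paper's architecture; the slice-based "encoder", "decoder", and latent dimensions below are hypothetical stand-ins for learned networks:

```python
import numpy as np

# Hypothetical split sizes for the disentangled latent (not from the paper).
CONTENT_DIM = 4
STYLE_DIM = 2

def encode(x):
    """Toy 'encoder': disentangle a flat input into (content, style) parts."""
    return x[:CONTENT_DIM], x[CONTENT_DIM:CONTENT_DIM + STYLE_DIM]

def decode(content, style):
    """Toy 'decoder': recombine the latent parts into an output vector."""
    return np.concatenate([content, style])

def translate(x_a, x_b):
    """Cross-domain translation: keep the content of x_a, swap in the
    style/domain part of x_b, then decode."""
    content_a, _ = encode(x_a)
    _, style_b = encode(x_b)
    return decode(content_a, style_b)

x_a = np.array([1., 2., 3., 4., 0.1, 0.2])  # "image" from domain A
x_b = np.array([5., 6., 7., 8., 0.9, 0.8])  # "image" from domain B
y = translate(x_a, x_b)
print(y)  # content of A combined with style of B
```

In the actual model the encoder, decoder, and the adversarial training that enforces the disentanglement are learned; here the split is fixed purely to illustrate how swapping one latent part changes the domain while the other preserves content.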
A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis
Text-to-image synthesis refers to computational methods which translate human
written textual descriptions, in the form of keywords or sentences, into images
with similar semantic meaning to the text. In earlier research, image synthesis
relied mainly on word-to-image correlation analysis combined with supervised
methods to find the best alignment of visual content with the text.
Recent progress in deep learning (DL) has brought a new set of unsupervised
deep learning methods, particularly deep generative models which are able to
generate realistic visual images using suitably trained neural network models.
In this paper, we review the most recent developments in the text-to-image
synthesis research domain. Our survey first introduces image synthesis and its
challenges, and then reviews key concepts such as generative adversarial
networks (GANs) and deep convolutional encoder-decoder neural networks (DCNN).
After that, we propose a taxonomy to summarize GAN based text-to-image
synthesis into four major categories: Semantic Enhancement GANs, Resolution
Enhancement GANs, Diversity Enhancement GANs, and Motion Enhancement GANs. We
elaborate on the main objective of each group, and further review typical GAN
architectures in each group. The taxonomy and the review outline the techniques
and the evolution of different approaches, and provide a clear roadmap of
contemporaneous solutions that utilize GANs and DCNNs to generate compelling
results in categories such as human faces, birds, flowers, room interiors, and
object reconstruction from edge maps (games). The survey concludes with a
comparison of the proposed solutions,
challenges that remain unresolved, and future developments in the text-to-image
synthesis domain.
Comment: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
201
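The conditioning scheme shared by the GAN families surveyed above can be sketched in miniature: a text embedding is concatenated with a noise vector to form the generator input, so the same sentence with different noise samples yields diverse images. The dimensions and the single linear "generator" step below are hypothetical stand-ins for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, NOISE_DIM, IMG_DIM = 8, 4, 16  # hypothetical sizes

# Stand-in generator weights; a real model would learn these adversarially.
W = rng.standard_normal((IMG_DIM, TEXT_DIM + NOISE_DIM))

def generate(text_embedding, noise):
    """One linear 'generator' step on the conditioned latent [text; z]."""
    z = np.concatenate([text_embedding, noise])
    return np.tanh(W @ z)  # fake "image" with values in [-1, 1]

text = rng.standard_normal(TEXT_DIM)  # embedding of, say, "a red bird"
img1 = generate(text, rng.standard_normal(NOISE_DIM))
img2 = generate(text, rng.standard_normal(NOISE_DIM))
print(img1.shape)  # same text, different noise -> different outputs
```

The same concatenation appears, with varying refinements, across the Semantic, Resolution, Diversity, and Motion Enhancement categories; what differs is how the text embedding is produced and how many generator/discriminator stages consume it.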