22,096 research outputs found
Semantically Invariant Text-to-Image Generation
Image captioning has demonstrated models that are capable of generating
plausible text given input images or videos. Further, recent work in image
generation has shown significant improvements in image quality when text is
used as a prior. Our work ties these concepts together by creating an
architecture that can enable bidirectional generation of images and text. We
call this network Multi-Modal Vector Representation (MMVR). Along with MMVR, we
propose two improvements to the text conditioned image generation. Firstly, a
n-gram metric based cost function is introduced that generalizes the caption
with respect to the image. Secondly, multiple semantically similar sentences
are shown to help in generating better images. Qualitative and quantitative
evaluations demonstrate that MMVR improves upon existing text conditioned image
generation results by over 20%, while integrating visual and text modalities.Comment: 5 papers, 5 figures, Published in 2018 25th IEEE International
Conference on Image Processing (ICIP
Semi-supervised FusedGAN for Conditional Image Generation
We present FusedGAN, a deep network for conditional image synthesis with
controllable sampling of diverse images. Fidelity, diversity and controllable
sampling are the main quality measures of a good image generation model. Most
existing models are insufficient in all three aspects. The FusedGAN can perform
controllable sampling of diverse images with very high fidelity. We argue that
controllability can be achieved by disentangling the generation process into
various stages. In contrast to stacked GANs, where multiple stages of GANs are
trained separately with full supervision of labeled intermediate images, the
FusedGAN has a single stage pipeline with a built-in stacking of GANs. Unlike
existing methods, which requires full supervision with paired conditions and
images, the FusedGAN can effectively leverage more abundant images without
corresponding conditions in training, to produce more diverse samples with high
fidelity. We achieve this by fusing two generators: one for unconditional image
generation, and the other for conditional image generation, where the two
partly share a common latent space thereby disentangling the generation. We
demonstrate the efficacy of the FusedGAN in fine grained image generation tasks
such as text-to-image, and attribute-to-face generation
- …