CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
We propose a new recurrent generative model for generating images from text
captions while attending to specific parts of the captions. Our model creates
images by incrementally adding patches to a "canvas" while attending to words
from the text caption at each timestep. Finally, the canvas is passed through
an upscaling network to generate images. We also introduce a new method for
generating visual-semantic sentence embeddings based on self-attention over
text. We compare our model's generated images with those generated by Reed et
al.'s model and show that our model is a stronger baseline for text-to-image
generation tasks.

Comment: CVC 201
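The incremental-canvas idea in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's architecture: the embedding sizes, the outer-product "patch generator", and the additive update are all assumptions made here for clarity.

```python
import numpy as np

def attend(word_embs, query):
    # Soft attention over word embeddings: softmax of dot-product scores,
    # returning a weighted context vector.
    scores = word_embs @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ word_embs

def paint_canvas(word_embs, steps, canvas_shape=(8, 8)):
    # Incrementally add patches to a canvas, attending to the caption's
    # words at each timestep (hypothetical shapes and projections).
    canvas = np.zeros(canvas_shape)
    query = word_embs.mean(axis=0)  # initial query: mean word embedding
    for _ in range(steps):
        context = attend(word_embs, query)
        # Toy "patch generator": outer product of context slices.
        patch = np.outer(context[:canvas_shape[0]], context[:canvas_shape[1]])
        canvas += np.tanh(patch)    # additive patching on the canvas
        query = context             # next query conditioned on this step
    return canvas
```

In the actual model the canvas would then pass through the upscaling network; here the loop only shows the attend-then-patch recurrence.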
Semantic Image Synthesis via Adversarial Learning
In this paper, we propose a way of synthesizing realistic images directly
with natural language description, which has many useful applications, e.g.
intelligent image manipulation. We attempt to accomplish such synthesis: given
a source image and a target text description, our model synthesizes images to
meet two requirements: 1) being realistic while matching the target text
description; 2) maintaining other image features that are irrelevant to the
text description. The model should be able to disentangle the semantic
information from the two modalities (image and text), and generate new images
from the combined semantics. To achieve this, we propose an end-to-end neural
architecture that leverages adversarial learning to automatically learn
implicit loss functions, which are optimized to fulfill the two requirements
above. We evaluated our model on the Caltech-200 bird dataset and the
Oxford-102 flower dataset, and demonstrated that it is capable of synthesizing
realistic images that match the given descriptions while still maintaining the
other features of the original images.

Comment: Accepted to ICCV 201
Generative Adversarial Text to Image Synthesis
Automatic synthesis of realistic images from text would be interesting and
useful, but current AI systems are still far from this goal. However, in recent
years generic and powerful recurrent neural network architectures have been
developed to learn discriminative text feature representations. Meanwhile, deep
convolutional generative adversarial networks (GANs) have begun to generate
highly compelling images of specific categories, such as faces, album covers,
and room interiors. In this work, we develop a novel deep architecture and GAN
formulation to effectively bridge these advances in text and image modeling,
translating visual concepts from characters to pixels. We demonstrate the
capability of our model to generate plausible images of birds and flowers from
detailed text descriptions.

Comment: ICML 201
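The core of "bridging" the text and image models is conditioning: the sentence embedding is compressed and concatenated with the noise vector before the generator's upsampling stack. A minimal sketch of that input construction follows; the dimensions (100-d noise, 1024-d sentence embedding, 128-d projection) are typical choices assumed here, not quoted from the paper.

```python
import numpy as np

def generator_input(z, text_emb, proj):
    # Compress the sentence embedding with a learned projection, apply a
    # nonlinearity, then concatenate with the noise vector. The result
    # feeds the first (deconvolutional) layer of the generator.
    compressed = np.maximum(proj @ text_emb, 0.0)  # ReLU projection
    return np.concatenate([z, compressed])

rng = np.random.default_rng(0)
z = rng.standard_normal(100)           # noise vector (assumed 100-d)
text_emb = rng.standard_normal(1024)   # sentence embedding (assumed 1024-d)
proj = rng.standard_normal((128, 1024)) * 0.01  # stand-in for learned weights
x = generator_input(z, text_emb, proj)
```

Because the same text embedding also conditions the discriminator, the GAN learns to tie visual concepts to the caption rather than generating an arbitrary realistic image.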
Adversarial Learning of Semantic Relevance in Text to Image Synthesis
We describe a new approach that improves the training of generative
adversarial nets (GANs) for synthesizing diverse images from a text input. Our
approach is based on the conditional version of GANs and expands on previous
work leveraging an auxiliary task in the discriminator. Our generated images
are not limited to certain classes and do not suffer from mode collapse while
semantically matching the text input. A key element of our training method is
how positive and negative training examples are formed with respect to the
class label of a given image. Instead of selecting training examples at random,
we perform negative sampling based on the semantic distance from a positive
example within the class. We evaluate our approach on the Oxford-102 flower
dataset, adopting
the inception score and multi-scale structural similarity index (MS-SSIM)
metrics to assess discriminability and diversity of the generated images. The
empirical results indicate greater diversity in the generated images,
especially as we gradually select more negative training examples that lie
closer to a positive example in the semantic space.
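The negative-sampling step described above amounts to ranking candidate examples by semantic distance from the positive and taking the closest ones as "hard" negatives. A minimal sketch, assuming cosine distance between embedding vectors (the paper's exact distance and sampling schedule are not specified here):

```python
import numpy as np

def semantic_negatives(pos_emb, cand_embs, k):
    # Rank candidate embeddings by cosine distance from the positive
    # example and return the indices of the k closest ("hardest") ones.
    pos = pos_emb / np.linalg.norm(pos_emb)
    cand = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    dist = 1.0 - cand @ pos            # cosine distance in [0, 2]
    return np.argsort(dist)[:k]
```

Gradually increasing k (or shrinking the allowed distance) over training realizes the "closer to a positive example" curriculum the abstract describes.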