AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
In this paper, we propose an Attentional Generative Adversarial Network
(AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained
text-to-image generation. With a novel attentional generative network, the
AttnGAN can synthesize fine-grained details at different subregions of the
image by paying attention to the relevant words in the natural language
description. In addition, a deep attentional multimodal similarity model is
proposed to compute a fine-grained image-text matching loss for training the
generator. The proposed AttnGAN significantly outperforms the previous state of
the art, boosting the best reported inception score by 14.14% on the CUB
dataset and 170.25% on the more challenging COCO dataset. A detailed analysis
is also performed by visualizing the attention layers of the AttnGAN. For the
first time, it shows that the layered attentional GAN is able to automatically
select the condition at the word level for generating different parts of the
image.
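The word-level attention described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that word features and image-subregion features already live in a common space (no learned projection layer is modelled), that relevance is scored with a plain dot product, and that a softmax over words gives the attention weights. The names `word_attention`, `region_feats`, and `word_feats`, and the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_attention(region_feats, word_feats):
    """For each image subregion, attend over all words.

    Returns (weights, contexts): weights[j][i] is how strongly subregion j
    attends to word i, and contexts[j] is the attention-weighted sum of the
    word features for subregion j.
    """
    dim = len(word_feats[0])
    weights, contexts = [], []
    for h in region_feats:
        # Score each word against this subregion, then normalise over words.
        beta = softmax([dot(h, e) for e in word_feats])
        # Word-context vector: weighted average of the word features.
        ctx = [sum(b * e[k] for b, e in zip(beta, word_feats)) for k in range(dim)]
        weights.append(beta)
        contexts.append(ctx)
    return weights, contexts


# Toy data (assumed, not from the paper): two words and two subregions.
words = ["red", "beak"]
word_feats = [[1.0, 0.0], [0.0, 1.0]]
region_feats = [[4.0, 0.0], [0.0, 4.0]]

weights, contexts = word_attention(region_feats, word_feats)
# Most-attended word per subregion.
top_words = [words[max(range(len(w)), key=w.__getitem__)] for w in weights]
```

In this toy setup each subregion attends mostly to the word whose feature vector it aligns with, which mirrors the abstract's claim that different parts of the image are conditioned on different words.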
A Simple and Effective Baseline for Attentional Generative Adversarial Networks
Synthesising high-quality images by guiding a generative model with a text
description is an innovative and challenging task. In recent years, several
approaches have been proposed: AttnGAN, which uses an attention mechanism to
guide GAN training; SD-GAN, which adopts a self-distillation technique to
improve the performance of the generator and the quality of image generation;
and Stack-GAN++, which gradually improves the details and quality of the image
by stacking multiple generators and discriminators. However, these improvements
to GANs all introduce some redundancy, which affects generation performance and
model complexity. Following the popular "simple and effective" idea, we (1)
remove redundant structure and improve the backbone network of AttnGAN, and (2)
integrate and reconstruct the multiple losses of DAMSM. Our improvements
significantly reduce model size and improve training efficiency while keeping
the model's performance unchanged, resulting in our proposed SEAttnGAN. Code is
available at https://github.com/jmyissb/SEAttnGAN.
Comment: 12 pages, 3 figures