Recently, diffusion-based deep generative models (e.g., Stable Diffusion)
have shown impressive results in text-to-image synthesis. However, current
text-to-image models often require multiple passes of prompt engineering by
humans in order to produce satisfactory results for real-world applications. We
propose BeautifulPrompt, a deep generative model to produce high-quality
prompts from very simple raw descriptions, which enables diffusion-based models
to generate more beautiful images. In our work, we first fine-tuned the
BeautifulPrompt model over low-quality and high-quality collecting prompt
pairs. Then, to ensure that our generated prompts can generate more beautiful
images, we further propose a Reinforcement Learning with Visual AI Feedback
technique to fine-tune our model to maximize the reward values of the generated
prompts, where the reward values are calculated based on the PickScore and the
Aesthetic Scores. Our results demonstrate that learning from visual AI feedback
promises the potential to improve the quality of generated prompts and images
significantly. We further showcase the integration of BeautifulPrompt to a
cloud-native AI platform to provide better text-to-image generation service in
the cloud.Comment: emnlp 202