Interactive Image Manipulation with Natural Language Instruction Commands
We propose an interactive image-manipulation system driven by natural language
instructions, which generates a target image from a source image and an
instruction describing the difference between the source and the target
image. The system makes it possible to modify a generated image interactively
and makes natural-language-conditioned image generation more controllable.
We construct a neural network that operates on image vectors in latent space,
transforming the source vector into the target vector using a vector
representation of the instruction. Experimental results on our dataset
indicate that the proposed framework successfully generates the target image
from a source image and a manipulation instruction.
Comment: accepted at the NIPS 2017 ViGIL workshop
(https://nips2017vigil.github.io/)
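As an illustration of the latent-space transformation this abstract describes, here is a minimal sketch in PyTorch. The module structure, vector dimensions, and the assumed pretrained encoder/decoder are hypothetical, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentManipulator(nn.Module):
    """Predicts a target image latent from a source latent and an instruction vector."""
    def __init__(self, latent_dim=128, instr_dim=64, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + instr_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),  # regressed target latent
        )

    def forward(self, z_src, v_instr):
        # Concatenate the source latent with the instruction embedding and
        # map the pair directly to the target latent.
        return self.net(torch.cat([z_src, v_instr], dim=-1))

# In a full system, z_src would come from a pretrained image encoder and
# v_instr from a sentence encoder; a pretrained decoder would turn z_tgt
# back into the edited image.
model = LatentManipulator()
z_src = torch.randn(1, 128)    # hypothetical source-image latent
v_instr = torch.randn(1, 64)   # hypothetical instruction embedding
z_tgt = model(z_src, v_instr)  # latent vector of the edited image
```

Operating in latent space like this keeps the manipulation network small and lets the same pretrained decoder render every intermediate result, which is what makes the interactive, repeated-edit loop practical.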
Learning to Globally Edit Images with Textual Description
We show how we can globally edit images using textual instructions: given a
source image and a textual instruction for the edit, generate a new image
transformed under this instruction. To tackle this novel problem, we develop
three trainable models based on recurrent neural networks (RNNs) and
generative adversarial networks (GANs). The models (bucket, filter bank, and
end-to-end) differ in how
much expert knowledge is encoded, with the most general version being purely
end-to-end. To train these systems, we use Amazon Mechanical Turk to collect
textual descriptions for around 2000 image pairs sampled from several datasets.
Experimental results on our dataset validate our approaches. In addition,
given that the filter bank model strikes a good balance between generality and
performance, we investigate it further by replacing the RNN with a Graph RNN,
and show that this improves performance. To the best of our knowledge, this is
the first computational photography work on global image editing driven purely
by free-form textual instructions.
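To make the filter-bank idea concrete, below is a minimal sketch in PyTorch: a GRU encodes the instruction and predicts soft weights over a small bank of global edit operations, here modeled as per-channel scale-and-shift filters. The bank contents, layer sizes, and names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FilterBankEditor(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, n_filters=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_weights = nn.Linear(hidden_dim, n_filters)
        # Each "filter" is a learnable global edit: per-channel scale and shift.
        self.scales = nn.Parameter(torch.ones(n_filters, 3))
        self.shifts = nn.Parameter(torch.zeros(n_filters, 3))

    def forward(self, image, tokens):
        # image: (B, 3, H, W); tokens: (B, T) integer ids of the instruction.
        _, h = self.rnn(self.embed(tokens))            # final hidden state: (1, B, hidden)
        w = torch.softmax(self.to_weights(h[-1]), -1)  # soft weights over the bank: (B, n_filters)
        scale = (w @ self.scales).view(-1, 3, 1, 1)    # blend the bank's scales
        shift = (w @ self.shifts).view(-1, 3, 1, 1)    # blend the bank's shifts
        return image * scale + shift                   # apply the blended global edit

editor = FilterBankEditor()
img = torch.rand(2, 3, 64, 64)             # batch of source images
toks = torch.randint(0, 1000, (2, 12))     # hypothetical tokenized instructions
edited = editor(img, toks)                 # globally edited images
```

Constraining the model to blend a fixed bank of global operations is what encodes the expert knowledge the abstract mentions: the edit space stays interpretable and small, at some cost in generality compared with the purely end-to-end variant.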