Interactive Image Manipulation with Natural Language Instruction Commands
We propose an interactive image-manipulation system driven by natural language
instructions, which generates a target image from a source image and an
instruction describing the difference between the source and the target
image. The system makes it possible to modify a generated image interactively
and makes natural-language-conditioned image generation more controllable.
We construct a neural network that operates on image vectors in latent space,
transforming the source vector into the target vector using a vector
representation of the instruction. Experimental results on our dataset
indicate that the proposed framework successfully generates the target image
from a source image and a manipulation instruction.
Comment: accepted at the NIPS 2017 ViGIL workshop
(https://nips2017vigil.github.io/)
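As an illustration of the latent-space transformation this abstract describes, here is a minimal sketch in PyTorch. The module structure, vector dimensions, and the assumed pretrained encoder/decoder are hypothetical, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentManipulator(nn.Module):
    """Predicts a target image latent from a source latent and an instruction vector."""
    def __init__(self, latent_dim=128, instr_dim=64, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + instr_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),  # regressed target latent
        )

    def forward(self, z_src, v_instr):
        # Concatenate the source latent with the instruction embedding and
        # map the pair directly to the target latent.
        return self.net(torch.cat([z_src, v_instr], dim=-1))

# In a full system, z_src would come from a pretrained image encoder and
# v_instr from a sentence encoder; a pretrained decoder would turn z_tgt
# back into the edited image.
model = LatentManipulator()
z_src = torch.randn(1, 128)    # hypothetical source-image latent
v_instr = torch.randn(1, 64)   # hypothetical instruction embedding
z_tgt = model(z_src, v_instr)  # latent vector of the edited image
```

Operating in latent space like this keeps the manipulation network small and lets the same pretrained decoder render every intermediate result, which is what makes the interactive, repeated-edit loop practical.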
Learning to Globally Edit Images with Textual Description
We show how we can globally edit images using textual instructions: given a
source image and a textual instruction for the edit, generate a new image
transformed under this instruction. To tackle this novel problem, we develop
three trainable models based on recurrent neural networks (RNNs) and
generative adversarial networks (GANs). The models (bucket, filter bank, and
end-to-end) differ in how
much expert knowledge is encoded, with the most general version being purely
end-to-end. To train these systems, we use Amazon Mechanical Turk to collect
textual descriptions for around 2000 image pairs sampled from several datasets.
Experimental results on our dataset validate our approaches. In addition,
given that the filter bank model strikes a good balance between generality and
performance, we investigate it further by replacing the RNN with a Graph RNN,
and show that this improves performance. To the best of our knowledge, this is
the first computational photography work on global image editing driven purely
by free-form textual instructions.
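To make the filter-bank idea concrete, below is a minimal sketch in PyTorch: a GRU encodes the instruction and predicts soft weights over a small bank of global edit operations, here modeled as per-channel scale-and-shift filters. The bank contents, layer sizes, and names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FilterBankEditor(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, n_filters=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_weights = nn.Linear(hidden_dim, n_filters)
        # Each "filter" is a learnable global edit: per-channel scale and shift.
        self.scales = nn.Parameter(torch.ones(n_filters, 3))
        self.shifts = nn.Parameter(torch.zeros(n_filters, 3))

    def forward(self, image, tokens):
        # image: (B, 3, H, W); tokens: (B, T) integer ids of the instruction.
        _, h = self.rnn(self.embed(tokens))            # final hidden state: (1, B, hidden)
        w = torch.softmax(self.to_weights(h[-1]), -1)  # soft weights over the bank: (B, n_filters)
        scale = (w @ self.scales).view(-1, 3, 1, 1)    # blend the bank's scales
        shift = (w @ self.shifts).view(-1, 3, 1, 1)    # blend the bank's shifts
        return image * scale + shift                   # apply the blended global edit

editor = FilterBankEditor()
img = torch.rand(2, 3, 64, 64)             # batch of source images
toks = torch.randint(0, 1000, (2, 12))     # hypothetical tokenized instructions
edited = editor(img, toks)                 # globally edited images
```

Constraining the model to blend a fixed bank of global operations is what encodes the expert knowledge the abstract mentions: the edit space stays interpretable and small, at some cost in generality compared with the purely end-to-end variant.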