Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach

Garg, Dinesh; Garg, Poorva; Goswami, Ashish; Gupta, Mohit; Khandelwal, Dinesh; Modi, Satyam; Mondal, Arnab Kumar; Shah, Kevin; Singh, Harman; Singla, Parag

Publication date: 24 October 2023

Abstract

We are interested in image manipulation via natural language text, a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend the recently proposed Neuro-Symbolic Concept Learner (NSCL), which has been quite effective for Visual Question Answering (VQA), to the task of image manipulation. Our system, referred to as NeuroSIM, can perform complex multi-hop reasoning over multi-object scenes and requires only weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program that guides its execution. The program is written in a Domain Specific Language (DSL) comprising object attributes and manipulation operations. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with, or outperforms, state-of-the-art (SOTA) baselines that use supervised data for manipulation.

Comment: EMNLP 2023 (long paper, main conference)
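The abstract describes instructions being parsed into a symbolic DSL program whose step-by-step execution performs the edit. Below is a minimal, purely symbolic sketch of that execution idea. It is not NeuroSIM's DSL or executor, and it makes several assumptions:

- The scene is a toy list of hand-labelled objects (the hypothetical `Obj` class).
- The operation names are hypothetical (`filter`, `relate`, `unique`, `change`), in the style of CLEVR-like programs.
- The instruction-to-program parsing step has already happened.

The real system is neuro-symbolic and does not operate on hand-written attribute fields like these.

```python
import copy
from dataclasses import dataclass


@dataclass
class Obj:
    # Hypothetical symbolic object: discrete attributes plus an x position.
    oid: int
    color: str
    shape: str
    x: float


def execute(program, scene):
    """Run (op, *args) steps in order; each step consumes the previous result.

    Returns an edited copy of the scene; the input scene is left unchanged.
    """
    scene = copy.deepcopy(scene)
    current = list(scene)  # start from the set of all objects
    for op, *args in program:
        if op == "filter":  # keep objects whose attribute equals the value
            attr, value = args
            current = [o for o in current if getattr(o, attr) == value]
        elif op == "unique":  # a hop must resolve to exactly one object
            if len(current) != 1:
                raise ValueError(f"expected one object, got {len(current)}")
            current = current[0]
        elif op == "relate":  # objects on one side of the current object
            (direction,) = args
            ref = current
            if direction == "left":
                current = [o for o in scene if o.x < ref.x]
            elif direction == "right":
                current = [o for o in scene if o.x > ref.x]
            else:
                raise ValueError(f"unknown direction: {direction}")
        elif op == "change":  # manipulation step: overwrite one attribute
            attr, value = args
            setattr(current, attr, value)
        else:
            raise ValueError(f"unknown op: {op}")
    return scene


scene = [
    Obj(0, "red", "sphere", 5.0),
    Obj(1, "green", "cube", 2.0),
    Obj(2, "gray", "cube", 8.0),
    Obj(3, "yellow", "cylinder", 1.0),
]
# Hand-written program for: "Make the cube to the left of the red sphere blue."
program = [
    ("filter", "color", "red"),
    ("filter", "shape", "sphere"),
    ("unique",),
    ("relate", "left"),
    ("filter", "shape", "cube"),
    ("unique",),
    ("change", "color", "blue"),
]
new_scene = execute(program, scene)
print([o.color for o in new_scene])  # ['red', 'blue', 'gray', 'yellow']
```

Chaining `relate` after `unique` is what makes this instruction multi-hop. The target cube can only be identified after the red sphere has been resolved. Raising an error when `unique` does not find exactly one object is a choice made in this sketch to surface ambiguous references.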

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.14410

Last updated on 26/05/2023