In this paper, we propose a novel task, Manipulation Question Answering
(MQA), in which a robot must answer a question by actively exploring its
environment through manipulation. A framework consisting of
a QA model and a manipulation model is proposed to solve this task. The QA
model adopts a Visual Question Answering (VQA) approach, and the manipulation
model uses a Deep Q-Network (DQN) to generate manipulation actions. By
manipulating objects, the robot can continuously explore the
bin until the answer to the question is found. In addition, we establish a
novel simulation dataset that contains a variety of object models, complex
scenarios, and corresponding question-answer pairs. Extensive experiments have
been conducted to validate the effectiveness of the proposed framework.