Dynamic fusion with intra- and inter-modality attention flow for visual question answering

Authors: GAO Peng, HOI Steven C. H., JIANG Zhengkai, LI Hongsheng, LU Pan, WANG Xiaogang, YOU Haoxuan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2019

Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multimodal feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.

Comment: CVPR 2019 ORAL
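The abstract names two kinds of flow: inter-modality flow, where each modality attends to the other, and intra-modality flow whose self-attention is modulated by the other modality. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes both modalities are already projected to a shared dimension d, omits learned query/key/value projections and multi-head splitting, and assumes the conditioning is a sigmoid gate computed from the average-pooled features of the other modality and applied as (1 + gate) to the queries and keys. The function names and the gate parameters `w_gate` and `b_gate` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def inter_modality_flow(vis, lang):
    # Each modality attends to the other; residual connections keep shapes.
    vis_upd = vis + attention(vis, lang, lang)
    lang_upd = lang + attention(lang, vis, vis)
    return vis_upd, lang_upd


def dynamic_intra_flow(target, other, w_gate, b_gate):
    # Hypothetical gate: sigmoid of a linear map of the other modality's mean feature.
    pooled = other.mean(axis=0)                                # (d,)
    gate = 1.0 / (1.0 + np.exp(-(pooled @ w_gate + b_gate)))   # (d,), values in (0, 1)
    # Modulate queries and keys of the target's self-attention by (1 + gate).
    q = target * (1.0 + gate)
    k = target * (1.0 + gate)
    return target + attention(q, k, target)


# Usage with random features: 5 image regions and 3 words, d = 8 (illustrative sizes).
rng = np.random.default_rng(0)
d = 8
vis = rng.standard_normal((5, d))
lang = rng.standard_normal((3, d))
w_gate = 0.1 * rng.standard_normal((d, d))
b_gate = np.zeros(d)

vis2, lang2 = inter_modality_flow(vis, lang)
vis3 = dynamic_intra_flow(vis2, lang2, w_gate, b_gate)
lang3 = dynamic_intra_flow(lang2, vis2, w_gate, b_gate)
```

Applying the (1 + gate) factor to queries and keys, rather than to values, changes which elements of the target modality attend to each other while leaving the aggregated features themselves unscaled; a full model would stack several such blocks with learned projections.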

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University