9,848 research outputs found
Dynamic fusion with intra-and inter-modality attention flow for visual question answering
Learning effective fusion of multi-modality features is at the heart of
visual question answering. We propose a novel method of dynamically fusing
multi-modal features with intra- and inter-modality information flow, which
alternatively pass dynamic information between and across the visual and
language modalities. It can robustly capture the high-level interactions
between language and vision domains, thus significantly improves the
performance of visual question answering. We also show that the proposed
dynamic intra-modality attention flow conditioned on the other modality can
dynamically modulate the intra-modality attention of the target modality, which
is vital for multimodality feature fusion. Experimental evaluations on the VQA
2.0 dataset show that the proposed method achieves state-of-the-art VQA
performance. Extensive ablation studies are carried out for the comprehensive
analysis of the proposed method.Comment: CVPR 2019 ORA
Visual Cortex Inspired CNN Model for Feature Construction in Text Analysis
Recently, biologically inspired models are gradually proposed to solve the problem in text analysis. Convolutional neural networks (CNN) are hierarchical artificial neural networks, which include a various of multilayer perceptrons. According to biological research, CNN can be improved by bringing in the attention modulation and memory processing of primate visual cortex. In this paper, we employ the above properties of primate visual cortex to improve CNN and propose a biological-mechanism-driven-feature-construction based answer recommendation method (BMFC-ARM), which is used to recommend the best answer for the corresponding given questions in community question answering. BMFC-ARM is an improved CNN with four channels respectively representing questions, answers, asker information and answerer information, and mainly contains two stages: biological mechanism driven feature construction (BMFC) and answer ranking. BMFC imitates the attention modulation property by introducing the asker information and answerer information of given questions and the similarity between them, and imitates the memory processing property through bringing in the user reputation information for answerers. Then the feature vector for answer ranking is constructed by fusing the asker-answerer similarities, answerer's reputation and the corresponding vectors of question, answer, asker and answerer. Finally, the Softmax is used at the stage of answer ranking to get best answers by the feature vector. The experimental results of answer recommendation on the Stackexchange dataset show that BMFC-ARM exhibits better performance
- …