
    Convolutional Read-Write Operations on External Memory Networks for Movie Question Answering

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2019. 2. ๊น€๊ฑดํฌ.We propose a novel memory network model named Read-Write Memory Network (RWMN) to perform question and answering tasks for large-scale, multimodal movie story understanding. The key focus of our RWMN model is to propose the read network and the write network that consist of multiple convolutional layers, which enable memory read and write operations to have high capacity and flexibility. While existing memory augmented network models treat each memory slot as an independent block, our use of multi-layered CNNs enables the model to read and write sequential memory cells as chunks, which is more reasonable to represent a sequential story because adjacent memory blocks often have strong correlations. For evaluation, we apply our model on the MovieQA benchmark, and achieve the best accuracies on several tasks, especially significantly on the visual QA task. Our model shows a potential to better understand not only the facts in the story, but also more abstract information, such as relationships between characters and the reasons for their actions. Code is available on our project page: http://github.com/seilna/RWMN๋”ฅ๋Ÿฌ๋‹์ด ์ปดํ“จํ„ฐ ๋น„์ „ ๋ฐ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ์™€ ๊ด€๋ จ๋œ ์—ฌ๋Ÿฌ ๋ฌธ์ œ๋“ค์—์„œ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ž„์— ๋”ฐ๋ผ, ๊ทธ์™€ ๋”๋ถˆ์–ด ์‹œ๊ฐ ๋ฐ ์–ธ์–ด ์ •๋ณด๋ฅผ ํ†ตํ•ฉ์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ์—ฐ๊ตฌ๋“ค๋„ ๋น ๋ฅด๊ฒŒ ๋ฐœ์ „ํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ์ด์— ์ด์•ผ๊ธฐ ์š”์†Œ๊ฐ€ ํฌํ•จ๋œ ๋น„๋””์˜ค์˜ ๋‚ด์šฉ์„ ์ดํ•ดํ•˜๊ณ , ์ด์™€ ๊ด€๋ จ๋œ ์ž์—ฐ์–ด ์งˆ๋ฌธ์„ ์ดํ•ดํ•˜์—ฌ ์•Œ๋งž์€ ์ •๋‹ต์„ ๋„์ถœํ•˜๋Š” Movie Question Answering (MovieQA) ๋ฌธ์ œ๊ฐ€ ์ œ์‹œ๋˜์—ˆ์œผ๋ฉฐ, ํ˜„์žฌ๊นŒ์ง€๋„ ์ด๋ฅผ ํ’€๊ธฐ ์œ„ํ•œ ๋งŽ์€ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. 
๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” Movie Question Answering ๋ฌธ์ œ๋ฅผ ํ’€๊ธฐ ์œ„ํ•ด์„œ, External Memory ๊ตฌ์กฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์ƒˆ๋กœ์šด ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ์ œ์•ˆํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ Read-Write Memory Network (RWMN) ๋ผ๊ณ  ์ง€์นญํ•œ๋‹ค. ๊ธฐ์กด์˜ External Memory ๋ชจ๋ธ๋“ค์ด ๊ฐ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค์„ ๋…๋ฆฝ์ ์œผ๋กœ ์ทจ๊ธ‰ํ•œ ๋ฐ˜๋ฉด์—, RWMN์€ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค ์‚ฌ์ด์— ๋‚ด์žฌํ•˜๊ณ  ์žˆ๋Š” ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„ (temporal correlation) ๋ฅผ ํ™œ์šฉํ•˜๋Š” Convolutional Read/Write operation์„ ์ด์šฉํ•˜๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ด๋‹ค. Movie Question Answering ๋ฌธ์ œ์—์„œ ๊ฐ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค์ด ์ฃผ์–ด์ง„ ์˜ํ™”์˜ ๋‚ด์šฉ์„ ์ˆœ์ฐจ์ ์œผ๋กœ ์ธ์ฝ”๋”ฉํ•˜๊ณ  ์žˆ๋‹ค๋Š” ์ ์„ ๊ณ ๋ คํ–ˆ์„ ๋•Œ, ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ๊ฒƒ์€ MovieQA ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š”๋ฐ ๋งค์šฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๋ฉฐ, ์ด๋ฅผ ํ™œ์šฉํ•œ RWMN์€ MovieQA ๊ณต์‹ ๋ฒค์น˜๋งˆํฌ์˜ 6๊ฐœ subtask์ค‘ 4๊ฐœ task์—์„œ ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๋˜ํ•œ, ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” MovieQA์— ํฌํ•จ๋œ ์งˆ๋ฌธ๋“ค ์ค‘์—์„œ ์ฃผ๋กœ ๊ณ ์ฐจ์›์  ์ดํ•ด๊ฐ€ ํ•„์š”ํ•œ ์งˆ๋ฌธ๋“ค์— ๋Œ€ํ•ด ์ œ์‹œํ•œ RWMN์ด ๊ธฐ์กด ๋ชจ๋ธ๋“ค๋ณด๋‹ค ๋†’์€ ์„ฑ๋Šฅ์„ ๋‚ด๋Š” ๊ฒƒ์„ ๋ณด์ž„์œผ๋กœ์จ, ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ์ž˜ ๋ชจ๋ธ๋งํ•˜์—ฌ Question Answering ๋ฌธ์ œ๋ฅผ ํ’€๊ณ  ์žˆ์Œ์„ ์‹คํ—˜์ ์œผ๋กœ ๋ณด์˜€๋‹ค. RWMN์˜ ๊ตฌํ˜„ ์ฝ”๋“œ์— ๊ด€ํ•œ ์ •๋ณด๋Š” ํ”„๋กœ์ ํŠธ ํŽ˜์ด์ง€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. (http://github.com/seilna/RWMN)Chapter 1 Introduction 1 Chapter 2 Related Work 5 2.1 Neural Memory Networks . . . . . . . . . . . . . . . . . . 5 2.2 Models for MovieQA . . . . . . . . . . . . . . . . . . . . . 6 Chapter 3 Read-Write Memory Network (RWMN) 8 3.1 Movie Embedding . . . . . . . . . . . . . . . . . . . . . . 10 3.2 The Write Network . . . . . . . . . . . . . . . . . . . . . . 10 3.3 The Read Network . . . . . . . . . . . . . . . . . . . . . . 11 3.3.1 Question embedding . . . . . . . . . . . . . . . . . 12 3.3.2 Query-dependant memory embedding . . . . . . . 12 3.3.3 Convolutional memory read . . . . . . . . . . . . . 
12 3.4 Answer Selection . . . . . . . . . . . . . . . . . . . . . . . 13 3.5 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Chapter 4 Experiments 15 4.1 MovieQA Tasks and Experimental Setting . . . . . . . . . 15 4.2 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.3 Quantitative Results . . . . . . . . . . . . . . . . . . . . . 18 4.3.1 Results of VQA task . . . . . . . . . . . . . . . . . 19 4.3.2 Results of text-only tasks . . . . . . . . . . . . . . 19 4.4 Ablation Results . . . . . . . . . . . . . . . . . . . . . . . 20 4.5 Qualitative Results . . . . . . . . . . . . . . . . . . . . . . 22 Chapter 5 Conclusion 29 ์š”์•ฝ 34 Acknowledgements 36Maste
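    The convolutional write operation described in the abstract can be sketched minimally: instead of copying each embedding into an independent memory slot, a filter slides over the slot (time) axis so each memory cell fuses a chunk of adjacent embeddings. This is a toy single-layer illustration with hypothetical shapes, not the authors' exact multi-layer architecture:

    ```python
    import numpy as np

    def conv_write(embeddings, kernel, stride=1):
        """Write-network sketch: slide a convolution along the slot (time)
        axis so each memory cell summarizes a chunk of adjacent segment
        embeddings, exploiting their temporal correlation.

        embeddings: (n_slots, d)    sequential movie-segment embeddings
        kernel:     (k, d, d_out)   filter spanning k adjacent slots
        """
        n, d = embeddings.shape
        k, _, d_out = kernel.shape
        memory = []
        for i in range(0, n - k + 1, stride):
            chunk = embeddings[i:i + k]                    # (k, d) adjacent slots
            cell = np.einsum('kd,kdo->o', chunk, kernel)   # fuse chunk into one cell
            memory.append(np.tanh(cell))
        return np.stack(memory)                            # (n_cells, d_out)

    # Toy example: 10 segment embeddings compressed by a kernel over 3 slots.
    rng = np.random.default_rng(0)
    E = rng.standard_normal((10, 4))
    W = rng.standard_normal((3, 4, 4)) * 0.1
    M = conv_write(E, W, stride=2)
    print(M.shape)  # (4, 4): 10 slots written as 4 chunked memory cells
    ```

    A stride greater than 1 also compresses the memory, which matters when a full movie yields thousands of segment embeddings.
    
    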

    Memory Networks

    We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA), where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.
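    The read step this abstract describes — scoring each stored memory against the question and retrieving a weighted combination for prediction — can be sketched as soft attention over memory slots. A minimal illustration with made-up dimensions, not the paper's exact formulation:

    ```python
    import numpy as np

    def softmax(x):
        """Numerically stable softmax over a 1-D score vector."""
        e = np.exp(x - x.max())
        return e / e.sum()

    def memory_read(memory, question):
        """Read sketch: score each memory slot against the question
        embedding, then return the attention-weighted sum of slots
        as the retrieved evidence for answer prediction."""
        scores = memory @ question    # (n_slots,) slot-question similarity
        attn = softmax(scores)        # soft addressing over the memory
        return attn @ memory          # (d,) weighted combination of slots

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 8))   # 6 stored sentence embeddings
    q = rng.standard_normal(8)        # question embedding
    o = memory_read(M, q)
    print(o.shape)  # (8,)
    ```

    Chaining supporting sentences, as in the toy task, amounts to repeating this read with the retrieved vector folded back into the query.
    
    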