
    Convolutional Read-Write Operations on External Memory Networks for Movie Question Answering

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2019. 2. ๊น€๊ฑดํฌ.We propose a novel memory network model named Read-Write Memory Network (RWMN) to perform question and answering tasks for large-scale, multimodal movie story understanding. The key focus of our RWMN model is to propose the read network and the write network that consist of multiple convolutional layers, which enable memory read and write operations to have high capacity and flexibility. While existing memory augmented network models treat each memory slot as an independent block, our use of multi-layered CNNs enables the model to read and write sequential memory cells as chunks, which is more reasonable to represent a sequential story because adjacent memory blocks often have strong correlations. For evaluation, we apply our model on the MovieQA benchmark, and achieve the best accuracies on several tasks, especially significantly on the visual QA task. Our model shows a potential to better understand not only the facts in the story, but also more abstract information, such as relationships between characters and the reasons for their actions. Code is available on our project page: http://github.com/seilna/RWMN๋”ฅ๋Ÿฌ๋‹์ด ์ปดํ“จํ„ฐ ๋น„์ „ ๋ฐ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ์™€ ๊ด€๋ จ๋œ ์—ฌ๋Ÿฌ ๋ฌธ์ œ๋“ค์—์„œ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ž„์— ๋”ฐ๋ผ, ๊ทธ์™€ ๋”๋ถˆ์–ด ์‹œ๊ฐ ๋ฐ ์–ธ์–ด ์ •๋ณด๋ฅผ ํ†ตํ•ฉ์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ์—ฐ๊ตฌ๋“ค๋„ ๋น ๋ฅด๊ฒŒ ๋ฐœ์ „ํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ์ด์— ์ด์•ผ๊ธฐ ์š”์†Œ๊ฐ€ ํฌํ•จ๋œ ๋น„๋””์˜ค์˜ ๋‚ด์šฉ์„ ์ดํ•ดํ•˜๊ณ , ์ด์™€ ๊ด€๋ จ๋œ ์ž์—ฐ์–ด ์งˆ๋ฌธ์„ ์ดํ•ดํ•˜์—ฌ ์•Œ๋งž์€ ์ •๋‹ต์„ ๋„์ถœํ•˜๋Š” Movie Question Answering (MovieQA) ๋ฌธ์ œ๊ฐ€ ์ œ์‹œ๋˜์—ˆ์œผ๋ฉฐ, ํ˜„์žฌ๊นŒ์ง€๋„ ์ด๋ฅผ ํ’€๊ธฐ ์œ„ํ•œ ๋งŽ์€ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. 
๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” Movie Question Answering ๋ฌธ์ œ๋ฅผ ํ’€๊ธฐ ์œ„ํ•ด์„œ, External Memory ๊ตฌ์กฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์ƒˆ๋กœ์šด ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ์ œ์•ˆํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ Read-Write Memory Network (RWMN) ๋ผ๊ณ  ์ง€์นญํ•œ๋‹ค. ๊ธฐ์กด์˜ External Memory ๋ชจ๋ธ๋“ค์ด ๊ฐ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค์„ ๋…๋ฆฝ์ ์œผ๋กœ ์ทจ๊ธ‰ํ•œ ๋ฐ˜๋ฉด์—, RWMN์€ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค ์‚ฌ์ด์— ๋‚ด์žฌํ•˜๊ณ  ์žˆ๋Š” ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„ (temporal correlation) ๋ฅผ ํ™œ์šฉํ•˜๋Š” Convolutional Read/Write operation์„ ์ด์šฉํ•˜๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ด๋‹ค. Movie Question Answering ๋ฌธ์ œ์—์„œ ๊ฐ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก๋“ค์ด ์ฃผ์–ด์ง„ ์˜ํ™”์˜ ๋‚ด์šฉ์„ ์ˆœ์ฐจ์ ์œผ๋กœ ์ธ์ฝ”๋”ฉํ•˜๊ณ  ์žˆ๋‹ค๋Š” ์ ์„ ๊ณ ๋ คํ–ˆ์„ ๋•Œ, ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ๊ฒƒ์€ MovieQA ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š”๋ฐ ๋งค์šฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๋ฉฐ, ์ด๋ฅผ ํ™œ์šฉํ•œ RWMN์€ MovieQA ๊ณต์‹ ๋ฒค์น˜๋งˆํฌ์˜ 6๊ฐœ subtask์ค‘ 4๊ฐœ task์—์„œ ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๋˜ํ•œ, ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” MovieQA์— ํฌํ•จ๋œ ์งˆ๋ฌธ๋“ค ์ค‘์—์„œ ์ฃผ๋กœ ๊ณ ์ฐจ์›์  ์ดํ•ด๊ฐ€ ํ•„์š”ํ•œ ์งˆ๋ฌธ๋“ค์— ๋Œ€ํ•ด ์ œ์‹œํ•œ RWMN์ด ๊ธฐ์กด ๋ชจ๋ธ๋“ค๋ณด๋‹ค ๋†’์€ ์„ฑ๋Šฅ์„ ๋‚ด๋Š” ๊ฒƒ์„ ๋ณด์ž„์œผ๋กœ์จ, ์‹œ๊ฐ„์ ์ธ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ์ž˜ ๋ชจ๋ธ๋งํ•˜์—ฌ Question Answering ๋ฌธ์ œ๋ฅผ ํ’€๊ณ  ์žˆ์Œ์„ ์‹คํ—˜์ ์œผ๋กœ ๋ณด์˜€๋‹ค. RWMN์˜ ๊ตฌํ˜„ ์ฝ”๋“œ์— ๊ด€ํ•œ ์ •๋ณด๋Š” ํ”„๋กœ์ ํŠธ ํŽ˜์ด์ง€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. (http://github.com/seilna/RWMN)Chapter 1 Introduction 1 Chapter 2 Related Work 5 2.1 Neural Memory Networks . . . . . . . . . . . . . . . . . . 5 2.2 Models for MovieQA . . . . . . . . . . . . . . . . . . . . . 6 Chapter 3 Read-Write Memory Network (RWMN) 8 3.1 Movie Embedding . . . . . . . . . . . . . . . . . . . . . . 10 3.2 The Write Network . . . . . . . . . . . . . . . . . . . . . . 10 3.3 The Read Network . . . . . . . . . . . . . . . . . . . . . . 11 3.3.1 Question embedding . . . . . . . . . . . . . . . . . 12 3.3.2 Query-dependant memory embedding . . . . . . . 12 3.3.3 Convolutional memory read . . . . . . . . . . . . . 
12 3.4 Answer Selection . . . . . . . . . . . . . . . . . . . . . . . 13 3.5 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Chapter 4 Experiments 15 4.1 MovieQA Tasks and Experimental Setting . . . . . . . . . 15 4.2 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.3 Quantitative Results . . . . . . . . . . . . . . . . . . . . . 18 4.3.1 Results of VQA task . . . . . . . . . . . . . . . . . 19 4.3.2 Results of text-only tasks . . . . . . . . . . . . . . 19 4.4 Ablation Results . . . . . . . . . . . . . . . . . . . . . . . 20 4.5 Qualitative Results . . . . . . . . . . . . . . . . . . . . . . 22 Chapter 5 Conclusion 29 ์š”์•ฝ 34 Acknowledgements 36Maste
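    The convolutional write operation described in the abstract can be sketched minimally: instead of copying each embedding into an independent memory slot, a filter slides over the slot (time) axis so each memory cell fuses a chunk of adjacent embeddings. This is a toy single-layer illustration with hypothetical shapes, not the authors' exact multi-layer architecture:

    ```python
    import numpy as np

    def conv_write(embeddings, kernel, stride=1):
        """Write-network sketch: slide a convolution along the slot (time)
        axis so each memory cell summarizes a chunk of adjacent segment
        embeddings, exploiting their temporal correlation.

        embeddings: (n_slots, d)    sequential movie-segment embeddings
        kernel:     (k, d, d_out)   filter spanning k adjacent slots
        """
        n, d = embeddings.shape
        k, _, d_out = kernel.shape
        memory = []
        for i in range(0, n - k + 1, stride):
            chunk = embeddings[i:i + k]                    # (k, d) adjacent slots
            cell = np.einsum('kd,kdo->o', chunk, kernel)   # fuse chunk into one cell
            memory.append(np.tanh(cell))
        return np.stack(memory)                            # (n_cells, d_out)

    # Toy example: 10 segment embeddings compressed by a kernel over 3 slots.
    rng = np.random.default_rng(0)
    E = rng.standard_normal((10, 4))
    W = rng.standard_normal((3, 4, 4)) * 0.1
    M = conv_write(E, W, stride=2)
    print(M.shape)  # (4, 4): 10 slots written as 4 chunked memory cells
    ```

    A stride greater than 1 also compresses the memory, which matters when a full movie yields thousands of segment embeddings.
    
    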

    Memory Networks

    We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA), where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.
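    The read step this abstract describes — scoring each stored memory against the question and retrieving a weighted combination for prediction — can be sketched as soft attention over memory slots. A minimal illustration with made-up dimensions, not the paper's exact formulation:

    ```python
    import numpy as np

    def softmax(x):
        """Numerically stable softmax over a 1-D score vector."""
        e = np.exp(x - x.max())
        return e / e.sum()

    def memory_read(memory, question):
        """Read sketch: score each memory slot against the question
        embedding, then return the attention-weighted sum of slots
        as the retrieved evidence for answer prediction."""
        scores = memory @ question    # (n_slots,) slot-question similarity
        attn = softmax(scores)        # soft addressing over the memory
        return attn @ memory          # (d,) weighted combination of slots

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 8))   # 6 stored sentence embeddings
    q = rng.standard_normal(8)        # question embedding
    o = memory_read(M, q)
    print(o.shape)  # (8,)
    ```

    Chaining supporting sentences, as in the toy task, amounts to repeating this read with the retrieved vector folded back into the query.
    
    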