Multimodal Deep Learning for Visually Grounded Reasoning
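Thesis (Ph.D.) -- Seoul National University Graduate School: College of Humanities, Interdisciplinary Program in Cognitive Science, 2018. 8. Byoung-Tak Zhang.
Advances in computer vision and natural language processing have accelerated research on artificial general intelligence. Because vision and natural language are the most interactive modalities that humans use, understanding and reasoning grounded in both vision and language become a core challenge for artificial general intelligence. Visual question answering (VQA) is an instance of a visual Turing test and builds on the foundational work on the Turing test [Turing, 1950]. The VQA dataset [Agrawal et al., 2017] collected question-answer pairs for supervised learning using a large-scale image dataset. For questions such as "Who is wearing glasses?", "Is the umbrella upside down?", and "How many children are in the bed?", a machine is trained on the collected answers and must then produce an answer from the image and the question alone.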
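This work generalizes the visual question answering task as a multimodal learning problem. It reviews progress in multimodal learning from the perspective of deep learning, which learns hierarchical representations using various forms of multilayer neural networks, and of multimodal deep learning in particular. Multimodal deep learning is introduced under three categories: multimodal fusion, cross-modality learning, and shared representation learning. Building on the previous studies of Kim et al. [2016b, 2017a, 2018], the thesis then discusses three main works: multimodal residual learning, multimodal low-rank bilinear pooling, and bilinear attention networks.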
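Multimodal residual learning finds a joint vision-language representation based on residual learning. Here, part of the network is forced to learn the residual error of the objective function represented by the preceding part of the network. Multimodal low-rank bilinear pooling, in contrast, makes it possible to explain the mathematical meaning of the element-wise product as a joint function when each modality has been appropriately linearly projected. Bilinear attention networks unify these two works. Building on the interpretation of low-rank bilinear pooling, they use matrix chain multiplication to generalize the single-attention mechanism to bilinear attention, keeping the computational cost at a level similar to that of single-attention networks. Furthermore, residual learning of attention is proposed so that eight bilinear attention maps can be used during inference, which prevents the overfitting that arises in multi-layer attention networks.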
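The reading of the element-wise product as a joint function can be illustrated with a minimal NumPy sketch. It assumes the simplest form of low-rank bilinear pooling, with no nonlinearity and no bias terms; the names `U`, `V`, `P`, the dimensions, and the random inputs are illustrative assumptions, not taken from the thesis. The sketch checks numerically that projecting each modality, multiplying element-wise, and projecting again equals a full bilinear form whose weight matrix is constrained to low rank.

```python
import numpy as np


def low_rank_bilinear_pool(x, y, U, V, P):
    """Project each modality, fuse by element-wise product, then project out.

    x: (n,) first-modality vector, y: (m,) second-modality vector.
    U: (n, d), V: (m, d) are the linear projections; P: (d, c) maps to outputs.
    """
    joint = (U.T @ x) * (V.T @ y)  # element-wise (Hadamard) product, shape (d,)
    return P.T @ joint             # shape (c,)


# Illustrative sizes and random data (assumptions for this sketch only).
rng = np.random.default_rng(0)
n, m, d, c = 6, 5, 4, 3
x = rng.standard_normal(n)
y = rng.standard_normal(m)
U = rng.standard_normal((n, d))
V = rng.standard_normal((m, d))
P = rng.standard_normal((d, c))

pooled = low_rank_bilinear_pool(x, y, U, V, P)

# Equivalent full bilinear form: output i equals x^T W_i y,
# where W_i = U diag(P[:, i]) V^T has rank at most d.
full = np.array([x @ (U @ np.diag(P[:, i]) @ V.T) @ y for i in range(c)])

assert np.allclose(pooled, full)
```

The low-rank version never materializes the `n x m` matrices `W_i`, which is where the parameter savings come from in this simplified setting.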
As a result, the Multimodal Residual Network (MRN) placed fourth in the VQA Challenge 2016, and the Multimodal Low-rank Bilinear Attention Network (MLB), proposed with fewer parameters, set a new state of the art as of its publication in November 2016. The Bilinear Attention Network (BAN) was a runner-up (joint second place) in the VQA Challenge 2018 but achieved the best performance among single models. These results were presented in an invited talk at a workshop at CVPR 2018 (Salt Lake City, USA) on June 18, 2018.
Because vision and natural language processing are still advancing, the proposed multimodal deep learning methods have the potential to improve further as computer vision and natural language processing techniques develop.
Abstract
Chapter 1 Introduction
Chapter 2 Multimodal Deep Learning
2.1 Introduction
2.2 Linear Model
2.3 Multimodal Deep Learning
2.3.1 Multimodal Fusion
2.3.2 Cross Modality Learning
2.3.3 Shared Representation Learning
2.4 Cognitive Models
2.5 Conclusions
Chapter 3 Multimodal Residual Learning
3.1 Introduction
3.2 Related Works
3.2.1 Deep Residual Learning
3.2.2 Stacked Attention Networks
3.3 Multimodal Residual Networks
3.3.1 Background
3.3.2 Multimodal Residual Networks
3.4 Experiments
3.4.1 Visual QA Dataset
3.4.2 Implementation
3.4.3 Exploring Alternative Models
3.5 Results
3.5.1 Quantitative Analysis
3.5.2 Qualitative Analysis
3.6 Conclusions
Chapter 4 Multimodal Low-rank Bilinear Pooling
4.1 Introduction
4.2 Low-rank Bilinear Model
4.3 Low-rank Bilinear Pooling
4.3.1 Full Model
4.3.2 Nonlinear Activation
4.3.3 Shortcut Connection
4.4 Multimodal Low-rank Bilinear Attention Networks
4.4.1 Low-rank Bilinear Pooling in Attention Mechanism
4.4.2 Multimodal Low-rank Bilinear Attention Networks
4.4.3 Model Schema
4.5 Experiments
4.5.1 Preprocessing
4.5.2 Vision Embedding
4.5.3 Hyperparameters
4.6 Results
4.6.1 Six Experiment Results
4.6.2 Comparison with State-of-the-Art
4.6.3 Ensemble of Seven Models
4.7 Related Works
4.7.1 Multimodal Residual Networks
4.7.2 Higher-Order Boltzmann Machines
4.7.3 Multiplicative Integration with Recurrent Neural Networks
4.7.4 Compact Bilinear Pooling
4.8 Discussions
4.8.1 Understanding of Multimodal Compact Bilinear Pooling
4.8.2 Replacement of Low-rank Bilinear Pooling
4.9 Conclusions
Chapter 5 Bilinear Attention Networks
5.1 Introduction
5.2 Low-rank Bilinear Pooling
5.3 Bilinear Attention Networks
5.4 Related Works
5.5 Experiments
5.5.1 Datasets
5.5.2 Preprocessing
5.5.3 Nonlinearity
5.6 Variants of BAN
5.6.1 Enhancing Glove Word Embedding
5.6.2 Integrating Counting Module
5.6.3 Integrating Multimodal Factorized Bilinear (MFB) Pooling
5.6.4 Classifier
5.6.5 Hyperparameters and Regularization
5.7 VQA Results and Discussions
5.7.1 Quantitative Results
5.7.2 Residual Learning of Attention
5.7.3 Qualitative Analysis
5.8 Flickr30k Entities Results and Discussions
5.9 Conclusions
Chapter 6 Conclusions
Bibliography
Abstract (in Korean)
Docto