A Study on Mastering Complex Reasoning Abilities with Transformers: Applications to Visual, Conversational, and Mathematical Reasoning
Doctoral dissertation -- Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, February 2021. Advisor: Sungzoon Cho.

As deep learning models have advanced, research has shifted from simple classification tasks toward sophisticated tasks that require complex reasoning. These tasks demand multiple reasoning steps that resemble human intelligence. Architecture-wise, recurrent neural networks and convolutional neural networks have long been the mainstream models for deep learning, but both suffer from shortcomings inherent in their architectures. The attention-based Transformer is now replacing them owing to its superior architecture and performance. In particular, the Transformer encoder has been studied extensively in natural language processing. However, for the Transformer to be effective on data with distinct structures and characteristics, appropriate adjustments to its architecture are required. In this dissertation, we propose novel architectures based on the Transformer encoder for supervised learning tasks with different data types and characteristics: visual IQ tests, dialogue state tracking, and mathematical question answering. For visual IQ tests, the input is visual and hierarchically structured. To handle this, we propose a hierarchical Transformer encoder over structured representations, a novel neural network architecture that improves both perception and reasoning. The hierarchical arrangement of the Transformer encoders and the architecture of each individual encoder both fit the characteristics of visual IQ test data. Dialogue state tracking requires value prediction for multiple domain-slot pairs. To address this, we propose a dialogue state tracking model that uses a pre-trained language model, that is, a pre-trained Transformer encoder, for domain-slot relationship modeling. We introduce a special token for each domain-slot pair, which enables effective dependency modeling among domain-slot pairs through the pre-trained language encoder. Finally, for mathematical question answering, we propose a method to pre-train a Transformer encoder on the mathematical question answering dataset itself for improved performance. Our pre-training method, Question-Answer Masked Language Modeling, utilizes both the question and the answer text, which suits the mathematical question answering data. Through experiments, we show that each proposed method is effective for its corresponding task and data type.
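The hierarchical arrangement sketched in the abstract can be pictured as two stacked encoders: a lower Transformer encoder summarizes the objects detected within each panel, and an upper encoder reasons across the per-panel summaries. The dimensions, the mean pooling step, and the PyTorch modules below are illustrative assumptions, not the dissertation's actual configuration.

```python
# Minimal sketch of a two-level hierarchy of Transformer encoders (assumed setup).
import torch
import torch.nn as nn

d_model, n_panels, n_objects = 128, 9, 6   # e.g. a 3x3 RPM-style matrix (illustrative)

lower_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
lower = nn.TransformerEncoder(lower_layer, num_layers=2)   # within-panel encoder
upper_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
upper = nn.TransformerEncoder(upper_layer, num_layers=2)   # across-panel encoder

# Structured representation from a perception module: one vector per detected object.
objects = torch.randn(n_panels, n_objects, d_model)

panel_repr = lower(objects).mean(dim=1)        # (n_panels, d_model), one summary per panel
matrix_repr = upper(panel_repr.unsqueeze(0))   # (1, n_panels, d_model), cross-panel reasoning
print(matrix_repr.shape)
```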
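For the dialogue state tracking model, the idea of a dedicated special token per domain-slot pair can be sketched as follows. The token names, the example domain-slot list, and the use of the Hugging Face transformers BERT classes are assumptions made for illustration only, not the dissertation's exact input format.

```python
# Minimal sketch: add one special token per domain-slot pair so a pre-trained
# encoder can model slot-context and slot-slot dependencies in one sequence.
import torch
from transformers import BertTokenizer, BertModel

domain_slots = ["hotel-area", "hotel-pricerange", "train-departure"]  # example pairs
slot_tokens = [f"[{ds.upper()}]" for ds in domain_slots]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": slot_tokens})

model = BertModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

dialogue = "i need a cheap hotel in the centre ."
# Dialogue context and all domain-slot tokens share one sequence, so self-attention
# can relate every slot to the context and to the other slots.
text = dialogue + " " + " ".join(slot_tokens)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

# The hidden state at each special-token position would feed a slot-gate /
# slot-value classifier for that domain-slot pair.
slot_ids = tokenizer.convert_tokens_to_ids(slot_tokens)
positions = [(inputs["input_ids"][0] == i).nonzero(as_tuple=True)[0] for i in slot_ids]
slot_vectors = torch.stack([hidden[0, p[0]] for p in positions])
print(slot_vectors.shape)  # (num_domain_slots, hidden_size)
```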
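Question-Answer Masked Language Modeling can likewise be sketched as standard masked language modeling applied to a concatenated question-answer pair, so that masked positions may fall in either segment. The 15% masking rate, the example question, and the BERT masked-LM model are common defaults assumed here, not the dissertation's exact settings.

```python
# Minimal sketch of masked language modeling over a question-answer pair (assumed setup).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

question = "What is 4 + 17 * 2 ?"
answer = "38"
# Question and answer are encoded as one pair so masks can fall in either segment,
# forcing the encoder to model the answer text jointly with the question text.
enc = tokenizer(question, answer, return_tensors="pt")

input_ids = enc["input_ids"].clone()
labels = input_ids.clone()

special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool()
prob = torch.full(input_ids.shape, 0.15)
prob[0, special] = 0.0                      # never mask [CLS] / [SEP]
mask = torch.bernoulli(prob).bool()

labels[~mask] = -100                        # compute loss only on masked positions
input_ids[mask] = tokenizer.mask_token_id   # replace masked tokens with [MASK]

loss = model(input_ids=input_ids, attention_mask=enc["attention_mask"],
             token_type_ids=enc["token_type_ids"], labels=labels).loss
print(float(loss))
```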
Abstract
Contents
List of Tables
List of Figures
Chapter 1  Introduction
Chapter 2  Literature Review
  2.1  Related Works on Transformer
  2.2  Related Works on Visual IQ Tests
    2.2.1  RPM-related studies
    2.2.2  Object Detection-related studies
  2.3  Related Works on Dialogue State Tracking
  2.4  Related Works on Mathematical Question Answering
    2.4.1  Pre-training of Neural Networks
    2.4.2  Language Model Pre-training
    2.4.3  Mathematical Reasoning with Neural Networks
Chapter 3  Hierarchical end-to-end architecture of Transformer encoders for solving visual IQ tests
  3.1  Background
    3.1.1  Perception
    3.1.2  Reasoning
  3.2  Proposed Model
    3.2.1  Perception Module: Object Detection Model
    3.2.2  Reasoning Module: Hierarchical Transformer Encoder
    3.2.3  Contrasting Module and Loss function
  3.3  Experimental results
    3.3.1  Dataset
    3.3.2  Experimental Setup
    3.3.3  Results for Perception Module
    3.3.4  Results for Reasoning Module
    3.3.5  Ablation studies
  3.4  Chapter Summary
Chapter 4  Domain-slot relationship modeling using Transformers for dialogue state tracking
  4.1  Background
  4.2  Proposed Method
    4.2.1  Domain-Slot-Context Encoder
    4.2.2  Slot-gate classifier
    4.2.3  Slot-value classifier
    4.2.4  Total objective function
  4.3  Experimental Results
    4.3.1  Dataset
    4.3.2  Experimental Setup
    4.3.3  Results for the MultiWOZ-2.1 dataset
    4.3.4  Ablation Studies
  4.4  Chapter Summary
Chapter 5  Pre-training of Transformers with Question-Answer Masked Language Modeling for Mathematical Question Answering
  5.1  Background
  5.2  Proposed Method
    5.2.1  Pre-training: Question-Answer Masked Language Modeling
    5.2.2  Fine-tuning: Mathematical Question Answering
  5.3  Experimental Results
    5.3.1  Dataset
    5.3.2  Experimental Setup
    5.3.3  Experimental Results on the Mathematics dataset
  5.4  Chapter Summary
Chapter 6  Conclusion
  6.1  Contributions
  6.2  Future Work
Bibliography
Abstract (in Korean)
Acknowledgements