
A Study on Conquering Complex Reasoning Abilities with Transformers: Applications to Visual, Conversational, and Mathematical Reasoning

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2021. Advisor: Sungzoon Cho.

As deep learning models have advanced, research has shifted toward sophisticated tasks that require complex reasoning rather than simple classification. These complex tasks involve multiple reasoning steps that resemble human intelligence. Architecture-wise, recurrent neural networks and convolutional neural networks have long been the mainstream models for deep learning, but both suffer from shortcomings inherent to their architectures. Nowadays, the attention-based Transformer is replacing them owing to its superior architecture and performance. In particular, the Transformer encoder has been studied extensively in natural language processing. However, for the Transformer to be effective on data with distinct structures and characteristics, appropriate adjustments to its architecture are required.

In this dissertation, we propose novel architectures based on the Transformer encoder for supervised learning tasks with different data types and characteristics. The tasks we consider are visual IQ tests, dialogue state tracking, and mathematical question answering. For visual IQ tests, the input is visual and hierarchically structured. To handle this, we propose a hierarchical Transformer encoder over structured representations, a novel neural network architecture that improves both perception and reasoning; the hierarchical arrangement of the Transformer encoders and the architecture of each individual encoder are both tailored to the characteristics of visual IQ test data. For dialogue state tracking, values must be predicted for multiple domain-slot pairs. To address this, we propose a dialogue state tracking model that uses a pre-trained language model, i.e., a pre-trained Transformer encoder, for domain-slot relationship modeling: we introduce a special token for each domain-slot pair, which enables effective dependency modeling among domain-slot pairs through the pre-trained language encoder. Finally, for mathematical question answering, we propose pre-training a Transformer encoder on a mathematical question answering dataset to improve performance. Our pre-training method, Question-Answer Masked Language Modeling, utilizes both the question and the answer text, which suits the mathematical question answering dataset. Through experiments, we show that each of the proposed methods is effective for its corresponding task and data type.
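As a rough illustration of the hierarchical encoding idea described above, the following PyTorch sketch stacks one Transformer encoder over the object representations within each panel and a second encoder over the resulting panel embeddings. The class name, dimensions, mean pooling, and scoring head are illustrative assumptions, not the dissertation's actual implementation.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Two-level Transformer encoder: objects within each panel, then panels within the puzzle."""

    def __init__(self, obj_dim=128, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, d_model)
        # Lower level: attends over the detected objects inside a single panel.
        self.panel_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # Upper level: attends over the resulting panel embeddings.
        self.puzzle_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.score_head = nn.Linear(d_model, 1)

    def forward(self, objects):
        # objects: (batch, panels, objects_per_panel, obj_dim) -- a structured representation
        b, p, o, _ = objects.shape
        x = self.obj_proj(objects).view(b * p, o, -1)
        panel_emb = self.panel_encoder(x).mean(dim=1).view(b, p, -1)  # one vector per panel
        puzzle_emb = self.puzzle_encoder(panel_emb).mean(dim=1)       # one vector per puzzle
        return self.score_head(puzzle_emb).squeeze(-1)                # compatibility score

# Usage idea: build one "puzzle" per answer candidate by appending the candidate panel to the
# context panels, score each, and pick the candidate with the highest score.
model = HierarchicalEncoder()
scores = model(torch.randn(4, 9, 6, 128))  # 4 candidate puzzles, 9 panels, 6 objects each
```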
이에 λŒ€μ‘ν•˜κΈ° μœ„ν•΄μ„œ μš°λ¦¬λŠ” 인지와 사고 μΈ‘λ©΄μ—μ„œ μ„±λŠ₯을 ν–₯상 μ‹œν‚¬ 수 μžˆλŠ” μƒˆλ‘œμš΄ λ‰΄λŸ΄ λ„€νŠΈμ›Œν¬ ꡬ쑰인, κ΅¬μ‘°ν™”λœ ν‘œν˜„ν˜•μ„ μ²˜λ¦¬ν•  수 μžˆλŠ” 계측적인 트랜슀포머 인코더 λͺ¨λΈμ„ μ œμ•ˆν•œλ‹€. 트랜슀 포머 μΈμ½”λ”μ˜ 계측적 ꡬ쑰와 각각의 트랜슀포머 μΈμ½”λ”μ˜ ꡬ쑰 λͺ¨λ‘κ°€ μ‹œκ° IQ ν…ŒμŠ€νŠΈ λ°μ΄ν„°μ˜ νŠΉμ§•μ— μ ν•©ν•˜λ‹€. λŒ€ν™” μƒνƒœ νŠΈλž˜ν‚Ήμ€ μ—¬λŸ¬ 개의 도메인-슬둯(domain-slot)μŒμ— λŒ€ν•œ κ°’(value)이 μš”κ΅¬λœλ‹€. 이λ₯Ό ν•΄κ²°ν•˜κΈ° μœ„ν•΄μ„œ μš°λ¦¬λŠ” 사전 ν•™μŠ΅λœ 트랜슀포머 인코더인, 사전 ν•™μŠ΅ μ–Έμ–΄ λͺ¨λΈμ„ ν™œμš©ν•˜μ—¬ 도메인-슬둯의 관계λ₯Ό λͺ¨λΈλ§ν•˜λŠ” 것을 μ œμ•ˆν•œλ‹€. 각 도메인-슬둯 μŒμ— λŒ€ν•œ 특수 토큰을 λ„μž…ν•¨μœΌλ‘œμ¨ 효과적으둜 도메인-슬둯 μŒλ“€ κ°„μ˜ 관계λ₯Ό λͺ¨λΈλ§ ν•  수 μžˆλ‹€. λ§ˆμ§€λ§‰μœΌλ‘œ, μˆ˜ν•™ 질의 응닡을 μœ„ν•΄μ„œλŠ” μˆ˜ν•™ 질의 응닡 데이터에 λŒ€ν•΄μ„œ 사전 ν•™μŠ΅μ„ μ§„ν–‰ν•¨μœΌλ‘œμ¨ μˆ˜ν•™ 질의 응닡 과업에 λŒ€ν•΄μ„œ μ„±λŠ₯을 λ†’νžˆλŠ” 방법을 μ œμ•ˆν•œλ‹€. 우리의 사전 ν•™μŠ΅ 방법인 질의-응닡 λ§ˆμŠ€ν‚Ή μ–Έμ–΄ λͺ¨λΈλ§μ€ μ§ˆμ˜μ™€ 응닡 ν…μŠ€νŠΈ λͺ¨λ‘λ₯Ό ν™œμš© ν•¨μœΌλ‘œμ¨ μˆ˜ν•™ 질의 응닡 데이터에 μ ν•©ν•œ ν˜•νƒœμ΄λ‹€. μ‹€ν—˜μ„ ν†΅ν•΄μ„œ 각각의 μ œμ•ˆλœ 방법둠듀이 ν•΄λ‹Ήν•˜λŠ” κ³Όμ—…κ³Ό 데이터 μ’…λ₯˜μ— λŒ€ν•΄μ„œ 효과적인 것을 λ°ν˜”λ‹€.Abstract i Contents vi List of Tables viii List of Figures xii Chapter 1 Introduction 1 Chapter 2 Literature Review 7 2.1 Related Works on Transformer . . . . . . . . . . . . . . . . . . . . . 7 2.2 Related Works on Visual IQ Tests . . . . . . . . . . . . . . . . . . . 10 2.2.1 RPM-related studies . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.2 Object Detection related studies . . . . . . . . . . . . . . . . 11 2.3 Related works on Dialogue State Tracking . . . . . . . . . . . . . . . 12 2.4 Related Works on Mathematical Question Answering . . . . . . . . . 14 2.4.1 Pre-training of Neural Networks . . . . . . . . . . . . . . . . 14 2.4.2 Language Model Pre-training . . . . . . . . . . . . . . . . . . 15 2.4.3 Mathematical Reasoning with Neural Networks . . . . . . . . 17 Chapter 3 Hierarchical end-to-end architecture of Transformer encoders for solving visual IQ tests 19 3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.1.1 Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.1.2 Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Proposed Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2.1 Perception Module: Object Detection Model . . . . . . . . . 24 3.2.2 Reasoning Module: Hierarchical Transformer Encoder . . . . 26 3.2.3 Contrasting Module and Loss function . . . . . . . . . . . . . 29 3.3 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.3.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 34 3.3.3 Results for Perception Module . . . . . . . . . . . . . . . . . 35 3.3.4 Results for Reasoning Module . . . . . . . . . . . . . . . . . . 36 3.3.5 Ablation studies . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Chapter 4 Domain-slot relationship modeling using Transformers for dialogue state tracking 40 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.2 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.2.1 Domain-Slot-Context Encoder . . . . . . . . . . . . . . . . . 44 4.2.2 Slot-gate classifier . . . . . . . . . . . . . . . . . . . . . . . . 48 4.2.3 Slot-value classifier . . . . . . . . . . . . . . . . . . . . . . . . 
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
  2.1 Related Works on Transformer
  2.2 Related Works on Visual IQ Tests
    2.2.1 RPM-related studies
    2.2.2 Object Detection related studies
  2.3 Related Works on Dialogue State Tracking
  2.4 Related Works on Mathematical Question Answering
    2.4.1 Pre-training of Neural Networks
    2.4.2 Language Model Pre-training
    2.4.3 Mathematical Reasoning with Neural Networks
Chapter 3 Hierarchical end-to-end architecture of Transformer encoders for solving visual IQ tests
  3.1 Background
    3.1.1 Perception
    3.1.2 Reasoning
  3.2 Proposed Model
    3.2.1 Perception Module: Object Detection Model
    3.2.2 Reasoning Module: Hierarchical Transformer Encoder
    3.2.3 Contrasting Module and Loss function
  3.3 Experimental results
    3.3.1 Dataset
    3.3.2 Experimental Setup
    3.3.3 Results for Perception Module
    3.3.4 Results for Reasoning Module
    3.3.5 Ablation studies
  3.4 Chapter Summary
Chapter 4 Domain-slot relationship modeling using Transformers for dialogue state tracking
  4.1 Background
  4.2 Proposed Method
    4.2.1 Domain-Slot-Context Encoder
    4.2.2 Slot-gate classifier
    4.2.3 Slot-value classifier
    4.2.4 Total objective function
  4.3 Experimental Results
    4.3.1 Dataset
    4.3.2 Experimental Setup
    4.3.3 Results for the MultiWOZ-2.1 dataset
    4.3.4 Ablation Studies
  4.4 Chapter Summary
Chapter 5 Pre-training of Transformers with Question-Answer Masked Language Modeling for Mathematical Question Answering
  5.1 Background
  5.2 Proposed Method
    5.2.1 Pre-training: Question-Answer Masked Language Modeling
    5.2.2 Fine-tuning: Mathematical Question Answering
  5.3 Experimental Results
    5.3.1 Dataset
    5.3.2 Experimental Setup
    5.3.3 Experimental Results on the Mathematics dataset
  5.4 Chapter Summary
Chapter 6 Conclusion
  6.1 Contributions
  6.2 Future Work
Bibliography
Abstract in Korean (ꡭ문초둝)
Acknowledgements (κ°μ‚¬μ˜ κΈ€)