Building a Neural Machine Translation System Using Synthetic Parallel Data
Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, 2017. 8. Sungroh Yoon.

Recent works have shown that synthetic parallel data automatically generated by translation models can be effective for various neural machine translation (NMT) issues. In this study, we build NMT systems using only synthetic parallel data. We also present a novel synthetic parallel corpus as an efficient alternative to real parallel data. The proposed pseudo parallel data are distinct from those of previous works in that ground truth and synthetic examples are mixed on both sides of sentence pairs. Experiments on Czech-German and French-German translations demonstrate the efficacy of the proposed pseudo parallel corpus in empirical NMT applications: under identical conditions it yields better and more stable results than previously proposed synthetic parallel data on bidirectional translation tasks, and it improves more when the model is fine-tuned on a ground truth parallel corpus.

Table of Contents
Ⅰ. Introduction
Ⅱ. Background: Neural Machine Translation
Ⅲ. Related Work
Ⅳ. Synthetic Parallel Data as an Alternative to Real Parallel Corpus
4.1. Motivation
4.2. Limits of the Previous Approaches
4.3. Proposed Mixing Approach
Ⅴ. Experiments: Effects of Mixing Real and Synthetic Examples
5.1. Data Preparation
5.2. Data Preprocessing
5.3. Training and Evaluation
5.4. Results and Analysis
5.4.1. A Comparison between Pivot-based Approach and Back-translation
5.4.2. Effects of Mixing Source- and Target-originated Synthetic Parallel Data
5.4.3. A Comparison with Phrase-based Statistical Machine Translation
Ⅵ. Experiments: Large-scale Application
6.1. Application Scenarios
6.2. Data Preparation
6.3. Training and Evaluation
6.4. Results and Analysis
6.4.1. A Comparison with Real Parallel Data
6.4.2. Results from the Pseudo Only Scenario
6.4.3. Results from the Real Fine-tuning Scenario
Ⅶ. Conclusion
Bibliography
Abstract
Maste