
    Building a Neural Machine Translation System Using Synthetic Parallel Data

    Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Sungroh Yoon.

    Recent work has shown that synthetic parallel data, generated automatically by trained translation models, can address a range of problems in neural machine translation (NMT). In this study, we build NMT systems using only synthetic parallel data. We also present a novel synthetic parallel corpus as an effective alternative to real parallel data. The proposed pseudo-parallel data differ from those of previous work in that ground-truth and synthetic examples are mixed on both sides of the sentence pairs. Experiments on Czech-German and French-German translation demonstrate the efficacy of the proposed pseudo-parallel corpus. Under identical conditions, NMT systems trained on it translate better and more stably in both directions than systems trained on previously proposed synthetic parallel data, and they improve more when fine-tuned on a ground-truth parallel corpus.

    Table of Contents
    Ⅰ. Introduction
    Ⅱ. Background: Neural Machine Translation
    Ⅲ. Related Work
    Ⅳ. Synthetic Parallel Data as an Alternative to Real Parallel Corpus
        4.1. Motivation
        4.2. Limits of the Previous Approaches
        4.3. Proposed Mixing Approach
    Ⅴ. Experiments: Effects of Mixing Real and Synthetic Examples
        5.1. Data Preparation
        5.2. Data Preprocessing
        5.3. Training and Evaluation
        5.4. Results and Analysis
            5.4.1. A Comparison between Pivot-based Approach and Back-translation
            5.4.2. Effects of Mixing Source- and Target-originated Synthetic Parallel Data
            5.4.3. A Comparison with Phrase-based Statistical Machine Translation
    Ⅵ. Experiments: Large-scale Application
        6.1. Application Scenarios
        6.2. Data Preparation
        6.3. Training and Evaluation
        6.4. Results and Analysis
            6.4.1. A Comparison with Real Parallel Data
            6.4.2. Results from the Pseudo Only Scenario
            6.4.3. Results from the Real Fine-tuning Scenario
    Ⅶ. Conclusion
    Bibliography
    Abstract