Bilingual Lexicon Extraction Using a Modified Perceptron Algorithm
In computational linguistics, parallel corpora and bilingual lexicons are used as important resources in fields such as machine translation and cross-lingual information retrieval. For example, parallel corpora are used to estimate translation probabilities in machine translation systems. Bilingual lexicons directly enable word-to-word translation in cross-lingual information retrieval, and they also support the translation process in machine translation systems. Moreover, the larger the parallel corpora and bilingual lexicons available for training, the better the performance of a machine translation system. However, building bilingual lexicons manually, that is, by human effort, requires considerable cost, time, and labor. For these reasons, research on extracting bilingual lexicons automatically has attracted many researchers.
This thesis proposes a new and effective methodology for extracting bilingual lexicons. Building on the vector space model, the approach most widely used in bilingual lexicon extraction, it applies the perceptron algorithm, a type of neural network, to iteratively learn weights for bilingual lexicon entries. The final bilingual lexicons are then extracted using the perceptron with these iteratively learned weights.
As a result, the iteratively trained model achieved an average improvement of 3.5% in accuracy over the untrained initial results.
1. Introduction
2. Literature Review
2.1 Linguistic resources: The text corpora
2.2 A vector space model
2.3 Neural networks: The single layer Perceptron
2.4 Evaluation metrics
3. System Architecture of Bilingual Lexicon Extraction System
3.1 Required linguistic resources
3.2 System architecture
4. Building a Seed Dictionary
4.1 Methodology: Context Based Approach (CBA)
4.2 Experiments and results
4.2.1 Experimental setups
4.2.2 Experimental results
4.3 Discussions
5. Extracting Bilingual Lexicons
5.1 Methodology: Iterative Approach (IA)
5.2 Experiments and results
5.2.1 Experimental setups
5.2.2 Experimental results
5.3 Discussions
6. Conclusions and Future Work
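The iterative perceptron weighting the abstract describes can be sketched as a minimal toy in plain Python. The thesis's actual features come from context vectors over corpora; here, generic two-dimensional feature vectors stand in for them, and all names are illustrative rather than the author's:

```python
def perceptron_train(examples, epochs=10, lr=1.0):
    """Classic single-layer perceptron: iteratively learn a weight vector
    that separates good translation candidates (+1) from bad ones (-1)."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge weights toward label y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy feature vectors (e.g. context similarity, frequency ratio) for
# candidate translation pairs; labels mark correct (+1) / wrong (-1) pairs.
examples = [([2.0, 1.0], 1), ([1.0, 2.0], 1),
            ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w, b = perceptron_train(examples)
```

After convergence, candidate pairs are scored with the learned weights and the highest-scoring candidates are kept as lexicon entries.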
Gated Convolutional Neural Networks and Deep Layer Fusion for Abstractive Document Summarization
Text summarization is one of the central tasks in Natural Language Processing. Recent advances in deep neural networks and representation learning have substantially improved text summarization technology. There are largely two approaches to text summarization: extractive and abstractive. The extractive approach generates a summary by extracting salient linguistic constituents from the document and assembling them into grammatical sentences. In contrast, the abstractive approach writes summaries using words that may or may not exist in the document, relying on sophisticated techniques such as meaning representation, content organization, and surface realization.
In this thesis, we focus on abstractive summarization and propose a model that better represents and recognizes salient content in a document, one of the key abilities for better text summarization. Furthermore, we introduce a large-scale Korean dataset for document summarization.
First, we adopt a hierarchical structure to capture representations at various ranges. Moreover, we propose a gating mechanism to produce better intermediate representations, and we utilize POS (Part-of-Speech) tags to exploit morphological and syntactic features. Lastly, we propose a simple and efficient deep layer fusion to extract and merge salient information from the encoder layers. We evaluate our model using ROUGE metrics on three different datasets: CNN-DM, NEWSROOM-ABS, and XSUM. Experimental results show that the proposed model outperforms the state-of-the-art abstractive models on NEWSROOM-ABS and XSUM and shows comparable scores on CNN-DM.
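The gating and deep-layer-fusion ideas can be illustrated with a minimal numeric sketch. The real model operates on learned tensors; these plain-Python functions, and all their names, are ours, not the thesis's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(hidden, gate_logits):
    # Element-wise gate: each hidden unit is scaled by a sigmoid gate,
    # letting the network suppress or pass intermediate features.
    return [h * sigmoid(g) for h, g in zip(hidden, gate_logits)]

def deep_layer_fusion(layer_outputs, fusion_weights):
    # Weighted sum of per-layer representations (one scalar weight per
    # encoder layer), merging salient information across depths.
    dim = len(layer_outputs[0])
    fused = [0.0] * dim
    for w, layer in zip(fusion_weights, layer_outputs):
        for i, v in enumerate(layer):
            fused[i] += w * v
    return fused
```

In the actual model the gate logits and fusion weights are learned jointly with the rest of the network.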
These data-driven approaches require a large amount of data for model training. However, large-scale datasets do not exist for less-resourced languages such as Korean, and building such a dataset is very labor-intensive and time-consuming. In this thesis, we propose a Korean summarization dataset that is acquired automatically by leveraging the characteristics of news articles. The dataset consists of 206,822 article-summary pairs in which summaries are written in headline style with multiple sentences. Through analysis of the dataset and experimental results, we show that the proposed dataset is large enough to train an abstractive summarization model, is comparable to existing English news datasets, and is suitable for developing abstractive summarization models.
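One way such headline-style pairs could be harvested is sketched below. This is a hypothetical simplification, not the thesis's actual pipeline; the function, its parameter, and the line-based split are all our assumptions:

```python
def make_summary_pair(raw_article, headline_sentences=2):
    # Hypothetical splitter: treat the leading headline-style sentences as
    # the summary and the remainder as the source document.
    sentences = [s.strip() for s in raw_article.split("\n") if s.strip()]
    summary = " ".join(sentences[:headline_sentences])
    document = " ".join(sentences[headline_sentences:])
    return document, summary
```

A real pipeline would additionally filter malformed articles and deduplicate near-identical pairs before training.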
Adoption of a Neural Language Model in an Encoder for Encoder-Decoder based Korean Grammatical Error Correction
Grammatical error correction detects the grammatical errors that appear in a given sentence and corrects them. It can be applied in various areas, such as helping L2 learners who want to learn a particular language or correcting erroneous system input and output. In this thesis, to compensate for the shortage of the correction parallel data essential for training Korean grammatical error correction, we propose a technique that exploits a monolingual corpus. A neural language model trained on the monolingual corpus is introduced into the Encoder so that the neural-machine-translation-based correction model can more clearly distinguish correctly used word units from grammatically misused ones. On this basis, we confirmed that increasing the copying of correctly used word units prevents the existing Encoder-Decoder model from making incorrect corrections.
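The abstract's two ideas, injecting a pretrained language-model state into the encoder and boosting the copying of correct spans, can be caricatured numerically. Simple interpolation stands in for the thesis's actual fusion; all function and parameter names here are illustrative assumptions:

```python
def fuse_lm_into_encoder(enc_state, lm_state, alpha=0.5):
    # Interpolate the encoder hidden state with a pretrained-LM hidden
    # state, giving the encoder a sense of what fluent text looks like.
    return [alpha * e + (1 - alpha) * l for e, l in zip(enc_state, lm_state)]

def copy_or_generate(p_copy, copy_dist, gen_dist):
    # Mix a copy distribution over source tokens with the decoder's
    # generation distribution; a higher p_copy favors keeping correct
    # source word units unchanged instead of over-correcting them.
    return [p_copy * c + (1 - p_copy) * g for c, g in zip(copy_dist, gen_dist)]
```

When `p_copy` is pushed toward 1 for spans the fused encoder judges fluent, the model reproduces them verbatim, which is the over-correction safeguard the abstract describes.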