2 research outputs found
Developing a Korean sentiment lexicon through sentiment score propagation of English sentiment lexicon
Nowadays, people express their personal feelings and opinions on social media, and such posts and reviews are frequently used as data for sentiment analysis in order to identify public opinion, market trends, and so on. Sentiment analysis is the automated process of understanding attitudes and opinions about a given topic from written or spoken text. One approach to sentiment analysis is the dictionary-based approach, in which a sentiment dictionary plays an important role. However, many social media posts cannot be analyzed with a dictionary-based approach because many of the words they contain are absent from the sentiment dictionary. Therefore, to perform sentiment analysis effectively, the sentiment dictionary must be expanded or a new one must be built.
In this paper, we propose a method to automatically create a Korean sentiment lexicon from VADER, a validated English sentiment lexicon. The proposed method consists of three steps. The first step builds a Korean-English bilingual lexicon from a Korean-English parallel corpus; the bilingual lexicon is a set of pairs of VADER sentiment words and Korean morphemes. The second step builds a bilingual graph from the bilingual lexicon: each vertex is a word (a VADER sentiment word or a Korean morpheme), and each edge connects two words that either form a pair in the bilingual lexicon or are synonyms within the same language. The third step runs a label propagation algorithm over the bilingual graph, applying it repeatedly until the values of all vertices converge, which yields the new Korean sentiment lexicon.
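The third step can be illustrated with a minimal sketch of score propagation on a small bilingual graph. This is not the thesis's implementation: the function name `propagate_scores`, the weighted-average update rule, the edge weights, and the seed scores are all assumptions chosen for illustration. English seed words keep their scores fixed, and every other vertex repeatedly takes the weighted mean of its neighbours until the largest change falls below a tolerance.

```python
from collections import defaultdict


def propagate_scores(edges, seed_scores, max_iters=100, tol=1e-6):
    """Spread seed scores over an undirected weighted graph (hypothetical helper).

    edges: iterable of (word_a, word_b, weight) with positive weights.
    seed_scores: dict mapping seed words to fixed sentiment scores.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Unlabelled vertices start at 0.0; seeds start at their given score.
    scores = {v: seed_scores.get(v, 0.0) for v in neighbors}

    for _ in range(max_iters):
        max_change = 0.0
        updated = {}
        for v, nbrs in neighbors.items():
            if v in seed_scores:
                updated[v] = seed_scores[v]  # seeds stay clamped
                continue
            total_weight = sum(w for _, w in nbrs)
            new_score = sum(scores[u] * w for u, w in nbrs) / total_weight
            max_change = max(max_change, abs(new_score - scores[v]))
            updated[v] = new_score
        scores = updated
        if max_change < tol:  # all vertex values have converged
            break
    return scores


# Toy graph: two bilingual-lexicon edges and one Korean synonym edge.
# Seed scores are illustrative values, not taken from the VADER lexicon.
edges = [
    ("good", "좋다", 1.0),
    ("bad", "나쁘다", 1.0),
    ("좋다", "훌륭하다", 1.0),
]
scores = propagate_scores(edges, {"good": 2.0, "bad": -2.0})
korean_lexicon = {w: s for w, s in scores.items() if w not in ("good", "bad")}
```

In this toy graph the synonym edge lets a Korean word with no direct English translation pair still receive a score, which is the role the abstract assigns to same-language synonym edges.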
To validate the sentiment lexicon generated by the proposed method, we built a dictionary-based Korean sentiment classifier with heuristic rules. It is modeled on the English VADER sentiment classifier, with most of its rules adapted to the characteristics of Korean. The classifier was evaluated on two Korean sentiment corpora: the KMU sentiment corpus, which collects comments on news articles, and the Naver sentiment movie corpus, which collects movie reviews. It achieved an accuracy of 81% on the news article corpus and an F-score of 72% on the movie review corpus. These results indicate that the proposed method is effective both for building a new sentiment lexicon and for sentiment analysis. In future work, we plan further experiments comparing the performance of other approaches, such as machine learning-based and deep learning-based approaches.
Chapter 1. Introduction
Chapter 2. Related Work
2.1 Sentiment analysis
2.1.1 Data collection
2.1.2 Subjectivity detection
2.1.3 Polarity detection
2.2 Sentiment lexicons
2.2.1 Dictionary-based sentiment lexicons
2.2.2 Corpus-based sentiment lexicons
2.2.3 Collective intelligence-based sentiment lexicons
2.3 The VADER sentiment lexicon
Chapter 3. Building a Sentiment Lexicon through Sentiment Score Propagation
3.1 Building the Korean-English bilingual lexicon
3.1.1 Tokenizing the Korean-English parallel corpus
3.1.2 Building the mutual information matrix
3.1.3 Building the bilingual lexicon with cosine similarity
3.2 Training a Korean fastText model
3.3 Building the Korean-English bilingual graph
3.4 Sentiment score propagation
Chapter 4. Experiments and Evaluation
4.1 Validating the heuristic approach used in the construction process
4.2 Validating the constructed sentiment lexicon
4.2.1 Sentiment analysis system
4.2.2 Sentiment analysis of sentiment corpora using the sentiment analysis system
Chapter 5. Conclusion and Future Work
References
Acknowledgements
Maste
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language such as English. In this paper, to show the validity and usability of the pivot-based approach, we evaluate it in combination with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word association between source words (resp., target words) and pivot words, and the other estimates them from two parallel corpora using word alignment tools for statistical machine translation. Empirical results on two language pairs (Korean-Spanish and Korean-French) show that the pivot-based approach is very promising for resource-poor languages and confirm its validity and usability. Furthermore, our method also performs well for low-frequency words.
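The core idea, comparing source and target words through context vectors expressed over pivot-language words, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `cosine` and `rank_translations`, the toy association weights, and the use of cosine similarity as the comparison measure are not taken from the paper, which does not specify these details in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_translations(source_vec, target_vecs, top_k=3):
    """Rank target words by similarity of their pivot-language context vectors."""
    scored = [(cosine(source_vec, vec), word) for word, vec in target_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [(word, score) for score, word in scored[:top_k]]


# Hypothetical context vectors over English pivot words. In practice the
# weights would come from word association or word alignment on the
# source-pivot and target-pivot parallel corpora.
korean_vec = {"house": 0.9, "home": 0.7, "car": 0.1}  # vector for "집"
spanish_vecs = {
    "casa": {"house": 0.8, "home": 0.6},
    "coche": {"car": 0.9, "road": 0.4},
}
ranked = rank_translations(korean_vec, spanish_vecs)
```

Because both vectors live in the same pivot-language space, no direct Korean-Spanish parallel corpus is needed, which is why the approach suits resource-poor language pairs.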