
    Developing a Korean sentiment lexicon through sentiment score propagation of English sentiment lexicon

    Nowadays, people express their personal feelings and opinions on social media, and such posts and reviews are frequently used as data for sentiment analysis in order to identify public opinion, market trends, and so on. Sentiment analysis is the automated process of understanding attitudes and opinions about a given topic from written or spoken text. One approach to sentiment analysis is the dictionary-based approach, in which a sentiment dictionary plays an important role. However, many social media posts cannot be analyzed with a dictionary-based approach because the sentiment words they contain are absent from the dictionary. The sentiment dictionary therefore has to be expanded, or a new one has to be built. In this paper, we propose a method to automatically create a Korean sentiment lexicon from a verified English sentiment lexicon, the VADER sentiment lexicon.

    The proposed method consists of three steps. The first step produces a Korean-English bilingual lexicon from a Korean-English parallel corpus; the bilingual lexicon is a set of pairs of VADER sentiment words and Korean morphemes. The second step generates a bilingual graph from the bilingual lexicon: each vertex is a word (a VADER sentiment word or a Korean morpheme), and each edge connects a pair of words that either appear together in the bilingual lexicon or are synonyms within the same language. The third step runs a label propagation algorithm over the bilingual graph, applying it repeatedly until the values of all vertices converge, which yields the new Korean sentiment lexicon.

    To validate the sentiment lexicon generated by the proposed method, we built a dictionary-based Korean sentiment classifier with heuristic rules. It is similar to the English VADER sentiment classifier, but most of its rules have been adapted to the characteristics of Korean. The classifier is evaluated on two Korean sentiment corpora: the KMU sentiment corpus, a collection of comments on news articles, and the Naver sentiment movie corpus, a collection of movie reviews. It achieves an accuracy of 81% on the news article corpus and an F-score of 72% on the movie review corpus. These results indicate that the proposed method is effective both for building a new sentiment lexicon and for sentiment analysis. In future work, we plan further experiments comparing the performance of other approaches, such as machine learning-based and deep learning-based approaches.

    Contents
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Sentiment analysis
            2.1.1 Data collection
            2.1.2 Subjectivity detection
            2.1.3 Polarity detection
        2.2 Sentiment lexicons
            2.2.1 Dictionary-based sentiment lexicons
            2.2.2 Corpus-based sentiment lexicons
            2.2.3 Collective intelligence-based sentiment lexicons
        2.3 The VADER sentiment lexicon
    Chapter 3 Building a Sentiment Lexicon through Sentiment Score Propagation
        3.1 Building the Korean-English bilingual lexicon
            3.1.1 Tokenizing the Korean-English parallel corpus
            3.1.2 Building the mutual information matrix
            3.1.3 Building the bilingual lexicon with cosine similarity
        3.2 Building the Korean fastText representation model
        3.3 Building the Korean-English bilingual graph
        3.4 Sentiment score propagation
    Chapter 4 Experiments and Evaluation
        4.1 Validating the heuristic approach used in the construction process
        4.2 Validating the constructed sentiment lexicon
            4.2.1 Sentiment analysis system
            4.2.2 Sentiment analysis of sentiment corpora using the sentiment analysis system
    Chapter 5 Conclusion and Future Work
    References
    Acknowledgments
    Maste
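
The third step of the abstract above (propagating scores over the bilingual graph until the vertex values converge) can be illustrated with a minimal sketch. This is a generic label propagation scheme, not necessarily the update rule used in the thesis: seed vertices keep their scores, and every other vertex is repeatedly set to the weighted mean of its neighbours. The function name `propagate_scores`, the edge weights, the romanized Korean morphemes, and the seed scores are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def propagate_scores(
    edges: List[Tuple[str, str, float]],
    seeds: Dict[str, float],
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Spread seed scores over an undirected graph with positive edge weights.

    Seed vertices are clamped to their scores. Every other vertex starts at
    0.0 and is replaced by the weighted mean of its neighbours on each pass,
    until no score changes by more than `tol` (or `max_iter` is reached).
    """
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, w in edges:
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))

    scores = {node: seeds.get(node, 0.0) for node in neighbours}
    for _ in range(max_iter):
        updated = dict(scores)
        for node, nbrs in neighbours.items():
            if node in seeds:
                continue  # seed scores stay fixed
            total = sum(w for _, w in nbrs)
            updated[node] = sum(scores[n] * w for n, w in nbrs) / total
        delta = max(abs(updated[n] - scores[n]) for n in scores)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy bilingual graph (all values hypothetical).
# English seeds: "good", "bad". Romanized Korean morphemes:
# "jota" (good), "hullyunghada" (excellent), "nappeuda" (bad).
edges = [
    ("good", "jota", 1.0),          # bilingual lexicon pair
    ("bad", "nappeuda", 1.0),       # bilingual lexicon pair
    ("jota", "hullyunghada", 1.0),  # same-language synonym pair
]
seeds = {"good": 1.9, "bad": -2.5}
scores = propagate_scores(edges, seeds)
```

In this toy graph the synonym "hullyunghada" has no direct English translation, yet it ends up with a positive score because it is connected to "jota", which is how words missing from the bilingual lexicon can still receive a sentiment value.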

    Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

    A pivot-based approach to bilingual lexicon extraction relies on the similarity of context vectors whose dimensions are words in a pivot language such as English. To assess the validity and usability of the approach, we evaluate it with two different methods for estimating context vectors: one estimates them from two parallel corpora using word association between source words (respectively, target words) and pivot words; the other estimates them from two parallel corpora using word alignment tools for statistical machine translation. Empirical results on two language pairs (Korean-Spanish and Korean-French) show that the pivot-based approach is promising for resource-poor languages and support its validity and usability. The method also performs well for low-frequency words.
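
The core idea of the abstract above, comparing source and target words through context vectors expressed in a shared pivot language, can be sketched as follows. This is only an illustration under stated assumptions: the context vectors are hand-written toy values rather than estimates from parallel corpora, and the names `cosine`, `rank_translations`, and the example words are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # pivot-language word -> association weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors over pivot words."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_translations(
    source_vec: Vector, target_vecs: Dict[str, Vector], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Rank target-language words by similarity to a source-language word."""
    ranked = sorted(
        ((word, cosine(source_vec, vec)) for word, vec in target_vecs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Toy context vectors with English as the pivot (all weights hypothetical).
# Source word: romanized Korean "hakgyo" (school).
hakgyo = {"school": 0.9, "student": 0.6, "water": 0.05}
# Candidate Spanish target words.
spanish = {
    "escuela": {"school": 0.8, "student": 0.7},
    "agua": {"water": 0.9, "drink": 0.5},
}
best = rank_translations(hakgyo, spanish, top_k=1)
```

Because both vectors live in the pivot vocabulary, no direct Korean-Spanish parallel corpus is needed for the comparison; only the Korean-English and Spanish-English resources used to estimate the vectors.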