Sentiment Analysis of Online Reviews Based on Genre-Specific Discourse Types

Abstract

Thesis (Master's) -- Seoul National University Graduate School: Department of Linguistics, Linguistics Major, August 2015. Shin Hyopil.

Although sentiment analysis has in recent years evolved from simple lexicon-based and statistical models to methods that use discourse information, the major problem with current approaches is that they apply the same set of features to sentiment classification of texts of all genres and types (tweets, editorials, discussion board posts, online reviews, etc.). Moreover, the features used by previous researchers reflect only one aspect of discourse, namely coherence, and they are limited to explicit ways of ensuring coherence, such as conjunctions. More specifically, these features are implicit coherence, realized through the adjacency of two sentences; continuity, which shows that two sentences have the same sentiment and is commonly signaled by conjunctions such as "and" or "moreover"; and contrast, which is signaled by conjunctions such as "but" and marks a shift in the opinion's polarity. In this study we propose a new set of features that reflects the specific traits of one particular genre, online reviews: implicit contrast, realized through limiting expressions such as "the only drawback"; background patterns, which are expressions that help to establish a review author's identity; and involvement features, which are used to interact with the reader. To show the effectiveness of these features, we annotated a corpus of 120 product reviews and represented each review as a set of non-discourse, generic discourse and genre-specific discourse features extracted from it (together with the target label from the annotation). These feature sets were used in two series of experiments: fine-grained and coarse-grained. At the sentence level we conducted the experiments with and without lexical features, while at the document level we performed 5-, 3- and 2-class classification.
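The cue-based features described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the cue lists contain only the examples quoted in the abstract ("and", "moreover", "but", "the only drawback"), and the function names, feature names and matching strategy are assumptions made for illustration.

```python
import re

# Illustrative cue lists: only the examples quoted in the abstract.
# The actual inventories used in the thesis are not given here.
CONTINUITY_CUES = ("and", "moreover")
EXPLICIT_CONTRAST_CUES = ("but",)
IMPLICIT_CONTRAST_CUES = ("the only drawback",)


def has_cue(sentence, cues):
    """Return True if any cue occurs in the sentence as a whole word or phrase."""
    text = sentence.lower()
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def discourse_features(sentences):
    """Map each sentence of a review to a dict of binary discourse features."""
    rows = []
    for index, sentence in enumerate(sentences):
        rows.append({
            # Assumption: implicit coherence holds whenever a sentence
            # directly follows another sentence in the same review.
            "implicit_coherence": index > 0,
            "continuity": has_cue(sentence, CONTINUITY_CUES),
            "explicit_contrast": has_cue(sentence, EXPLICIT_CONTRAST_CUES),
            "implicit_contrast": has_cue(sentence, IMPLICIT_CONTRAST_CUES),
        })
    return rows


if __name__ == "__main__":
    review = [
        "The camera is sharp and the battery lasts long.",
        "The only drawback is the price.",
        "But I would still recommend it.",
    ]
    for sentence, row in zip(review, discourse_features(review)):
        print(sentence, row)
```

Feature dictionaries of this kind could then be combined with lexical features and passed to any standard classifier; the choice of classifier is left open here because the abstract does not name one.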
Our experiments showed that genre-specific features in general perform better than generic ones, yielding greater improvements in precision and recall. Whereas generic features led to only minor increases or even degraded performance (as in the case of implicit coherence), genre-specific features (especially background features) were more stable and allowed us to achieve better recall and precision across all experiments. These tendencies were especially pronounced in the fine-grained classification with lexical features, where adding generic discourse features to the lexical ones degraded the results. Moreover, the performance of genre-specific features is not only statistically reliable but also reflects the theoretical properties of online review discourse outlined in our study.

Table of Contents

1. Introduction
    1.1 Subject Matter
    1.2 Purposes of the Study
    1.3 Contributions of the Study
    1.4 Structure of the Study
2. Previous Studies
    2.1 Previous Studies on Sentiment Analysis of Online Reviews
    2.2 Previous Studies on Discourse in Sentiment Analysis
3. Generic and Genre-specific Discourse Features for Sentiment Analysis
    3.1 Theoretical Background
    3.2 Discourse in Rhetorical Structure Theory
    3.3 Discourse in Sociolinguistics
4. Data and Features
    4.1 Data and Annotation
        4.1.1 Corpus
        4.1.2 Annotation Guidelines and Results
    4.2 Features Used for Experiments
        4.2.1 Non-discourse Features
            4.2.1.1 Lexical Features
            4.2.1.2 Global Polarity Features
        4.2.2 Generic Discourse Features
            4.2.2.1 Implicit Coherence
            4.2.2.2 Continuity
            4.2.2.3 Explicit Contrast
        4.2.3 Discourse Features Specific to Online Reviews
            4.2.3.1 Implicit Contrast
            4.2.3.2 Background Features
            4.2.3.3 Involvement Features
    4.3 Feature Validation
5. Predicting Sentence Polarity Using Discourse Features
    5.1 Experiment Setup
    5.2 Evaluation of Experiments
        5.2.1 Measures
        5.2.2 Results
            5.2.2.1 Preliminary Classification
            5.2.2.2 Classification with Lexical Features
            5.2.2.3 Classification without Lexical Features
    5.3 Discussion
6. Predicting Review Ratings Using Discourse Features
    6.1 Experiment Setup
    6.2 Experiment Results
        6.2.1 5-class Classification
        6.2.2 Comparison of Results of 2-, 3- and 5-class Classification
    6.3 Discussion
7. Conclusion and Future Prospects
References
