22,113 research outputs found

    Automatically detecting open academic review praise and criticism

    This is an accepted manuscript of an article published by Emerald in Online Information Review on 15 June 2020. The accepted version of the publication may differ from the final published version, accessible at https://doi.org/10.1108/OIR-11-2019-0347. Purpose: Peer reviewer evaluations of academic papers are known to be variable in content and overall judgements but are important academic publishing safeguards. This article introduces a sentiment analysis program, PeerJudge, to detect praise and criticism in peer evaluations. It is designed to support editorial management decisions and reviewers in the scholarly publishing process and in grant funding decision workflows. The initial version of PeerJudge is tailored to reviews from F1000Research's open peer review publishing platform. Design/methodology/approach: PeerJudge uses a lexical sentiment analysis approach with a human-coded initial sentiment lexicon and machine learning adjustments and additions. It was built with an F1000Research development corpus and evaluated on a different F1000Research test corpus using reviewer ratings. Findings: PeerJudge can predict F1000Research judgements from negative evaluations in reviewers' comments more accurately than baseline approaches, although not from positive reviewer comments, which seem to be largely unrelated to reviewer decisions. Within the F1000Research mode of post-publication peer review, the absence of any detected negative comments is a reliable indicator that an article will be 'approved', but the presence of moderately negative comments could lead to either an 'approved' or an 'approved with reservations' decision. Originality/value: PeerJudge is the first transparent AI approach to peer review sentiment detection. It may be used to identify anomalous reviews whose text potentially does not match the judgement, for individual checks or systematic bias assessments.
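    As a rough illustration of the lexicon-based approach this abstract describes (not the actual PeerJudge code), a minimal Python sketch of scoring praise and criticism with a hand-coded lexicon might look like the following; the lexicon entries, weights, and thresholds are invented for the example.

```python
# Minimal lexicon-based praise/criticism scorer (illustrative only; not PeerJudge).
# The lexicon entries and weights below are invented for the example.
import re

PRAISE = {"clear": 1, "well-written": 2, "interesting": 1, "rigorous": 2, "important": 1}
CRITICISM = {"unclear": -1, "flawed": -3, "missing": -2, "weak": -2, "unconvincing": -2}

def score_sentence(sentence: str) -> int:
    """Sum lexicon weights for the terms found in one review sentence."""
    tokens = re.findall(r"[a-z\-]+", sentence.lower())
    return sum(PRAISE.get(t, 0) + CRITICISM.get(t, 0) for t in tokens)

def review_judgement(review: str) -> str:
    """Map sentence-level scores to a coarse judgement, mimicking the finding that
    the absence of detected criticism signals an 'approved'-like outcome."""
    sentences = re.split(r"(?<=[.!?])\s+", review)
    negatives = [s for s in sentences if score_sentence(s) < 0]
    if not negatives:
        return "approved"
    return "approved with reservations" if len(negatives) <= 2 else "not approved"

if __name__ == "__main__":
    print(review_judgement("The study is interesting and well-written. "
                           "However, the statistical analysis is flawed."))
```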

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops an active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms that automatically annotate tweets, using a manual annotation step as a bootstrap. The paper concentrates on key issues in active learning when building a large annotated dataset from noisy data, where the noise comes from human annotators, the sheer abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as anchor points not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a subsequent in-depth analysis shows that reducing noise may also remove crucial information. Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
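    The active-learning idea behind this kind of annotation pipeline can be illustrated with a minimal pool-based uncertainty-sampling loop; this is not the authors' system, it assumes scikit-learn, and the toy tweets, labels, and the simulated annotator response are invented placeholders.

```python
# Sketch of pool-based active learning with least-confident (uncertainty) sampling.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny expert-annotated seed set (1 = positive opinion, 0 = negative opinion).
seed_texts = ["great speech tonight", "this policy is a disaster",
              "very proud of this candidate", "what a dishonest answer"]
seed_labels = np.array([1, 0, 1, 0])

# Unlabelled pool of tweets waiting for annotation.
pool_texts = ["disaster for the economy", "proud and happy tonight",
              "not sure what to think", "honest and clear speech"]

vec = TfidfVectorizer().fit(seed_texts + pool_texts)
X_pool = vec.transform(pool_texts)
labeled_X = vec.transform(seed_texts).toarray()
labeled_y = seed_labels
pool_idx = list(range(len(pool_texts)))

for _ in range(2):                                   # two active-learning rounds
    clf = LogisticRegression().fit(labeled_X, labeled_y)
    proba = clf.predict_proba(X_pool[pool_idx])
    uncertainty = 1 - proba.max(axis=1)              # least-confident sampling
    query = pool_idx[int(uncertainty.argmax())]
    print("Ask the annotator about:", pool_texts[query])
    # A human would supply the label here; we pretend the answer is 'negative' (0).
    labeled_X = np.vstack([labeled_X, X_pool[query].toarray()])
    labeled_y = np.append(labeled_y, 0)
    pool_idx.remove(query)
```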

    A study on text-score disagreement in online reviews

    In this paper, we focus on online reviews and employ artificial intelligence tools, taken from the cognitive computing field, to help understand the relationships between the textual part of a review and the assigned numerical score. We start from two intuitions: 1) a set of textual reviews expressing different sentiments may feature the same score (and vice versa); and 2) detecting and analyzing the mismatches between the review content and the actual score may benefit both service providers and consumers, by highlighting specific factors of satisfaction (and dissatisfaction) in the texts. To test these intuitions, we adopt sentiment analysis techniques and concentrate on hotel reviews to find polarity mismatches therein. In particular, we first train a text classifier with a set of annotated hotel reviews taken from the Booking website. Then, we analyze a large dataset of around 160k hotel reviews collected from Tripadvisor, with the aim of detecting polarity mismatches, i.e., whether the textual content of a review is in line with the associated score. Using well-established artificial intelligence techniques and analyzing in depth the reviews featuring a mismatch between text polarity and score, we find that, on a five-star scale, reviews with middle scores include a mixture of positive and negative aspects. The approach proposed here, besides acting as a polarity detector, provides an effective selection of reviews from an initially very large dataset, allowing both consumers and providers to focus directly on the subset of reviews featuring a text/score disagreement, which conveniently conveys a summary of the positive and negative features of the review target. Comment: This is the accepted version of the paper. The final version will be published in the Journal of Cognitive Computation, available at Springer via http://dx.doi.org/10.1007/s12559-017-9496-
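    A minimal sketch of the polarity-mismatch idea (not the paper's actual classifier or data) could look like the following, assuming scikit-learn; the training reviews, labels, and the star-to-polarity thresholds are invented for the example.

```python
# Sketch of text/score mismatch detection: train a polarity classifier on labelled
# review texts, then flag reviews whose predicted polarity disagrees with the score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["spotless room and friendly staff", "awful service, dirty bathroom",
               "perfect location, great breakfast", "noisy, rude reception, never again"]
train_polarity = [1, 0, 1, 0]                      # 1 = positive text, 0 = negative text

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_texts, train_polarity)

def is_mismatch(text: str, stars: int) -> bool:
    """Flag a review whose predicted text polarity disagrees with its 1-5 star score."""
    predicted = clf.predict([text])[0]             # 1 = positive, 0 = negative
    score_polarity = 1 if stars >= 4 else 0 if stars <= 2 else None
    return score_polarity is not None and predicted != score_polarity

print(is_mismatch("dirty bathroom and rude reception", stars=5))   # likely True
```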

    The Business Impact of Social Media - Sentiment Analysis Approach -

    ์ด ์—ฐ๊ตฌ์˜ ๋ชฉ์ ์€ ์†Œ์…œ ๋ฏธ๋””์–ด์—์„œ ์ถ”์ถœ๋œ 7๊ฐœ์˜ ๊ฐ์„ฑ ๋„๋ฉ”์ธ์ด ์ž๋™์ฐจ ์‹œ์žฅ ์ ์œ ์œจ ์˜ˆ์ธก์— ๋Œ€ํ•œ ๊ฐ์„ฑ ๋ถ„์„ ์‹คํ—˜์„ ์œ„ํ•œ ๋ฐ์ดํ„ฐ๋กœ์„œ ์ ํ•ฉํ•œ ์ง€์— ๋Œ€ํ•œ ์‹ ๋ขฐ์„ฑ์„ ํ™•์ธํ•˜๊ณ  ๊ณ ๊ฐ๋“ค์˜ ์˜๊ฒฌ์ด ๊ธฐ์—…์˜ ์„ฑ๊ณผ์— ์–ด๋–ป๊ฒŒ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ์ง€์— ๋Œ€ํ•˜์—ฌ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•œ ๊ฒƒ์ด๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ด3๋‹จ๊ณ„์— ๊ฑธ์ณ์„œ ์ง„ํ–‰๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„๋Š” ๊ฐ์„ฑ์‚ฌ์ „ ๊ตฌ์ถ•์˜ ๋‹จ๊ณ„๋กœ์„œ 2013๋…„ 1์›” 1์ผ๋ถ€ํ„ฐ 2015๋…„ 12์›” 31์ผ๊นŒ์ง€ ๋ฏธ๊ตญ ๋‚ด 26๊ฐœ์˜ ์ž๋™์ฐจ ์ œ์กฐ ํšŒ์‚ฌ์˜ ๊ณ ๊ฐ์˜ ์†Œ๋ฆฌ (VOC: Voice of the Customer) ์ด 45,447๊ฐœ๋ฅผ ์ž๋™์ฐจ ์ปค๋ฎค๋‹ˆํ‹ฐ๋กœ๋ถ€ํ„ฐ ํฌ๋กค๋ง (crawling)ํ•˜์—ฌ POS (Part-of-Speech) ์ฆ‰ ํ’ˆ์‚ฌ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๋Š” ํƒœ๊น… (tagging)๊ณผ์ •์„ ๊ฑฐ์ณ ๋ถ€์ •์ , ๊ธ์ •์  ๊ฐ์„ฑ์˜ ๋นˆ๋„์ˆ˜๋ฅผ ์ธก์ •ํ•˜์—ฌ ๊ฐ์„ฑ์‚ฌ์ „์„ ๊ตฌ์ถ•ํ•˜์˜€๊ณ , ์ด์— ๋Œ€ํ•œ ๊ทน์„ฑ์„ ์ธก์ •ํ•˜์—ฌ 7๊ฐœ์˜ ๊ฐ์„ฑ๋„๋ฉ”์ธ์„ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ๋‹จ๊ณ„๋Š” ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์‹ ๋ขฐ์„ฑ ๋ถ„์„์˜ ๋‹จ๊ณ„๋กœ์„œ ์ž๊ธฐ์ƒ๊ด€๊ด€๊ณ„๋ถ„์„ (Auto-correlation Analysis)๊ณผ ์ฃผ์„ฑ๋ถ„๋ถ„์„ (PCA: Principal Component Analysis)์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๊ฐ€ ์‹คํ—˜์— ์ ํ•ฉํ•œ์ง€๋ฅผ ๊ฒ€์ฆํ•˜์˜€๋‹ค. ์„ธ ๋ฒˆ์งธ ๋‹จ๊ณ„์—์„œ๋Š” 2๊ฐœ์˜ ์„ ํ˜•ํšŒ๊ท€๋ถ„์„ ๋ชจ๋ธ๋กœ 7๊ฐœ์˜ ๊ฐ์„ฑ์˜์—ญ์ด ๋ฏธ๊ตญ๋‚ด ์ž๋™์ฐจ ์ œ์กฐ ํšŒ์‚ฌ ์ค‘ GM, ํฌ๋“œ, FCA, ํญ์Šค๋ฐ”๊ฒ ๋“ฑ ์ด 4๊ฐœ์˜ ์ž๋™์ฐจ ์ƒ์‚ฐ ๊ธฐ์—…์„ ์„ ์ •ํ•˜์—ฌ ์ด๋“ค ๊ธฐ์—…์˜ ์„ฑ๊ณผ ์ฆ‰, ์ž๋™์ฐจ ์‹œ์žฅ์ ์œ ์œจ์— ์–ด๋–ค ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ณ  ์žˆ๋Š” ์ง€ ์‹คํ—˜ํ•˜์˜€๋‹ค. ๊ทธ ๊ฒฐ๊ณผ, ์šฐ๋ฆฌ๋Š” 4,815๊ฐœ์˜ ๋ถ€์ •์ ์ธ ์–ดํœ˜๋“ค๊ณผ 2,021๊ฐœ์˜ ๊ธ์ •์ ์ธ ๊ฐ์„ฑ์–ดํœ˜๋“ค์„ ์ถ”์ถœํ•˜์—ฌ ๊ฐ์„ฑ์‚ฌ์ „์„ ๊ตฌ์ถ•ํ•˜์˜€์œผ๋ฉฐ, ๊ตฌ์ถ•๋œ ๊ฐ์„ฑ์‚ฌ์ „์„ ๋ฐ”ํƒ•์œผ๋กœ, ์ถ”์ถœ๋˜๊ณ  ๋ถ„๋ฅ˜๋œ ๋ถ€์ •์ ์ด๊ณ  ๊ธ์ •์ ์ธ ์–ดํœ˜๋“ค์„ ์ž๋™์ฐจ ์‚ฐ์—…์— ๊ด€๋ จ๋œ ์–ดํœ˜๋“ค๊ณผ ์กฐํ•ฉํ•˜์˜€๊ณ , ์ž๊ธฐ์ƒ๊ด€๋ถ„์„๊ณผ PCA (์ฃผ์„ฑ๋ถ„ ๋ถ„์„)๋ฅผ ํ†ตํ•ด ๊ฐ์„ฑ์˜ ํŠน์„ฑ์„ ์กฐ์‚ฌํ•˜์˜€๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ์— ๋”ฐ๋ฅด๋ฉด, ์ž๊ธฐ์ƒ๊ด€๋ถ„์„์— ์˜ํ•ด์„œ ๊ฐ์„ฑ ๋ฐ์ดํ„ฐ์— ์–ด๋–ค ์ผ์ •ํ•œ ํŒจํ„ด์ด ์กด์žฌํ•œ๋‹ค๋Š” ๊ฒƒ์ด ๋ฐœ๊ฒฌ๋˜์—ˆ๊ณ , ๊ฐ๊ฐ์˜ ๊ฐ์„ฑ ์˜์—ญ์˜ ๊ฐ์„ฑ์ด ์ž๊ธฐ์ƒ๊ด€์„ฑ์ด ์žˆ์œผ๋ฉฐ, ๊ฐ์„ฑ์˜ ์‹œ๊ณ„์—ด์„ฑ ๋˜ํ•œ ๊ด€์ฐฐ๋˜์—ˆ๋‹ค. PCA์— ์˜ํ•œ ๊ฒฐ๊ณผ๋กœ์„œ, 7๊ฐœ ๊ฐ์„ฑ์˜์—ญ์ด ๋ถ€์ •์„ฑ, ๊ธ์ •์„ฑ, ์ค‘๋ฆฝ์„ฑ์„ ์ฃผ์„ฑ๋ถ„์œผ๋กœ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ์ž๊ธฐ์ƒ๊ด€๋ถ„์„๊ณผ PCA๋ฅผ ํ†ตํ•œ VOC ๊ฐ์„ฑ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์‹ ๋ขฐ์„ฑ์„ ๋ฐ”ํƒ•์œผ๋กœ 2๊ฐœ์˜ ์„ ํ˜•ํšŒ๊ท€๋ถ„์„ ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์ฒซ ๋ฒˆ์งธ ๋ชจ๋ธ์€ ์ฃผ์„ฑ๋ถ„ ๋ถ„์„์—์„œ ๋ถ€์ •์  ๊ฐ์„ฑ์˜ Sadness, Anger, Fear์™€ ๊ธ์ •์  ๊ฐ์„ฑ๋„๋ฉ”์ธ์ธ Delight, Satisfaction์„ ๋…๋ฆฝ๋ณ€์ˆ˜๋กœ ์„ ์ •ํ•˜๊ณ , ์‹œ์žฅ์ ์œ ์œจ์„ ์ข…์†๋ณ€์ˆ˜๋กœ ์„ ์ •ํ•˜์—ฌ ์‹คํ–‰ํ•˜์˜€๊ณ  ๋‘ ๋ฒˆ์งธ ๋ชจ๋ธ์€ ์ฒซ ๋ฒˆ์งธ ๋ชจ๋ธ์— ์ฃผ์„ฑ๋ถ„์ด ์ค‘๋ฆฝ์„ฑ์œผ๋กœ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜จ Shame, Frustration์„ ๋…๋ฆฝ๋ณ€์ˆ˜์— ์ถ”๊ฐ€ํ•˜์—ฌ ์ค‘๋ฆฝ์„ฑ์„ ๋ ๊ณ  ์žˆ๋Š” ๊ฐ์„ฑ์ด ์‹œ์žฅ ์ ์œ ์œจ์— ์œ ์˜๋ฏธํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ณ  ์žˆ๋Š” ์ง€๋ฅผ ํ™•์ธํ•˜์˜€๋‹ค. ๋ถ„์„ ๊ฒฐ๊ณผ, ๊ฐ ๊ธฐ์—… ๋งˆ๋‹ค ์‹œ์žฅ์ ์œ ์œจ์— ์œ ์˜๋ฏธํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๊ฐ์„ฑ๋“ค์ด ์กด์žฌํ•˜๊ณ  ๋ชจ๋ธ 1๊ณผ, ๋ชจ๋ธ 2์—์„œ์˜ ๊ฐ์„ฑ ์˜ํ–ฅ๋ ฅ์ด ์ฐจ์ด๊ฐ€ ์žˆ์Œ์„ ๋ฐœ๊ฒฌํ•˜์˜€๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด, ๋ฐ์ดํ„ฐ ์ƒ์— ๋‚˜ํƒ€๋‚œ ์ •๋ณด๋ฅผ ๊ฐ€์ง„ ๊ฐ์„ฑ์ด ๊ณผ๊ฑฐ ๊ฐ’์— ๊ธฐ์ดˆํ•˜์—ฌ ์ž๋™์ฐจ ์‹œ์žฅ์—์„œ ๋ณ€ํ™”๋ฅผ ์ˆ˜๋ฐ˜ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค. 
๋˜ํ•œ, ์šฐ๋ฆฌ๊ฐ€ ์‹œ์žฅ ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์šฉ์„ฑ์„ ์ ์šฉํ•˜๋ ค๊ณ  ํ•  ๋•Œ, ์ž๋™์ฐจ ์‹œ์žฅ ๊ด€๋ จ ์ •๋ณด๋‚˜ ๊ฐ์„ฑ์˜ ์ž๊ธฐ์ƒ๊ด€์„ฑ์„ ์ž˜ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด, ๊ฐ์ • ๋ถ„์„์— ๋Œ€ํ•œ ์—ฐ๊ตฌ์— ํฐ ๊ธฐ์—ฌ๋ฅผ ํ•  ์ˆ˜ ์žˆ์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ์‹ค์ œ ์‹œ์žฅ์—์„œ์˜ ๋น„์ง€๋‹ˆ์Šค ์„ฑ๊ณผ์—๋„ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ธฐ์—ฌํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค.List of Tables iv List of Figures v Abstract 1 1. Introduction 1.1 Back Ground 3 1.2 Necessity of Study 6 1.3 Purpose & Questions 8 1.4 Structure 9 2. Literature Reviews of VOC Analysis 2.1 Importance of VOC 11 2.2 Data Mining 15 2.2.1 Concept & Functionalities 15 2.2.2 Methodologies of Data mining 20 2.3 Text Mining 24 2.4 Sentiment Analysis 26 2.5 Research Trend in Korea 30 3. Methodology 3.1 Research Flow 32 3.2 Proposed Methodologies 34 3.2.1 Sentiment Analysis 34 3.2.2 Auto-correlation Analysis 37 3.2.3 Principal Component Analysis (PCA) 38 3.2.4 Linear Regression 40 4. Experiment & Analysis 4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains 43 4.1.1 The Subject of Analysis & Crawling Data 43 4.1.2 Extracting POS Information 44 4.1.3 Review Extracting POS Information 46 4.2 Phase II : Reliability Analysis 49 4.2.1 Auto-correlation Analysis of Sentiment 51 4.2.2 Principal Component Analysis of Sentiment 55 4.3 Phase III : Influence on Automotive Market Share 58 4.3.1 Linear Regression Model 58 4.3.2 Definition of Variables 60 4.3.3 The Result of Linear Regression Analysis 62 5. Conclusion 5.1 Summary of Study 73 5.2 Managerial Implication and Limitation 75 5.3 Future Study 77 References 79Docto
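    A rough sketch of the Phase II reliability check and the Phase III regression described above (a lag-1 auto-correlation check plus Model 1's regression of market share on five sentiment domains) might look like the following; it assumes pandas and statsmodels, and the monthly domain scores are random stand-ins for the real VOC-derived series.

```python
# Illustrative sketch only: regress market share on sentiment-domain scores (Model 1)
# and check the auto-correlation of one sentiment series, as in the abstract's Phase II.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
months = 36                                       # 2013-01 .. 2015-12
domains = ["Sadness", "Anger", "Fear", "Delight", "Satisfaction"]
X = pd.DataFrame(rng.normal(size=(months, len(domains))), columns=domains)
# Synthetic market-share series with some dependence on Anger and Delight.
market_share = 17 - 0.8 * X["Anger"] + 0.5 * X["Delight"] + rng.normal(0, 0.3, months)

model = sm.OLS(market_share, sm.add_constant(X)).fit()
print(model.summary())                            # coefficients and p-values per domain

# Lag-1 auto-correlation of one sentiment series (Phase II style check).
print("Anger lag-1 auto-correlation:", round(X["Anger"].autocorr(lag=1), 3))
```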