22,113 research outputs found
Automatically detecting open academic review praise and criticism
This is an accepted manuscript of an article published by Emerald in Online Information Review on 15 June 2020.
The accepted version of the publication may differ from the final published version, accessible at https://doi.org/10.1108/OIR-11-2019-0347.
Purpose: Peer reviewer evaluations of academic papers are known to be variable in content and overall judgements but are important academic publishing safeguards. This article introduces a sentiment analysis program, PeerJudge, to detect praise and criticism in peer evaluations. It is designed to support editorial management decisions and reviewers in the scholarly publishing process and for grant funding decision workflows. The initial version of PeerJudge is tailored for reviews from F1000Research's open peer review publishing platform.
Design/methodology/approach: PeerJudge uses a lexical sentiment analysis approach with a human-coded initial sentiment lexicon and machine learning adjustments and additions. It was built with an F1000Research development corpus and evaluated on a different F1000Research test corpus using reviewer ratings.
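A lexical approach of this kind can be illustrated with a minimal sketch, assuming a SentiStrength-style dual scale in which praise runs from 1 to 5 and criticism from -1 to -5. The lexicon `TOY_LEXICON` and the function `score_review` are hypothetical stand-ins for PeerJudge's human-coded, machine-learning-adjusted lexicon, which is not reproduced here; the sketch also leaves out the machine learning adjustment step and any rules for negation or booster words.

```python
import re

# Hypothetical toy lexicon mapping terms to strengths on a SentiStrength-style
# dual scale: 2..5 for praise, -2..-5 for criticism. PeerJudge's real lexicon
# was human-coded and then adjusted with machine learning; these entries are
# illustrative only.
TOY_LEXICON = {
    "excellent": 4,
    "clear": 2,
    "useful": 2,
    "unclear": -2,
    "missing": -2,
    "incorrect": -3,
    "flawed": -4,
}


def score_review(text: str, lexicon: dict = TOY_LEXICON) -> tuple:
    """Return (praise, criticism) for one review.

    praise ranges over 1..5 and criticism over -5..-1, where 1 and -1 mean
    that no praise or criticism term was found. Each side keeps the strongest
    matching term (max-strength scoring).
    """
    praise, criticism = 1, -1
    for token in re.findall(r"[a-z']+", text.lower()):
        strength = lexicon.get(token, 0)
        if strength > 0:
            praise = max(praise, strength)
        elif strength < 0:
            criticism = min(criticism, strength)
    return praise, criticism


print(score_review("The methods are clear but the statistics are flawed."))  # (2, -4)
```

Scoring praise and criticism separately, rather than netting them into one polarity value, keeps a strongly critical remark visible even when the same review also contains praise.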
Findings: PeerJudge can predict F1000Research judgements from negative evaluations in reviewers' comments more accurately than baseline approaches, although not from positive reviewer comments, which seem to be largely unrelated to reviewer decisions. Within the F1000Research mode of post-publication peer review, the absence of any detected negative comments is a reliable indicator that an article will be 'approved', but the presence of moderately negative comments could lead to either an 'approved' or an 'approved with reservations' decision.
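The idea that criticism, rather than praise, carries the signal can be illustrated with a threshold rule on the criticism score compared against a majority-class baseline. This is a sketch under stated assumptions: `predict_decision`, its `threshold`, and the toy scores and decisions are invented for illustration and are not PeerJudge's evaluation procedure or data.

```python
from collections import Counter


def predict_decision(criticism: int, threshold: int = -2) -> str:
    """Map a criticism score (-1 = none detected, -5 = strongest) to a decision.

    Hypothetical rule: criticism weaker than `threshold` -> "approved";
    anything at or below it -> "approved with reservations". The threshold
    is an illustrative value, not one reported in the article.
    """
    return "approved" if criticism > threshold else "approved with reservations"


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that equal the corresponding label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy (criticism score, actual decision) pairs, invented for illustration.
reviews = [
    (-1, "approved"),
    (-1, "approved"),
    (-1, "approved"),
    (-2, "approved"),
    (-2, "approved with reservations"),
    (-3, "approved with reservations"),
]
labels = [decision for _, decision in reviews]
rule_preds = [predict_decision(score) for score, _ in reviews]

# Majority-class baseline: always predict the most frequent decision.
majority = Counter(labels).most_common(1)[0][0]
baseline_preds = [majority] * len(labels)

print(accuracy(rule_preds, labels), accuracy(baseline_preds, labels))
```

On this toy data the rule scores about 0.83 and the baseline about 0.67. The two reviews with a criticism score of -2 fall into different decision classes, mirroring the observation that moderate criticism can accompany either decision.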
Originality/value: PeerJudge is the first transparent AI approach to peer review sentiment detection. It may be used to identify anomalous reviews whose text may not match the reviewer's judgement, either for individual checks or for systematic bias assessments.
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinions about two French politicians over time.
We therefore review state-of-the-art NLP-based ML algorithms to automatically
annotate tweets, using a manual initiation step as a bootstrap. This paper
focuses on key issues in active learning when building a large annotated
dataset: noise introduced by human annotators, the abundance of data, and the
label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can serve as anchor
points not only to improve automatic systems for Opinion Mining (OM) and Topic
Classification but also to reduce noise in human annotations. However, a later
thorough analysis shows that reducing noise might induce the loss of crucial
information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
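A pool-based active learning loop of the kind discussed above can be sketched as follows, assuming least-confidence uncertainty sampling with a multinomial naive Bayes classifier; the paper reviews several NLP-based ML algorithms and does not necessarily use either choice. Adding the author as a feature token follows the abstract's remark about author names and hashtags, but `features`, `NaiveBayes`, `active_learning`, the oracle, and the toy tweets are all hypothetical.

```python
import math
from collections import Counter


def features(tweet: dict) -> list:
    """Bag of words (hashtags kept as '#tag' tokens) plus an author token."""
    return tweet["text"].lower().split() + ["author=" + tweet["author"]]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, X, y):
        self.prior = Counter(y)
        self.counts = {c: Counter() for c in self.prior}
        for feats, c in zip(X, y):
            self.counts[c].update(feats)
        self.vocab_size = len({f for feats in X for f in feats})
        self.n = len(y)
        return self

    def predict_proba(self, feats) -> dict:
        logp = {}
        for c, count in self.counts.items():
            total = sum(count.values())
            lp = math.log(self.prior[c] / self.n)
            for f in feats:
                lp += math.log((count[f] + 1) / (total + self.vocab_size))
            logp[c] = lp
        top = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - top) for v in logp.values())
        return {c: math.exp(v - top) / z for c, v in logp.items()}


def active_learning(seed, pool, oracle, rounds, batch=1):
    """Pool-based active learning with least-confidence uncertainty sampling.

    Each round retrains on the labelled set, asks `oracle` (a stand-in for a
    human annotator) to label the `batch` pool tweets whose most probable
    class has the lowest probability, and moves them into the labelled set.
    """
    labelled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = NaiveBayes().fit([features(t) for t, _ in labelled],
                                 [y for _, y in labelled])
        pool.sort(key=lambda t: max(model.predict_proba(features(t)).values()))
        query, pool = pool[:batch], pool[batch:]
        labelled += [(t, oracle(t)) for t in query]
    return labelled


# Toy French tweets and gold labels, invented for illustration.
seed = [
    ({"text": "bravo pour ce discours #debat", "author": "a1"}, "positive"),
    ({"text": "quel mensonge honteux #debat", "author": "a2"}, "negative"),
]
pool = [
    {"text": "discours honteux", "author": "a3"},
    {"text": "bravo encore", "author": "a1"},
    {"text": "rien compris #debat", "author": "a4"},
]
gold = {t["text"]: y for t, y in zip(pool, ["negative", "positive", "negative"])}
labelled = active_learning(seed, pool, lambda t: gold[t["text"]], rounds=2)
print(len(labelled))
```

Querying the least confident tweets first concentrates the annotators' effort where the current model is weakest, which is the usual motivation for uncertainty sampling.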
A study on text-score disagreement in online reviews
In this paper, we focus on online reviews and employ artificial intelligence
tools, taken from the cognitive computing field, to help understand the
relationships between the textual part of the review and the assigned numerical
score. We start from two intuitions: 1) a set of textual reviews expressing
different sentiments may feature the same score (and vice-versa); and 2)
detecting and analyzing the mismatches between the review content and the
actual score may benefit both service providers and consumers, by highlighting
specific factors of satisfaction (and dissatisfaction) in texts.
To test these intuitions, we adopt sentiment analysis techniques and
concentrate on hotel reviews to find polarity mismatches therein. In
particular, we first train a text classifier with a set of annotated hotel
reviews, taken from the Booking website. Then we analyze a large dataset of
around 160k hotel reviews collected from Tripadvisor, aiming to detect polarity
mismatches, i.e., whether or not the textual content of a review is in line
with its associated score.
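The mismatch check can be sketched as a comparison between the polarity predicted from the text and the polarity implied by the star score. The star-to-polarity mapping in `score_polarity`, the helper names, and the keyword-based `toy_classifier` (standing in for the classifier trained on annotated Booking reviews) are assumptions made for illustration.

```python
def score_polarity(stars: int) -> str:
    """Hypothetical mapping from a 1-5 star score to a polarity label."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


def find_mismatches(reviews, classify):
    """Return (text, stars, predicted) for reviews whose predicted text
    polarity contradicts the polarity implied by their star score.

    `classify` is any text classifier returning "positive" or "negative".
    Neutral (3-star) scores have no polarity to contradict, so they are
    never flagged here.
    """
    mismatches = []
    for text, stars in reviews:
        expected = score_polarity(stars)
        predicted = classify(text)
        if expected != "neutral" and predicted != expected:
            mismatches.append((text, stars, predicted))
    return mismatches


# Keyword stand-in for a trained classifier (hypothetical, illustration only).
NEGATIVE_CUES = {"dirty", "rude", "noisy", "broken"}


def toy_classifier(text: str) -> str:
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return "negative" if words & NEGATIVE_CUES else "positive"


reviews = [
    ("Great location, friendly staff.", 5),
    ("Room was dirty and the staff rude.", 4),
    ("Lovely breakfast.", 1),
    ("Noisy street but good value.", 3),
]
for text, stars, predicted in find_mismatches(reviews, toy_classifier):
    print(stars, predicted, text)
```

This flags the 4-star review as negative text and the 1-star review as positive text. Treating 3-star reviews as neutral is one design choice; a binary mapping of scores would let middle-score reviews be flagged as well.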
Using well-established artificial intelligence techniques and analyzing in
depth the reviews featuring a mismatch between the text polarity and the score,
we find that, on a five-star scale, reviews with middle scores include a
mixture of positive and negative aspects.
The approach proposed here, besides acting as a polarity detector, provides an
effective selection of reviews from a very large initial dataset, allowing both
consumers and providers to focus directly on the subset of reviews featuring a
text/score disagreement, which conveniently conveys to the user a summary of
the positive and negative features of the review target.
Comment: This is the accepted version of the paper. The final version will be
published in the Journal of Cognitive Computation, available at Springer via
http://dx.doi.org/10.1007/s12559-017-9496-
The Business Impact of Social Media - Sentiment Analysis Approach -
The purpose of this study is to verify whether seven sentiment domains extracted from social media are reliable as data for sentiment analysis experiments predicting automotive market share, and to examine how customer opinions affect corporate performance. The study was conducted in three phases. The first phase constructed a sentiment lexicon: a total of 45,447 Voice of the Customer (VOC) posts concerning 26 automobile manufacturers in the United States, written between 1 January 2013 and 31 December 2015, were crawled from automotive communities and passed through a tagging step that extracts part-of-speech (POS) information; the frequencies of negative and positive sentiment were then measured to build the lexicon, and the polarity of its entries was measured to create seven sentiment domains. The second phase analyzed the reliability of the data, verifying through auto-correlation analysis and principal component analysis (PCA) that the data were suitable for the experiments. In the third phase, four automobile manufacturers operating in the US (GM, Ford, FCA, and Volkswagen) were selected, and two linear regression models were used to test how the seven sentiment domains affect corporate performance, that is, automotive market share.
As a result, we extracted 4,815 negative and 2,021 positive sentiment words to build the sentiment lexicon. Based on this lexicon, the extracted and classified negative and positive words were combined with words related to the automotive industry, and the characteristics of the sentiment were investigated through auto-correlation analysis and PCA. The auto-correlation analysis revealed a consistent pattern in the sentiment data: the sentiment of each domain is auto-correlated, and time-series behavior of sentiment was also observed. The PCA showed that the seven sentiment domains are linked through principal components of negativity, positivity, and neutrality. Building on the reliability of the VOC sentiment data established by the auto-correlation analysis and PCA, two linear regression models were constructed and tested. The first model used the negative sentiments Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction, as identified by the PCA, as independent variables, with market share as the dependent variable. The second model added Shame and Frustration, whose principal component was found to be neutral, to the independent variables of the first model, to check whether sentiments with a neutral character significantly affect market share. The analysis found that for each company there are sentiments that significantly affect market share, and that the influence of sentiment differs between Model 1 and Model 2. The study confirms that sentiment carrying information contained in the data can, on the basis of its past values, drive changes in the automotive market. In addition, when market data are available, making good use of automotive market information and the auto-correlation of sentiment is expected not only to contribute substantially to research on sentiment analysis but also to contribute in various ways to business performance in real markets.
List of Tables
List of Figures
Abstract
1. Introduction
1.1 Background
1.2 Necessity of Study
1.3 Purpose & Questions
1.4 Structure
2. Literature Reviews of VOC Analysis
2.1 Importance of VOC
2.2 Data Mining
2.2.1 Concept & Functionalities
2.2.2 Methodologies of Data Mining
2.3 Text Mining
2.4 Sentiment Analysis
2.5 Research Trend in Korea
3. Methodology
3.1 Research Flow
3.2 Proposed Methodologies
3.2.1 Sentiment Analysis
3.2.2 Auto-correlation Analysis
3.2.3 Principal Component Analysis (PCA)
3.2.4 Linear Regression
4. Experiment & Analysis
4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains
4.1.1 The Subject of Analysis & Crawling Data
4.1.2 Extracting POS Information
4.1.3 Review Extracting POS Information
4.2 Phase II: Reliability Analysis
4.2.1 Auto-correlation Analysis of Sentiment
4.2.2 Principal Component Analysis of Sentiment
4.3 Phase III: Influence on Automotive Market Share
4.3.1 Linear Regression Model
4.3.2 Definition of Variables
4.3.3 The Result of Linear Regression Analysis
5. Conclusion
5.1 Summary of Study
5.2 Managerial Implication and Limitation
5.3 Future Study
References
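Phases II and III of this thesis abstract (auto-correlation analysis, PCA, and the two linear regression models on market share) can be sketched with NumPy as below. This is a minimal illustration under assumptions: the sentiment series are taken to be aggregated monthly over the 36 months from January 2013 to December 2015, all data are synthetic, the helper names `autocorrelation`, `pca`, and `ols` are hypothetical, and the per-company significance tests used to judge which sentiments matter are not shown.

```python
import numpy as np

DOMAINS = ["Sadness", "Anger", "Fear", "Delight",
           "Satisfaction", "Shame", "Frustration"]


def autocorrelation(x, lag: int) -> float:
    """Sample autocorrelation of a series at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


def pca(X, k: int):
    """PCA via SVD of the column-centred data matrix.

    Returns (loadings with shape (n_features, k), explained-variance ratios).
    """
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return vt[:k].T, var[:k] / var.sum()


def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic monthly shares for the seven domains (NOT data from the thesis).
rng = np.random.default_rng(0)
X = rng.random((36, len(DOMAINS)))
# Synthetic market share for one company, driven by Delight and Anger only.
share = 0.15 + 0.04 * X[:, 3] - 0.03 * X[:, 1]

lag1 = {name: autocorrelation(X[:, i], 1) for i, name in enumerate(DOMAINS)}
loadings, ratio = pca(X, k=3)
model1 = ols(X[:, :5], share)  # Sadness, Anger, Fear, Delight, Satisfaction
model2 = ols(X, share)         # model 1 plus Shame and Frustration
print(np.round(model2, 3))
```

Because the synthetic market share is built from Anger and Delight alone, both models recover exactly those two coefficients; with real VOC data the estimates would be noisy, and each of the four companies would be fitted separately.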
- …