767 research outputs found
A study of sentiment analysis on customer reviews
The way people shop has changed thanks to the internet and e-commerce platforms such as Amazon, Etsy, and Best Buy. In the past, people went to the store and examined products there; now, people decide on purchasing a product according to its rating and reviews. Sometimes there is a mismatch between a customer's rating and comment. For a book, for instance, although the review reads 'the book is so boring and long,' the customer gives a high rating mistakenly or for some other reason. To reduce this inconsistency as much as possible and provide a better shopping experience to customers, we should focus on the opinions customers express in text, which are less prone to such mistakes. In this paper, sentiment analysis, which examines whether an expressed opinion or feeling is positive, negative, or neutral, is applied to customer reviews. The reviews were collected from Amazon between 2008 and 2020 in seven different categories for a specific product. The data sets include the product id, name, date, rating, helpfulness, and target; the rating, review, and target are sufficient for the analysis. The target column represents a positive or negative label based on the ratings, and the reviews are text data that require preprocessing techniques such as whitespace, punctuation, and special-character removal. After the preprocessing steps, VADER (Valence Aware Dictionary and sEntiment Reasoner) and TextBlob, which are lexicon-based sentiment analyzers, are used to properly label comments as positive or negative. Since the data sets contain more positive-labeled reviews than negative ones, an oversampling method is applied to balance the dataset. For feature extraction, the Count Vectorizer and TF-IDF (term frequency-inverse document frequency) are used to create training and test data.
Several machine learning algorithms (Logistic Regression, Linear Support Vector Machine, Naive Bayes, Decision Tree, and K-Nearest Neighbors) are used to compare the models and reach the best result.
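The lexicon-based labeling step this abstract describes can be sketched in a few lines. This is a minimal illustration with an invented toy lexicon and thresholds; real analyzers such as VADER use thousands of scored terms plus rules for negation, punctuation, and intensity.

```python
# Minimal sketch of lexicon-based polarity labeling (VADER-style).
# LEXICON and the decision thresholds are invented for illustration.
LEXICON = {"great": 2.0, "good": 1.5, "boring": -1.8, "long": -0.5, "bad": -2.0}

def polarity(review: str) -> str:
    """Label a review positive/negative/neutral by summing word scores."""
    tokens = review.lower().split()
    score = sum(LEXICON.get(t.strip(".,!?"), 0.0) for t in tokens)
    if score > 0.05:
        return "positive"
    if score < -0.05:
        return "negative"
    return "neutral"

# The abstract's own example gets a negative label despite a high star rating:
print(polarity("The book is so boring and long"))  # negative
```

A label produced this way can then stand in for (or be checked against) the rating-derived target column before feature extraction.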
SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods
In the last few years thousands of scientific papers have investigated
sentiment analysis, several startups that measure opinions on real data have
emerged and a number of innovative products related to this theme have been
developed. There are multiple methods for measuring sentiments, including
lexicon-based and supervised machine learning methods. Despite the vast
interest in the theme and the wide popularity of some methods, it is unclear which
one is better at identifying the polarity (i.e., positive or negative) of a
message. Accordingly, there is a strong need to conduct a thorough
apples-to-apples comparison of sentiment analysis methods, as they are
used in practice, across multiple datasets originating from different data
sources. Such a comparison is key for understanding the potential limitations,
advantages, and disadvantages of popular methods. This article aims at filling
this gap by presenting a benchmark comparison of twenty-four popular sentiment
analysis methods (which we call the state-of-the-practice methods). Our
evaluation is based on a benchmark of eighteen labeled datasets, covering
messages posted on social networks, movie and product reviews, as well as
opinions and comments in news articles. Our results highlight the extent to
which the prediction performance of these methods varies considerably across
datasets. Aiming at boosting the development of this research area, we open the
methods' codes and datasets used in this article, deploying them in a benchmark
system, which provides an open API for accessing and comparing sentence-level
sentiment analysis methods.
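The evaluation protocol described here reduces to one loop: score every method on every labeled dataset with the same metric. The sketch below assumes methods are callables returning a polarity label and uses accuracy as the metric; the method and dataset names in the usage are placeholders, not those of the benchmark.

```python
# Sketch of an apples-to-apples benchmark grid over methods and datasets.
def accuracy(preds, gold):
    """Fraction of predictions matching gold labels."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def benchmark(methods, datasets):
    # methods:  {name: callable(text) -> "positive" / "negative"}
    # datasets: {name: list of (text, gold_label) pairs}
    results = {}
    for mname, method in methods.items():
        for dname, examples in datasets.items():
            preds = [method(text) for text, _ in examples]
            gold = [label for _, label in examples]
            results[(mname, dname)] = accuracy(preds, gold)
    return results
```

Filling the grid this way makes the cross-dataset variance the article highlights directly visible as spread within each method's row.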
Towards the development of an explainable e-commerce fake review index: An attribute analytics approach
Instruments of corporate risk and reputation assessment tools are quintessentially developed on structured quantitative data linked to financial ratios and macroeconomics. An emerging stream of studies has challenged this norm by demonstrating improved risk assessment and model prediction capabilities through unstructured textual corporate data. Fake online consumer reviews pose serious threats to a business’ competitiveness and sales performance, directly impacting revenue, market share, brand reputation and even survivability. Research has shown that as few as three negative reviews can lead to a potential loss of 59.2% of customers. Amazon, as the largest e-commerce retail platform, hosts over 85,000 small-to-medium-sized enterprise (SME) retailers in the UK, selling over fifty percent of Amazon products worldwide. Despite Amazon's best efforts, fake reviews are a growing problem causing financial and reputational damage at a scale never seen before. While large corporations are better equipped to handle these problems efficiently, SMEs become the biggest victims of these scam tactics. Following the principles of attribute (AA) and responsible (RA) analytics, we present a novel hybrid method for indexing enterprise risk that we call the Fake Review Index. The proposed modular approach benefits from a combination of structured review metadata and a semantic topic index derived from unstructured product reviews. We further apply LIME to develop a Confidence Score, demonstrating the importance of explainability and openness in contemporary analytics within the OR domain. The transparency, explainability and simplicity of our roadmap to a hybrid modular approach offer an attractive entry platform for practitioners and managers from the industry.
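The modular combination of a structured-metadata signal and a semantic topic signal can be illustrated as a weighted blend. This is a hedged sketch only: the linear form, the weights, and the assumption that both inputs are already scaled to [0, 1] are illustrative choices, not the article's actual formulation, and the LIME-based Confidence Score is not reproduced here.

```python
# Hedged sketch of a modular risk index: a weighted blend of a structured
# metadata score and a semantic topic score, both assumed pre-scaled to [0, 1].
# The weights are invented for illustration.
def fake_review_index(metadata_score: float, topic_score: float,
                      w_meta: float = 0.6, w_topic: float = 0.4) -> float:
    """Blend the two module scores into a single risk index in [0, 1]."""
    assert abs(w_meta + w_topic - 1.0) < 1e-9, "weights must sum to 1"
    return w_meta * metadata_score + w_topic * topic_score
```

Keeping the modules separate like this is what makes the index auditable: each component score can be inspected and explained on its own before blending.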
Item Recommendation with Evolving User Preferences and Experience
Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user's experience level and how
this is expressed in the user's writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user's maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user's
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time, with solely user reviews and ratings as
observables. The facets of a user's interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.
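The HMM side of this model can be pictured as a latent experience level that only moves forward along a user's review timeline. The sketch below is purely illustrative: the level names, the left-to-right transition matrix, and the sampling are assumptions for exposition, not the paper's learned parameters.

```python
# Illustrative sketch of latent experience evolving over a review timeline.
# LEVELS and TRANS are invented; a real HMM would learn these from data.
import random

LEVELS = ["novice", "intermediate", "expert"]
# Left-to-right transitions: experience never decreases.
TRANS = {
    "novice": {"novice": 0.7, "intermediate": 0.3, "expert": 0.0},
    "intermediate": {"intermediate": 0.8, "expert": 0.2, "novice": 0.0},
    "expert": {"expert": 1.0, "novice": 0.0, "intermediate": 0.0},
}

def sample_trajectory(n_reviews: int, seed: int = 0):
    """Sample one user's experience level at each of n_reviews time steps."""
    rng = random.Random(seed)
    state, path = "novice", []
    for _ in range(n_reviews):
        path.append(state)
        r, acc = rng.random(), 0.0
        for nxt, p in TRANS[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return path
```

In the full model, each sampled level would then condition the LDA topic distribution used to generate that review's text.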
Review Manipulation: Literature Review, and Future Research Agenda
Background: The phenomenon of review manipulation and fake reviews has gained Information Systems (IS) scholars’ attention during recent years. Scholarly research in this domain has delved into the causes and consequences of review manipulation. However, we find that the findings are diverse, and the studies do not portray a systematic approach. This study synthesizes the findings from a multidisciplinary perspective and presents an integrated framework to understand the mechanism of review manipulation.
Method: The study reviews 88 relevant articles on review manipulation spanning a decade and a half. We adopted an iterative coding approach to synthesize the literature, identifying concepts and categorizing them independently into potential themes.
Results: We present an integrated framework that shows the linkages between the different themes, namely, the prevalence of manipulation, impact of manipulation, conditions and choice for manipulation decision, characteristics of fake reviews, models for detecting spam reviews, and strategies to deal with manipulation. We also present the characteristics of review manipulation and cover both operational and conceptual issues associated with the research on this topic.
Conclusions: Insights from the study will guide future research on review manipulation and fake reviews. The study presents a holistic view of the phenomenon of review manipulation and can inform various online platforms in addressing fake reviews to build a healthy and sustainable environment.
Investigating Cross-Domain Behaviors of BERT in Review Understanding
Review score prediction requires review text understanding, a critical
real-world application of natural language processing. Due to dissimilar text
domains in product reviews, a common practice is fine-tuning BERT models upon
reviews of differing domains. However, there has not yet been an empirical
study of cross-domain behaviors of BERT models in the various tasks of product
review understanding. In this project, we investigate text classification BERT
models fine-tuned on single-domain and multi-domain Amazon review data. In our
findings, although single-domain models achieved marginally better performance
on their corresponding domains than multi-domain models, multi-domain
models outperformed single-domain models when evaluated on multi-domain data,
on single-domain data the single-domain model was not fine-tuned on, and on
average across all tests. Though slight increases in accuracy can be
achieved through single-domain fine-tuning, computational resources and
costs can be reduced by utilizing multi-domain models that perform well across
domains.
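The comparison described here amounts to an evaluation grid: each fine-tuned model is scored on every domain's test set, then averaged across domains. The sketch below abstracts the models and scoring function as placeholders; it shows only the grid logic, not the paper's actual BERT fine-tuning setup.

```python
# Sketch of a cross-domain evaluation grid for fine-tuned models.
# `score` is a placeholder for whatever metric (e.g. accuracy) is used.
def evaluate_grid(models, domain_testsets, score):
    # models:          {model_name: model object}
    # domain_testsets: {domain_name: test data}
    grid = {m: {d: score(model, data)
                for d, data in domain_testsets.items()}
            for m, model in models.items()}
    # Per-model average over all domains, for the "on average" comparison.
    averages = {m: sum(row.values()) / len(row) for m, row in grid.items()}
    return grid, averages
```

Reading the grid row-wise shows each model's cross-domain robustness; the averages column is what favors the multi-domain model in the findings above.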