
    A study of sentiment analysis on customer reviews

    The way people shop has changed with the internet and the rise of e-commerce platforms such as Amazon, Etsy, and Best Buy. In the past, people went to a store and examined products in person; now, they decide whether to purchase a product according to its rating and reviews. Sometimes a customer's rating and comment are inconsistent: for a book, for instance, the review may read 'the book is so boring and long' while the customer gives a high rating, by mistake or for some other reason. To reduce such inconsistencies as much as possible and provide customers a better shopping experience, we should focus on what people actually write, which is less prone to such errors. In this paper, sentiment analysis, which examines whether an expressed opinion or feeling is positive, negative, or neutral, is applied to customer reviews. The reviews were collected from Amazon between 2008 and 2020 in seven different categories for a specific product. The datasets include the product id, name, date, rating, helpfulness, and target; the rating, review, and target are sufficient for the analysis. The target column holds a positive or negative label derived from the ratings, and the reviews are text data that require preprocessing such as whitespace, punctuation, and special-character removal. After preprocessing, VADER (Valence Aware Dictionary and sEntiment Reasoner) and TextBlob, which are lexicon-based sentiment analyzers, are used to properly label comments as positive or negative. Since the datasets contain more positive-labeled reviews than negative ones, an oversampling method is applied to balance them. For feature extraction, the Count Vectorizer and TF-IDF (term frequency-inverse document frequency) are used to create the training and test data. Several machine learning algorithms (Logistic Regression, Linear Support Vector Machine, Naive Bayes, Decision Tree, and K-Nearest Neighbors) are then compared to reach the best result.
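
    A minimal sketch of the pipeline this abstract describes, using NLTK's VADER for lexicon-based labeling and scikit-learn for TF-IDF features and classification. The file name and the `review`/`target` column names are assumptions, and the oversampling step here is naive random duplication rather than whatever specific method the paper applied.

```python
# Sketch of the review-labeling and classification pipeline (assumed schema).
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer  # VADER; needs nltk.download("vader_lexicon")
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("amazon_reviews.csv")  # hypothetical file with a "review" column

# Preprocessing: trim whitespace, drop punctuation and special characters.
df["review"] = df["review"].str.strip().str.replace(r"[^\w\s]", "", regex=True)

# Lexicon-based labeling: VADER's compound score, binarized (a simplification).
sia = SentimentIntensityAnalyzer()
df["target"] = (df["review"].apply(lambda t: sia.polarity_scores(t)["compound"]) >= 0).astype(int)

# Naive oversampling: duplicate minority-class rows until classes balance.
pos, neg = df[df["target"] == 1], df[df["target"] == 0]
minority = neg if len(neg) < len(pos) else pos
df = pd.concat([df, minority.sample(abs(len(pos) - len(neg)), replace=True)], ignore_index=True)

# TF-IDF features + one of the compared classifiers (Logistic Regression).
X_train, X_test, y_train, y_test = train_test_split(df["review"], df["target"], test_size=0.2)
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))
```

    Swapping in the other compared models (LinearSVC, MultinomialNB, DecisionTreeClassifier, KNeighborsClassifier) or a CountVectorizer is a one-line change in this setup.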

    SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    In the last few years, thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged, and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiment, including lexicon-based and supervised machine learning methods. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need for a thorough apples-to-apples comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originating from different data sources. Such a comparison is key to understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims to fill this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies considerably across datasets. Aiming to boost the development of this research area, we release the methods' code and the datasets used in this article, deploying them in a benchmark system that provides an open API for accessing and comparing sentence-level sentiment analysis methods.
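
    The apples-to-apples setup described here boils down to running every method over every labeled dataset and tabulating polarity accuracy. A minimal sketch of that evaluation loop, with two stand-in lexicon methods (VADER and TextBlob) in place of the paper's twenty-four, and toy examples in place of its eighteen datasets:

```python
# Sketch of a method-by-dataset polarity benchmark loop (stand-in methods, toy data).
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

sia = SentimentIntensityAnalyzer()
methods = {
    "vader":    lambda t: 1 if sia.polarity_scores(t)["compound"] >= 0 else 0,
    "textblob": lambda t: 1 if TextBlob(t).sentiment.polarity >= 0 else 0,
}
datasets = {  # each dataset: list of (sentence, gold polarity) pairs
    "tweets":  [("I love this!", 1), ("worst day ever", 0)],
    "reviews": [("great product", 1), ("broke after a week", 0)],
}
for m_name, predict in methods.items():
    for d_name, examples in datasets.items():
        acc = sum(predict(s) == gold for s, gold in examples) / len(examples)
        print(f"{m_name:8s} on {d_name:8s}: {acc:.2f}")
```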

    Towards the development of an explainable e-commerce fake review index: An attribute analytics approach

    Corporate risk and reputation assessment instruments are typically developed on structured quantitative data linked to financial ratios and macroeconomics. An emerging stream of studies has challenged this norm by demonstrating improved risk assessment and model prediction capabilities through unstructured textual corporate data. Fake online consumer reviews pose serious threats to a business's competitiveness and sales performance, directly impacting revenue, market share, brand reputation and even survivability. Research has shown that as few as three negative reviews can lead to a potential loss of 59.2% of customers. Amazon, the largest e-commerce retail platform, hosts over 85,000 small-to-medium-sized enterprise (SME) retailers in the UK, selling over fifty percent of Amazon products worldwide. Despite Amazon's best efforts, fake reviews are a growing problem, causing financial and reputational damage at a scale never seen before. While large corporations are better equipped to handle these problems efficiently, SMEs are the biggest victims of these scam tactics. Following the principles of attribute analytics (AA) and responsible analytics (RA), we present a novel hybrid method for indexing enterprise risk that we call the Fake Review Index. The proposed modular approach benefits from a combination of structured review metadata and a semantic topic index derived from unstructured product reviews. We further apply LIME to develop a Confidence Score, demonstrating the importance of explainability and openness in contemporary analytics within the OR domain. The transparency, explainability and simplicity of our roadmap to a hybrid modular approach offer an attractive entry platform for practitioners and managers in industry.
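
    The abstract does not spell out how the LIME-based Confidence Score is wired up; the sketch below only illustrates the standard `LimeTextExplainer` call such a score could be built on, with a tiny stand-in fake-review classifier and invented toy labels.

```python
# Sketch: a stand-in fake-review classifier plus a LIME token-level explanation.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical labels: 0 = genuine, 1 = fake).
texts = [
    "useful, honest review of the battery life",
    "arrived late but works as described",
    "BEST PRODUCT EVER buy now five stars!!!",
    "amazing amazing amazing must buy!!!",
]
labels = [0, 0, 1, 1]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

explainer = LimeTextExplainer(class_names=["genuine", "fake"])
exp = explainer.explain_instance(
    "Amazing product, buy now!!!",
    clf.predict_proba,   # maps list[str] -> array of class probabilities
    num_features=4,      # top tokens driving the prediction
)
for token, weight in exp.as_list():
    print(f"{token:10s} {weight:+.3f}")  # signed per-token contribution
```

    The per-token weights LIME returns are the kind of transparent evidence a Confidence Score can aggregate and show to a retailer, rather than a bare fake/genuine verdict.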

    Item Recommendation with Evolving User Preferences and Experience

    Current recommender systems exploit user and item similarities through collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters to a cinematography expert). As only item ratings and review texts are observable, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time, with solely user reviews and ratings as observables. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves rating prediction over state-of-the-art baselines by a substantial margin. We also show, in a use-case study, that our model performs well in assessing user experience levels.
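
    The joint HMM-LDA model is custom to this paper, but its LDA half, drawing a user's facets of interest from her review text, can be illustrated with scikit-learn's stock implementation; the HMM over latent, time-evolving experience levels would sit on top of facet mixtures like these and is not reproduced here.

```python
# Sketch: extracting latent facets from review text with plain LDA.
# (The paper's joint model additionally conditions facets on a latent
#  experience level traced by an HMM; that coupling is omitted.)
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "subtle cinematography and restrained pacing",
    "fun explosions, great popcorn blockbuster",
    "the director's framing and lighting choices are masterful",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Per-review facet mixture; an HMM would model how this distribution
# shifts as the user's experience level progresses over time.
print(lda.transform(X).round(2))
```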

    Review Manipulation: Literature Review, and Future Research Agenda

    Background: The phenomenon of review manipulation and fake reviews has gained Information Systems (IS) scholars' attention in recent years. Scholarly research in this domain has delved into the causes and consequences of review manipulation. However, the findings are diverse, and the studies do not follow a systematic approach. This study synthesizes the findings from a multidisciplinary perspective and presents an integrated framework for understanding the mechanism of review manipulation. Method: The study reviews 88 relevant articles on review manipulation spanning a decade and a half. We adopted an iterative coding approach to synthesize concepts from the literature and categorized them independently into potential themes. Results: We present an integrated framework that shows the linkages between the different themes, namely the prevalence of manipulation, the impact of manipulation, conditions and choices behind the manipulation decision, characteristics of fake reviews, models for detecting spam reviews, and strategies for dealing with manipulation. We also present the characteristics of review manipulation and cover both operational and conceptual issues associated with research on this topic. Conclusions: Insights from the study will guide future research on review manipulation and fake reviews. The study presents a holistic view of the phenomenon of review manipulation and informs online platforms seeking to address fake reviews and build a healthy and sustainable environment.

    Investigating Cross-Domain Behaviors of BERT in Review Understanding

    Review score prediction requires review text understanding, a critical real-world application of natural language processing. Because product reviews come from dissimilar text domains, a common practice is to fine-tune BERT models on reviews from differing domains. However, there has not yet been an empirical study of the cross-domain behavior of BERT models across the various tasks of product review understanding. In this project, we investigate text classification BERT models fine-tuned on single-domain and multi-domain Amazon review data. In our findings, although single-domain models achieved marginally better performance on their corresponding domains than multi-domain models, multi-domain models outperformed single-domain models when evaluated on multi-domain data, on single-domain data the single-domain model was not fine-tuned on, and on average across all tests. Though slight increases in accuracy can be achieved through single-domain fine-tuning, computational resources and costs can be reduced by using multi-domain models that perform well across domains.
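
    A minimal sketch of the single-domain fine-tuning setup the study compares, using the Hugging Face transformers Trainer; the stand-in corpus, split size, and hyperparameters are assumptions (the public amazon_polarity set is binary, whereas the paper's review-score task would use more labels).

```python
# Sketch: fine-tuning BERT for review classification on a single domain.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # binary stand-in; score prediction would use 5

# Hypothetical single-domain slice of an Amazon reviews corpus.
ds = load_dataset("amazon_polarity", split="train[:1000]")
ds = ds.map(lambda b: tok(b["content"], truncation=True,
                          padding="max_length", max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-single-domain",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=ds,
)
trainer.train()  # a multi-domain model would instead train on pooled domains
```

    The multi-domain variant in the study differs only in the training pool: reviews from several product categories are concatenated before fine-tuning, which is what lets one model amortize cost across domains.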