
    Data properties and the performance of sentiment classification for electronic commerce applications

    Sentiment classification plays an important role in various research areas, including e-commerce applications, and a number of advanced Computational Intelligence techniques, including machine learning and computational linguistics, have been proposed in the literature to improve sentiment classification results. While such studies focus on improving performance with new techniques or on extending existing algorithms using previously used datasets, few give practitioners insight into which techniques work better for datasets with different properties. This paper applies four different sentiment classification techniques, drawn from machine learning (Naïve Bayes, SVM and Decision Tree) and sentiment orientation approaches, to datasets obtained from various sources (IMDB, Twitter, Hotel review, and Amazon review datasets) to learn how different data properties, including dataset size, length of target documents, and subjectivity of data, affect the performance of those techniques. The results of computational experiments confirm that the performance of the techniques is sensitive to data properties, including training data size, document length, and the subjectivity of the training/test data. The theoretical and practical implications of the findings are discussed. This study was partially funded by the Korea National Research Foundation through the Global Research Network Program (Project no. 2016S1A2A2912265) and the EU-funded project Policy Compass (Project no. 283700).

    A study on text-score disagreement in online reviews

    In this paper, we focus on online reviews and employ artificial intelligence tools, taken from the cognitive computing field, to help understand the relationship between the textual part of a review and the assigned numerical score. We start from the intuitions that 1) a set of textual reviews expressing different sentiments may feature the same score (and vice versa); and 2) detecting and analyzing the mismatches between the review content and the actual score may benefit both service providers and consumers, by highlighting specific factors of satisfaction (and dissatisfaction) in texts. To test these intuitions, we adopt sentiment analysis techniques and concentrate on hotel reviews, looking for polarity mismatches therein. In particular, we first train a text classifier on a set of annotated hotel reviews taken from the Booking website. Then, we analyze a large dataset of around 160k hotel reviews collected from Tripadvisor, with the aim of detecting polarity mismatches, that is, whether or not the textual content of a review is in line with the associated score. Using well-established artificial intelligence techniques and analyzing in depth the reviews featuring a mismatch between the text polarity and the score, we find that, on a scale of five stars, reviews with middle scores include a mixture of positive and negative aspects. The approach proposed here, besides acting as a polarity detector, provides an effective selection of reviews from an initially very large dataset. This may allow both consumers and providers to focus directly on the review subset featuring a text/score disagreement, which conveniently conveys to the user a summary of the positive and negative features of the review target. Comment: This is the accepted version of the paper. The final version will be published in the Journal of Cognitive Computation, available at Springer via http://dx.doi.org/10.1007/s12559-017-9496-
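
    The pipeline described in this abstract (train a polarity classifier on annotated reviews, then flag reviews whose predicted text polarity disagrees with the star score) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the classifier, so a small bag-of-words Naive Bayes model is swapped in, and the function names, the star thresholds (4-5 positive, 1-2 negative, 3 skipped) and the toy reviews are all hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_reviews):
    """Fit word counts per class from (text, label) pairs, label in {"pos", "neg"}."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict_polarity(model, text):
    """Return "pos" or "neg" using log-probabilities with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("pos", "neg"):
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def score_polarity(stars):
    """Map a 1-5 star score to a polarity; the thresholds are an assumption."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # middle scores are not compared in this sketch


def find_mismatches(model, reviews):
    """Return the (text, stars) pairs whose text polarity contradicts the score."""
    mismatches = []
    for text, stars in reviews:
        expected = score_polarity(stars)
        if expected is not None and predict_polarity(model, text) != expected:
            mismatches.append((text, stars))
    return mismatches


# Toy data, invented for illustration only.
training = [
    ("great clean room friendly staff", "pos"),
    ("lovely breakfast great location", "pos"),
    ("dirty room rude staff", "neg"),
    ("noisy dirty terrible breakfast", "neg"),
]
reviews = [
    ("great location friendly staff", 5),
    ("dirty noisy room", 5),
    ("rude staff terrible room", 1),
    ("great breakfast", 3),
]
model = train_naive_bayes(training)
print(find_mismatches(model, reviews))
```

    In this sketch three-star reviews are skipped rather than classified; the paper instead analyzes middle scores in depth, so a fuller version would need a neutral or mixed class.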

    Assessment, Implication, and Analysis of Online Consumer Reviews: A Literature Review

    The rise of e-marketplaces, virtual communities and social networking has increased the influence of online consumer reviews (OCR) and therefore necessitates a consolidation of the body of knowledge. This article attempts to conceptually cluster the academic literature in both the management and technical domains. The study follows a framework that broadly clusters management research under two heads: OCR Assessment and OCR Implication (business implication). The parallel technical literature has been reviewed to reconcile the methodologies adopted in analyzing text content on the web, mainly reviews. Text mining through automated tools, algorithmic contributions (dominant mainly in the technical stream of literature) and manual assessment (derived from the stream of content analysis) are studied in this review article. The literature of both domains is analyzed to propose possible areas for further research. Using text analysis methods together with statistical and data mining techniques to analyze review text, and applying the resulting knowledge to managerial issues, could constitute further work. Available at: https://aisel.aisnet.org/pajais/vol9/iss2/4

    Big data and Sentiment Analysis considering reviews from e-commerce platforms to predict consumer behavior

    Final project of the Master's in Business Research, Faculty of Economics and Business, Universitat de Barcelona, Academic year: 2019-2020, Supervisors: Javier Manuel Romaní Fernández; Jaime Gil Lafuente. Over the last two decades, digital data has been generated on a massive scale, a phenomenon known as Big Data (BD). This phenomenon implies a change in the way data is managed and conclusions are drawn from it. Moreover, techniques and methods used in artificial intelligence shape new ways of analysis based on BD. Sentiment Analysis (SA), or Opinion Mining (OM), has been widely studied in the last few years because of its potential for extracting value from data. However, it has been explored more in engineering and linguistics than in business and marketing. For this reason, the aim of this study is to provide an accessible guide to the main BD concepts and technologies for those who do not come from a technical field, such as marketing directors. The essay is organized in two parts. First, the BD ecosystem and the technologies involved are described. Second, a systematic literature review is conducted in which articles related to the field of SA are analysed. The contribution of this study is a summary and brief description of the main technologies behind BD, as well as of the techniques and procedures currently involved in SA.

    BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

    Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges in designing generators efficiently and successfully. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. Hence we develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).
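
    The idea of generating scalable data from a model derived from real data can be illustrated with a toy text generator. This is a minimal sketch, not the BDGS implementation: the abstract does not specify its data models here, so a simple unigram word-frequency model is swapped in, and the function names and seed text are hypothetical.

```python
import random
from collections import Counter


def fit_unigram_model(seed_text):
    """Derive a word-frequency model from a small sample of real text."""
    counts = Counter(seed_text.lower().split())
    words = sorted(counts)  # sorted so the output is reproducible
    weights = [counts[w] for w in words]
    return words, weights


def generate_words(model, n_words, seed=0):
    """Generate n_words synthetic tokens following the fitted frequencies.

    n_words controls the output volume; the fitted weights are what keeps
    the synthetic text statistically similar to the seed text.
    """
    words, weights = model
    rng = random.Random(seed)  # fixed seed makes runs repeatable
    return rng.choices(words, weights=weights, k=n_words)


# Invented seed text, for illustration only.
seed_text = "good good good hotel hotel bad"
unigram_model = fit_unigram_model(seed_text)
sample = generate_words(unigram_model, 10_000)
print(Counter(sample).most_common(3))
```

    Scaling the volume only requires changing `n_words`; a realistic generator would also need richer models and a way to control the generation rate.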