
    A study on text-score disagreement in online reviews

    In this paper, we focus on online reviews and employ artificial intelligence tools, taken from the cognitive computing field, to help understand the relationship between the textual part of a review and its assigned numerical score. We start from two intuitions: 1) a set of textual reviews expressing different sentiments may feature the same score (and vice versa); and 2) detecting and analyzing the mismatches between the review content and the actual score may benefit both service providers and consumers, by highlighting specific factors of satisfaction (and dissatisfaction) in texts. To test these intuitions, we adopt sentiment analysis techniques and concentrate on hotel reviews, looking for polarity mismatches therein. In particular, we first train a text classifier on a set of annotated hotel reviews taken from the Booking website. Then, we analyze a large dataset of around 160k hotel reviews collected from Tripadvisor, with the aim of detecting polarity mismatches, i.e., cases where the textual content of the review is not in line with the associated score. Using well-established artificial intelligence techniques and analyzing in depth the reviews featuring a mismatch between the text polarity and the score, we find that, on a five-star scale, reviews with middle scores include a mixture of positive and negative aspects. The approach proposed here, besides acting as a polarity detector, provides an effective selection of reviews from an initially very large dataset, which may allow both consumers and providers to focus directly on the review subset featuring a text/score disagreement; this subset conveniently conveys to the user a summary of positive and negative features of the review target. Comment: This is the accepted version of the paper. The final version will be published in the Journal of Cognitive Computation, available at Springer via http://dx.doi.org/10.1007/s12559-017-9496-
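
The polarity-mismatch idea in this abstract can be illustrated with a minimal sketch. It assumes the text polarity has already been produced by some upstream classifier; the function names and the star thresholds (2 and 4) are hypothetical choices for illustration and are not taken from the paper.

```python
def score_polarity(stars, low=2, high=4):
    """Map a 1-5 star score to a coarse polarity label.

    The thresholds are assumptions made for this sketch only.
    """
    if stars <= low:
        return "negative"
    if stars >= high:
        return "positive"
    return "neutral"


def find_mismatches(reviews):
    """Return the (text_polarity, stars) pairs whose two polarities disagree.

    text_polarity is the label predicted from the review text by an
    upstream classifier, which is not modelled here.
    """
    return [
        (text_polarity, stars)
        for text_polarity, stars in reviews
        if text_polarity != score_polarity(stars)
    ]


# Toy input: (predicted text polarity, star score) pairs.
reviews = [("positive", 5), ("negative", 4), ("positive", 3), ("negative", 1)]
mismatches = find_mismatches(reviews)
```

Under these assumed thresholds, a three-star review is always reported unless the text classifier itself emits a neutral label, so a real pipeline would need to decide how middle scores are treated.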

    Sentiment Analysis Using Machine Learning Techniques

    Before buying a product, people usually visit various shops, ask about the product, its cost, and its warranty, and then buy it based on the opinions they have gathered about cost and quality of service. This process is time consuming, and the risk of being cheated by the seller is high because nobody guides the buyer towards an authentic product at a fair price. Nowadays, however, many people rely on online markets to buy the products they need. Information about the products is available from multiple sources, prices are comparatively low, and home delivery is offered. Before placing an order, customers very often refer to the comments or reviews of existing users of the product, which help them judge the quality of the product as well as the service provided by the seller. Similarly, a number of movie specialists watch a movie and then comment on its quality, either by recommending whether to watch it or by giving a five-star rating. These reviews are mainly in text format and are sometimes hard to understand, so they need to be processed appropriately to obtain meaningful information. Classification of these reviews is one approach to extracting knowledge from them. In this thesis, different machine learning techniques are used to classify the reviews. Simulations and experiments are carried out to evaluate the performance of the proposed classification methods. Many researchers have considered two review datasets for sentiment classification, namely the aclIMDb dataset and the Polarity dataset. The IMDb dataset is divided into training and testing data: the training data are used to train the machine learning algorithms, and the testing data are used to evaluate the trained models. The Polarity dataset, on the other hand, does not have separate training and testing data, so k-fold cross-validation is used when classifying its reviews. Four machine learning techniques (MLTs), viz., Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and Linear Discriminant Analysis (LDA), are used to classify these movie reviews. Different performance evaluation parameters are used to evaluate the machine learning techniques. Among these four algorithms, the RF technique is observed to yield the most accurate classification results. Secondly, n-gram based classification of reviews is carried out on the aclIMDb dataset.
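
The k-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal, standard-library illustration of how a dataset without a predefined train/test split is partitioned; the function name `k_fold_indices` is hypothetical, and the thesis may well have used a library routine instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample appears in exactly one test fold across the k rounds.
folds = list(k_fold_indices(10, 3))
```

In practice the samples are usually shuffled (or stratified by class) before splitting, and the classifier's score is averaged over the k rounds.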

    Augmenting Chinese Online Video Recommendations by Using Virtual Ratings Predicted by Review Sentiment Classification

    In this paper we aim to resolve the recommendation problem in online environments where user rating information is not available by using virtual ratings. In most current websites, especially Chinese video-sharing ones, traditional collaborative filtering recommenders based purely on ratings are not fully adequate because the rating data are sparse. Motivated by our prior work on the user reviews that appear widely on such sites, we propose a new recommender algorithm that incorporates a self-supervised, emoticon-integrated sentiment classification approach, by which the missing User-Item Rating Matrix can be substituted by virtual ratings predicted by decomposing the reviews that users give to the items. To test the algorithm's practical value, we first verified the higher performance of the self-supervised sentiment classification by comparing it with a supervised approach. Moreover, we conducted a statistical evaluation to show the effectiveness of our recommender system in improving the accuracy of Chinese online video recommendations. Keywords: information retrieval; sentiment analysis; opinion mining; online video recommendation.
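
The idea of substituting a missing user-item rating matrix with virtual ratings can be sketched roughly as below. The abstract does not say how sentiment predictions are turned into ratings, so the linear mapping from a positive-sentiment probability to a 1-5 scale is purely an assumption, as are the function names and the toy data.

```python
def virtual_rating(positive_prob, min_rating=1.0, max_rating=5.0):
    """Map a positive-sentiment probability in [0, 1] to a rating scale.

    The linear mapping is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= positive_prob <= 1.0:
        raise ValueError("positive_prob must lie in [0, 1]")
    return min_rating + positive_prob * (max_rating - min_rating)


def build_virtual_matrix(review_scores):
    """Build a sparse user -> item -> rating mapping from review sentiment.

    review_scores is an iterable of (user, item, positive_prob) triples,
    where positive_prob comes from some upstream sentiment classifier.
    """
    matrix = {}
    for user, item, prob in review_scores:
        matrix.setdefault(user, {})[item] = virtual_rating(prob)
    return matrix


# Hypothetical classifier outputs for three reviews.
matrix = build_virtual_matrix(
    [("u1", "v1", 1.0), ("u1", "v2", 0.5), ("u2", "v1", 0.0)]
)
```

A matrix built this way could then be passed to an ordinary collaborative filtering method in place of explicit ratings.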

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail

    Sentiment Analysis Using Deep Learning: A Comparison Between Chinese And English

    With the increasing popularity of opinion-rich resources, opinion mining and sentiment analysis have received increasing attention. Sentiment analysis is one of the most effective ways to find the opinion of authors. By mining what people think, sentiment analysis can provide a basis for decision making. Most of the objects of analysis are text data, such as Facebook statuses and movie reviews. Although many sentiment classification models perform well on English corpora, they do not perform well on Chinese or other languages. Traditional sentiment approaches impose many restrictions on the raw data, and they do not have enough capacity to deal with long-distance sequential dependencies. We therefore propose a model based on a recurrent neural network that uses a context vector space model. Because the information entropy of Chinese is typically higher than that of English, we hypothesise that a context vector space model can be used to improve the accuracy of sentiment analysis. Our algorithm represents each complex input by a dense vector trained to translate one sequence into another, as in translation between English and French. We then build a recurrent neural network with the Long Short-Term Memory (LSTM) model to deal with the long-distance dependencies in input data such as movie reviews. The results show that our approach has promise but still has a lot of room for improvement.

    Sentiment Classification Considering Negation and Contrast Transition

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Opinion Mining and Sentiment Analysis using Bayesian and Neural Networks Approaches

    Information technologies have firmly entered our lives, and it is hard to imagine life without gadgets or the Internet. Today, social media is not only a source that broadcasts information to users; it also allows users to communicate and share their views and experiences with each other. Some portion of such data is subjective and contains opinionated information that can be analyzed further to retrieve the essential data and later used for analysis and decision support. In order to use this type of data, the first step is to understand it and categorize the opinions it contains. Hence, in this dissertation, sentiment analysis techniques are studied in order to retrieve opinions from tweets. To ensure efficient classification, it is important to apply algorithms that perform well on this task. Therefore, the main goal of the thesis is to investigate algorithms that can be applied to opinion estimation. To that end, data preprocessing and several experiments are carried out: the classifier is trained and tested on two different datasets using two different classifier implementations (Naive Bayes and a convolutional neural network). In addition, the influence of the training data on the classifier's efficiency is discussed.
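
As a rough illustration of the Naive Bayes side of this comparison, the following is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is a generic textbook sketch, not the thesis's implementation: the class name, the whitespace tokenisation, and the four toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of the known words.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
train_docs = [
    "great movie loved it",
    "awful boring movie",
    "loved the acting",
    "boring and awful plot",
]
train_labels = ["pos", "neg", "pos", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
prediction_pos = model.predict("loved it")
prediction_neg = model.predict("awful plot")
```

Real tweets would need preprocessing (handling of mentions, hashtags, URLs, and emoticons) before tokenisation, which is the kind of step the abstract refers to as data preprocessing.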

    Probabilistic topic models for sentiment analysis on the Web

    Sentiment analysis aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text, and has attracted rapidly growing interest in natural language processing in recent years. Probabilistic topic models, on the other hand, are capable of discovering hidden thematic structure in large archives of documents, and have been an active research area in the field of information retrieval. The work in this thesis focuses on developing topic models for automatic sentiment analysis of web data by combining ideas from both research domains. One noticeable issue in most previous work on sentiment analysis is that the trained classifier is domain dependent, and the labelled corpora required for training can be difficult to acquire in real-world applications. Another issue is that the dependencies between sentiment/subjectivity and topics are not taken into consideration. The main contribution of this thesis is therefore the introduction of three probabilistic topic models, which address the above concerns by modelling sentiment/subjectivity and topic simultaneously. The first model is the joint sentiment-topic (JST) model, based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. Unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when applied to new domains, the weakly-supervised nature of JST makes it highly portable to other domains, where the only supervision information required is a domain-independent sentiment lexicon. Apart from document-level sentiment classification results, JST can also extract sentiment-bearing topics automatically, which distinguishes it from existing sentiment analysis approaches. The second model is a dynamic version of JST called the dynamic joint sentiment-topic (dJST) model. dJST respects the ordering of documents, and allows the analysis of topic and sentiment evolution in document archives that are collected over a long time span. By accounting for the historical dependencies of documents from past epochs in the generative process, dJST gives a richer posterior topical structure than JST, and can better respond to the permutations of topic prominence. We also derive online inference procedures based on a stochastic EM algorithm for efficiently updating the model parameters. The third model is the subjectivity detection LDA (subjLDA) model for sentence-level subjectivity detection. Two sets of latent variables are introduced in subjLDA: one is the subjectivity label for each sentence, and the other is the sentiment label for each word token. By viewing the subjectivity detection problem as weakly-supervised generative model learning, subjLDA significantly outperforms the baseline and is comparable to the supervised approach, which relies on much larger amounts of data for training. These models have been evaluated on real-world datasets, demonstrating that joint sentiment-topic modelling is indeed an important and useful research area with much to offer in the way of good results.