    SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    In the last few years, thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged, and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiment, including lexical-based and supervised machine learning methods. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need for a thorough apples-to-apples comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originating from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims at filling this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies across datasets. To boost the development of this research area, we release the code of the methods and the datasets used in this article and deploy them in a benchmark system that provides an open API for accessing and comparing sentence-level sentiment analysis methods.

    Modeling and Sentiment Analysis of Online Reviews in Hospitality Industry

    With the widespread use of smartphones and the internet, the number of online hotel booking service providers has increased sharply, producing more user-generated content in the form of reviews and comments about customer experience. These customer reviews help hotel management personnel not only to forecast future demand but also to implement effective strategies for better service. In this setting, it is becoming a tough job for hotel management to extract exact information from the wide range of reviews. This analysis aims to classify the sentiment of customer reviews. The classification is done with a text mining approach applied to the review text. Two dictionaries are developed for use in data classification on around 431 reviews taken from Tripadvisor.com and Booking.com. Finally, the Latent Dirichlet Allocation (LDA) modeling algorithm is applied to identify related topics, and it was used to sort out the issues in consumer sentiment analysis. Study findings revealed that the majority of the reviews carried positive sentiment. The topics that fit best were “food”, “hospitality”, “room”, “people”, “friendly”, “Relax”, “feelings”, and “holiday” as hospitality terms and “Strong Positive” and “Ordinary Positive” as sentiment terms.

    Sentiment Classification of Online Customer Reviews and Blogs Using Sentence-level Lexical Based Semantic Orientation Method

    Sentiment analysis is the process of extracting knowledge from people's opinions, appraisals and emotions toward entities, events and their attributes. These opinions strongly influence customers and ease their choices regarding online shopping, events, products and entities. With the rapid growth of online resources, a vast amount of new data in the form of customer reviews and opinions is being generated continuously. Hence, sentiment analysis methods are desirable for developing efficient and effective analysis and classification of customer reviews, blogs and comments. The main motivation for this thesis is to develop a high-performance, domain-independent sentiment classification method. This study focuses on sentiment analysis at the sentence level using a lexical-based method for different types of data such as reviews and blogs. The proposed method is based on general lexicons, i.e. WordNet, SentiWordNet and user-defined lexical dictionaries for sentiment orientation. The relations and glosses of these dictionaries provide a solution to the domain portability problem. The experiments are performed on various data sets such as customer reviews and blog comments. The results show that the proposed method with sentence contextual information is effective for sentiment classification. The proposed method performs better than word- and text-level corpus-based machine learning methods for semantic orientation. The results highlight that the proposed method achieves an average accuracy of 86% at sentence level and 97% at feedback level for customer reviews. Similarly, it achieves an average accuracy of 83% at sentence level and 86% at feedback level for blog comments.
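To make the idea of sentence-level, lexicon-based classification more concrete, the following is a minimal sketch, not the thesis's actual method. It assumes a tiny hand-written lexicon in place of WordNet and SentiWordNet, a simple rule that a negator flips the next sentiment-bearing word, and a feedback-level label obtained by summing sentence scores. The names `LEXICON`, `NEGATORS`, `sentence_score` and `feedback_label` are hypothetical.

```python
import re

# Hypothetical toy lexicon; a real system would draw on SentiWordNet or similar.
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "poor": -1.0, "terrible": -1.0,
}
NEGATORS = {"not", "no", "never"}


def sentence_score(sentence):
    """Sum lexicon polarities; a negator flips the next sentiment word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(tok)
        if value is not None:
            score += -value if negate else value
            negate = False  # the negation has been consumed
    return score


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def feedback_label(text):
    """Assumed aggregation: sum sentence scores over the whole feedback."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return label(sum(sentence_score(s) for s in sentences))
```

For example, `label(sentence_score("The room was not good"))` yields `"negative"` under this rule, because the negator flips the polarity of "good".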

    Automatically generating a sentiment lexicon for the Malay language

    This paper aims to propose an automated sentiment lexicon generation model specifically designed for the Malay language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource languages such as Malay, for which there is a lack of research focused on this particular area. This gap motivates the proposal of a sentiment lexicon generation algorithm for this language. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a foundation for further research for the Malay language in this area.
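The seed-expansion step described above can be sketched as a breadth-first traversal over synonym and antonym links. This is only an illustration under stated assumptions: the small `SYNONYMS` and `ANTONYMS` dictionaries are hypothetical stand-ins for the merged WordNet, synonyms are assumed to inherit polarity and antonyms to flip it, and the `max_depth` cutoff is an added parameter. The supervised classifier that the paper applies afterwards is not modeled here.

```python
from collections import deque

# Hypothetical toy relations standing in for a merged WordNet graph.
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["awful"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def _symmetric(relation):
    """Treat every relation as bidirectional."""
    out = {}
    for a, bs in relation.items():
        for b in bs:
            out.setdefault(a, set()).add(b)
            out.setdefault(b, set()).add(a)
    return out


def expand_seeds(positive, negative, synonyms, antonyms, max_depth=3):
    """Expand seed terms breadth-first; returns {term: +1 or -1}."""
    syn = _symmetric(synonyms)
    ant = _symmetric(antonyms)
    polarity = {w: 1 for w in positive}
    polarity.update({w: -1 for w in negative})
    queue = deque((w, 0) for w in polarity)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        # Synonyms keep the polarity of the word that reached them.
        for nxt in sorted(syn.get(word, ())):
            if nxt not in polarity:
                polarity[nxt] = polarity[word]
                queue.append((nxt, depth + 1))
        # Antonyms receive the opposite polarity.
        for nxt in sorted(ant.get(word, ())):
            if nxt not in polarity:
                polarity[nxt] = -polarity[word]
                queue.append((nxt, depth + 1))
    return polarity
```

Calling `expand_seeds(["good"], ["bad"], SYNONYMS, ANTONYMS)` on the toy graph labels "pleasant" as positive and "unpleasant" as negative. The first label assigned to a term is kept, which is one simple way to resolve conflicting paths.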

    Econometrics meets sentiment : an overview of methodology and applications

    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.

    Prediction of Stock Market Volatility Utilizing Sentiment from News and Social Media Texts : A study on the practical implementation of sentiment analysis and deep learning models for predicting day-ahead volatility

    This thesis studies the impact of sentiment on the prediction of volatility for 100 of the largest stocks in the S&P500 index. The purpose is to find out whether sentiment can improve the forecast of day-ahead volatility, where volatility is measured as the realized volatility of intraday returns. The textual data has been gathered from three different sources: Eikon, Twitter, and Reddit. The data consists of 397 564 headlines from Eikon, 35 811 098 tweets, and 4 109 008 comments from Reddit, respectively. These numbers represent the uncleaned data before filtration. The data has been collected for the period between 01.08.2021 and 31.08.2022. Sentiment is calculated by the FinBERT model, an NLP model created by further pre-training of the BERT model on financial text. To predict volatility with the sentiment from FinBERT, three different deep learning models have been applied: a feed-forward neural network, a recurrent neural network, and a long short-term memory model. They are used to solve both regression and classification problems. The inference analysis shows significant effects from the computed sentiment variables, and it implies that there exists a correlation between the number of text items and volatility. This is in line with previous literature on sentiment and volatility. The results from the deep learning models show that sentiment has an impact on the prediction of volatility, both in terms of lower MSE and MAE for the regression problem and higher accuracy for the classification problem. Moreover, this thesis looks at potential weaknesses that could influence the validity of the results. Potential weaknesses include how sentiment is represented, noise in the data, and the fact that the FinBERT model is not trained on financially oriented text from social media.
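As a rough illustration of the quantities named in this abstract, the sketch below computes realized volatility from intraday prices and the MSE and MAE error measures. It assumes the common definition of realized volatility as the square root of the sum of squared intraday log returns; the abstract does not state the return type or the sampling frequency, so both are assumptions, and the function names are hypothetical.

```python
import math


def realized_volatility(prices):
    """Square root of the sum of squared intraday log returns (assumed definition)."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in returns))


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A forecasting model would be scored by comparing its day-ahead predictions with the realized volatility computed this way, with lower MSE and MAE indicating a better fit.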