    Econometrics meets sentiment: an overview of methodology and applications

    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.

    A study of feature extraction techniques for classifying topics and sentiments from news posts

    Many news channels now maintain Facebook pages on which they release news posts daily. These posts carry temporal opinions about social events that may change over time due to external factors, and they can also serve as a monitor of significant events happening around the world. As a result, much text mining research has been conducted in the area of Temporal Sentiment Analysis, one of whose most challenging tasks is detecting and extracting the key features from news posts that arrive continuously over time. Extracting these features is difficult because of the posts' complex properties; in addition, posts about a specific topic may grow or vanish over time, which produces imbalanced datasets. This study therefore presents a comparative analysis of feature extraction techniques (TF-IDF, TF, BTO, IG, Chi-square) with three n-gram features (unigram, bigram, trigram), using SVM as the classifier. The aim is to identify the Feature Extraction Technique (FET) that achieves the best accuracy for both topic and sentiment classification. The analysis is conducted on datasets from three news channels. The experimental results for topic classification show that Chi-square with unigrams is the best FET among those compared. To overcome the problem of imbalanced data, the study combines the best FET with an oversampling technique. The evaluation results show an improvement in the classifier's performance, with accuracies of 93.37%, 92.89%, and 91.92% for BBC, Al-Arabiya, and Al-Jazeera, respectively, higher than those obtained on the original datasets. The same combination (Chi-square + unigram) was used for sentiment classification and obtained accuracies of 81.87%, 70.01%, and 77.36%. However, testing the identified optimal FET on unseen, randomly selected news posts yielded relatively low accuracy for both topic and sentiment classification, owing to changes in topics and sentiments over time.
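
    To make the Chi-square with unigram idea concrete, the sketch below scores each unigram by the classical chi-square statistic of its 2x2 contingency table (term present or absent versus post inside or outside a target class). It is only an illustration under stated assumptions: the function name `chi_square_scores`, the whitespace tokenizer, and the four toy posts are hypothetical, and the study's actual preprocessing, SVM training, and oversampling steps are not reproduced.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each unigram by the chi-square statistic of its 2x2 table
    (term present/absent vs. document in the target class or not)."""
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]  # assumed tokenizer
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_all = Counter()  # document frequency over all posts
    df_pos = Counter()  # document frequency inside the target class
    for terms, y in zip(doc_sets, labels):
        df_all.update(terms)
        if y == target:
            df_pos.update(terms)
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]   # in class, contains term
        b = df - a         # outside class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # outside class, lacks term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy posts, not data from the study.
posts = [
    "election vote results",
    "vote count election",
    "football match results",
    "football league match",
]
topics = ["politics", "politics", "sports", "sports"]

scores = chi_square_scores(posts, topics, "politics")
# Keep the highest-scoring unigrams as features (ties broken alphabetically).
top = sorted(scores, key=lambda t: (-scores[t], t))[:4]
print(top)
```

    Terms that occur equally often in both classes (here "results") score zero and would be dropped, while terms confined to one class score highest. The selected unigrams would then form the feature set passed to a classifier.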

    How TripAdvisor’s reviewers level of expertise influence their online rating behaviour and the usefulness of reviews

    The internet has improved the buying behaviour of customers. The development of technology has led to the dissemination of opinions on social networks where customers buy goods and services, and these comments have become part of the purchasing process. Until a few years ago, customers chose their itineraries based on tourist guides or brochures. Nowadays, customer reviews have changed the way a destination is portrayed, enhancing the description of a product or service to a level that not even the supplier could reach before. There are different types of reviewers. The aim of this study is to identify both types, expert and non-expert reviewers, and to analyse the way they write their reviews. Reviewers are classed as experts if they have written more than ten reviews and as non-experts if they have written fewer than ten. Reviews of five hotels in Orlando, Florida, taken from the TripAdvisor website, were used to conduct the study. After analysing a large set of variables, the results show little difference in the number of positive and negative reviews written by a reviewer; however, there is a difference in the deeper meaning of a review depending on whether it is positive or negative. Expert reviewers tend to be more emotional when writing positive reviews than when writing negative ones. Regarding usefulness, there is no significant difference between reviews written by expert and non-expert reviewers. The results also indicate that being an expert does not influence the rating a reviewer gives to a hotel stay. The study was conducted using the Lexalytics program, which applies Natural Language Processing (NLP) to classify reviews according to their polarity. The study fills a research gap by giving insights into the polarity of a review depending on the type of reviewer. The results are also important for hotel managers, helping them understand the type of guest in house.

    App Review Analytics Of Free Games Listed On Google Play

    Smartphones have become popular in recent years; in turn, the number of application developers and publishers has grown rapidly. To understand users' app preferences, many platforms such as Google Play provide mechanisms that allow users to rank apps. However, more detailed insights into users' feelings, experiences, critiques, suggestions, or preferences are missing due to a lack of additional written comments. This research investigates the review analytics of Android games listed on Google Play, using a proposed text analytic approach to extract all user reviews of game apps in Chinese. A total of 207,048 reviews of 4,268 free games from February to March 2013 were extracted and analyzed according to various metrics, including game type and game attribute. The findings indicate a high dependency between users' gender and game type, and that males and females have differing opinions on game attributes. In particular, users of different game types prefer different game attributes. The results reveal product usage insights as well as best practices for developers.

    Analisis Sentimen Tweet Menggunakan Backpropagation Neural Network (Tweet Sentiment Analysis Using a Backpropagation Neural Network)

    Tweet sentiment analysis has developed as a research topic in Natural Language Processing that is useful for automatically determining public opinion on a given topic. In this study we propose a technique for classifying tweets into three classes (positive, negative, and neutral) using the Backpropagation Neural Network algorithm. The network input is a set of selected words ranked by TF*IDF score. Variations in term preprocessing were applied to test sentiment classification performance. The test results show that the proposed method performs the classification successfully, with a best result of 78.34% accuracy and 84.21% precision.
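
    The step of selecting network inputs by TF*IDF rank can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how per-tweet TF*IDF values are aggregated or how the selected words are encoded, so summing the scores over the corpus and using binary presence inputs are both assumptions, and the function names and sample tweets are hypothetical.

```python
import math
from collections import Counter


def rank_terms_by_tfidf(tweets, top_k):
    """Return the top_k terms by TF*IDF summed over all tweets
    (the summation is an assumption, not stated in the abstract)."""
    tokenized = [t.lower().split() for t in tweets]
    n = len(tokenized)
    df = Counter()  # number of tweets containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    totals = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            totals[term] += count * math.log(n / df[term])
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(totals, key=lambda t: (-totals[t], t))[:top_k]


def to_input_vector(tweet, vocabulary):
    """Encode a tweet as binary presence flags over the selected terms
    (an assumed encoding for the network's input layer)."""
    tokens = set(tweet.lower().split())
    return [1.0 if term in tokens else 0.0 for term in vocabulary]


# Hypothetical sample tweets.
tweets = ["great phone great camera", "bad battery", "phone is ok"]
vocabulary = rank_terms_by_tfidf(tweets, top_k=3)
vector = to_input_vector("bad battery life", vocabulary)
print(vocabulary, vector)
```

    Each such vector would feed the input layer of a network with three output units, one per sentiment class; the network itself and its backpropagation training are omitted here.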

    Twitter location (sometimes) matters: Exploring the relationship between georeferenced tweet content and nearby feature classes

    In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be of the highest quality, followed by the supervised method, while the unsupervised classification was of low quality. We found that the degree to which tweet content is related to nearby points of interest depends upon topic (that is, upon the OpenStreetMap category). A more general synthesis with prior research leads to the conclusion that the strength of the relationship between tweets and their geographic origin also depends upon geographic scale (where smaller scale correlations are more significant than those of larger scale).

    Social-media monitoring for cold-start recommendations

    Generating personalized movie recommendations for users is a problem that most commonly relies on user-movie ratings. These ratings are generally used either to understand user preferences or to recommend movies that users with similar rating patterns have rated highly. However, movie recommenders are often subject to the Cold-Start problem: new movies have not been rated by anyone, so they will not be recommended to anyone; likewise, the preferences of new users who have not rated any movie cannot be learned. In parallel, Social-Media platforms such as Twitter collect great amounts of user feedback on movies, as these are very popular nowadays. This thesis proposes to explore feedback shared on Twitter to predict the popularity of new movies and to show how it can be used to tackle the Cold-Start problem. It also proposes, at a finer grain, to explore the reputation of directors and actors on IMDb to tackle the Cold-Start problem. To assess these aspects, a Reputation-enhanced Recommendation Algorithm is implemented and evaluated on a crawled IMDb dataset with previous user ratings of old movies, together with Twitter data crawled from January 2014 to March 2014, to recommend 60 movies affected by the Cold-Start problem. Twitter proved to be a strong reputation predictor, and the Reputation-enhanced Recommendation Algorithm improved over several baseline methods. The algorithm also proved useful when recommending movies in an extreme Cold-Start scenario, where both new movies and new users are affected by the Cold-Start problem.

    Discovering High-Profit Product Feature Groups by mining High Utility Sequential Patterns from Feature-Based Opinions

    Extracting a group of features together instead of a single feature from the mined opinions, such as “{battery, camera, design} of a smartphone,” may yield higher profit to the manufacturers and higher customer satisfaction; such groups can be called High Profit Feature Groups (HPFG). The accuracy of Opinion-Feature Extraction can be improved if more complex sequential patterns of customer reviews are learned and included in the user-behavior analysis to obtain relevant frequent feature groups. Existing Opinion-Feature Extraction systems that use Data Mining techniques with some sequences include those referred to in this thesis as Rashid13OFExt, Rana18OFExt, and HPFG19_HU. The Rashid13OFExt and Rana18OFExt systems use Sequential Pattern Mining, Association Rule Mining, and Class Sequential Rules to obtain frequent product features and opinion words from reviews. However, these systems do not discover frequent high profit features by considering utility values (internal and external) such as cost, profit, quantity, or other user preferences. The HPFG19_HU system uses High Utility Itemset Mining and Aspect-Based Sentiment Analysis to extract High Utility Aspect groups based on feature-opinion sets. It works on transaction databases of itemsets formed from aspects, considering the high utility values (e.g., whether they are more profitable to the seller) of the frequent patterns extracted from a set of opinion sentences. However, the HPFG19_HU system does not consider the order of occurrence (sequence) of product features in customer opinion sentences, which helps distinguish similar users and identify more relevant and related high profit product features. This thesis proposes a system called High Profit Sequential Feature Group based on High Utility Sequences (HPSFG_HUS), which is an extension of the HPFG19_HU system.
The proposed system combines Feature-Based Opinion Mining and High Utility Sequential Pattern Mining to extract High Profit Feature Groups from product reviews. The input to the proposed system is the product reviews corpus. The output is the High Profit Sequential Feature Groups in sequence databases, which identify sequential patterns in the features extracted from opinions by considering the order in which features occur in the review. This method improves on the accuracy of existing systems in extracting relevant frequent feature groups. The results on retailer's graphs of extracted High Profit Sequential Feature Groups show that the proposed HPSFG_HUS system provides more accurate high feature groups, sales profit, and user satisfaction. Experimental results evaluating execution time, accuracy, precision, and comparison show higher revenue than the tested existing systems.
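
    To illustrate why the order of features matters when utilities are attached, the sketch below computes the utility of an ordered feature pattern over a small sequence database. It is a simplified illustration and not the HPSFG_HUS system: it assumes each review has already been reduced to an ordered list of (feature, utility) pairs, and it takes the utility of a pattern in one review to be the maximum total utility over its in-order occurrences. The function names and all utility values are hypothetical.

```python
def pattern_utility_in_sequence(sequence, pattern):
    """Maximum total utility over all in-order (not necessarily contiguous)
    occurrences of pattern in sequence; None if the pattern does not occur."""
    neg_inf = float("-inf")
    # best[j] = best utility found so far for matching the first j pattern items
    best = [0.0] + [neg_inf] * len(pattern)
    for feature, utility in sequence:
        # Iterate backwards so one review element fills at most one position.
        for j in range(len(pattern), 0, -1):
            if feature == pattern[j - 1] and best[j - 1] != neg_inf:
                best[j] = max(best[j], best[j - 1] + utility)
    return None if best[-1] == neg_inf else best[-1]


def pattern_utility(database, pattern):
    """Sum the pattern's utility over every sequence that contains it."""
    total = 0.0
    for sequence in database:
        utility = pattern_utility_in_sequence(sequence, pattern)
        if utility is not None:
            total += utility
    return total


# Hypothetical reviews: features in order of mention, with made-up utilities.
reviews = [
    [("battery", 5.0), ("camera", 3.0), ("design", 2.0)],
    [("camera", 4.0), ("battery", 6.0), ("camera", 1.0)],
    [("design", 2.0), ("battery", 1.0)],
]

battery_then_camera = pattern_utility(reviews, ["battery", "camera"])
camera_then_battery = pattern_utility(reviews, ["camera", "battery"])
print(battery_then_camera, camera_then_battery)
```

    The same two features yield different utilities depending on their order, which an itemset-based view would not distinguish. A miner would keep the patterns whose utility meets a chosen minimum threshold.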