12 research outputs found

    A comparative study of Bayesian models for unsupervised sentiment detection

    This paper presents a comparative study of three closely related Bayesian models for unsupervised document-level sentiment classification: the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments were conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. All three models achieve better or comparable performance on these two corpora compared with existing unsupervised sentiment classification approaches, and both JST and Reverse-JST are additionally able to extract sentiment-oriented topics. Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment-topic detection.

    Latent sentiment model for weakly-supervised cross-lingual sentiment classification

    In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation, in which sentiment labels are treated as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM learning, where preferences on the expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM performs comparably to supervised classifiers such as support vector machines, achieving an average accuracy of 81% over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon, the LSM is able to extract highly domain-specific polarity words from text.

    Probabilistic topic models for sentiment analysis on the Web

    Sentiment analysis aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text, and has attracted rapidly growing interest in natural language processing in recent years. Probabilistic topic models, on the other hand, are capable of discovering hidden thematic structure in large archives of documents, and have been an active research area in the field of information retrieval. The work in this thesis focuses on developing topic models for automatic sentiment analysis of web data by combining ideas from both research domains. One noticeable issue with most previous work in sentiment analysis is that the trained classifier is domain dependent, and the labelled corpora required for training can be difficult to acquire in real-world applications. Another issue is that the dependencies between sentiment/subjectivity and topics are not taken into consideration. The main contribution of this thesis is therefore the introduction of three probabilistic topic models, which address the above concerns by modelling sentiment/subjectivity and topic simultaneously. The first model is the joint sentiment-topic (JST) model, based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. Unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when applied to new domains, the weakly-supervised nature of JST makes it highly portable to other domains, where the only supervision information required is a domain-independent sentiment lexicon. Apart from document-level sentiment classification results, JST can also extract sentiment-bearing topics automatically, which is a distinct feature compared to existing sentiment analysis approaches. The second model is a dynamic version of JST called the dynamic joint sentiment-topic (dJST) model. dJST respects the ordering of documents and allows the analysis of topic and sentiment evolution in document archives that are collected over a long time span. By accounting for the historical dependencies of documents from past epochs in the generative process, dJST gives a richer posterior topical structure than JST and can better respond to the permutations of topic prominence. We also derive online inference procedures based on a stochastic EM algorithm for efficiently updating the model parameters. The third model is the subjectivity detection LDA (subjLDA) model for sentence-level subjectivity detection. Two sets of latent variables are introduced in subjLDA: one is the subjectivity label for each sentence, and the other is the sentiment label for each word token. By viewing the subjectivity detection problem as weakly-supervised generative model learning, subjLDA significantly outperforms the baseline and is comparable to the supervised approach, which relies on much larger amounts of training data. These models have been evaluated on real-world datasets, demonstrating that joint sentiment-topic modelling is an important and useful research area.
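The abstract above says that a domain-independent sentiment lexicon is the only supervision JST needs, but it does not say how the lexicon enters the model. One plausible route in an LDA-style model, assumed here purely for illustration, is through the prior over words: a lexicon word keeps its prior weight only under its own sentiment label. The function name `lexicon_word_prior`, the label names, and the base weight are all hypothetical.

```python
def lexicon_word_prior(vocab, lexicon, labels=("positive", "negative"), base=0.01):
    """Build a per-label prior weight for every vocabulary word.

    Words missing from the lexicon keep the symmetric weight `base` under
    every label. Lexicon words keep `base` only under their own label and
    get 0.0 under the others (an assumed encoding, not the thesis's).
    """
    prior = {label: {} for label in labels}
    for word in vocab:
        known = lexicon.get(word)  # None when the word is not in the lexicon
        for label in labels:
            prior[label][word] = base if known in (None, label) else 0.0
    return prior


# Toy vocabulary and lexicon, invented for this example.
vocab = ["good", "bad", "film"]
lexicon = {"good": "positive", "bad": "negative"}
prior = lexicon_word_prior(vocab, lexicon)
```

With this encoding, a neutral word such as "film" can still be assigned to either sentiment label during inference, while "good" can only be generated under the positive label.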

    Review on recent advances in information mining from big consumer opinion data for product design

    This paper presents a comprehensive review, drawing on more than ten years of studies in this research area, of information mining from big consumer opinion data to assist product design. First, the research background and the essential terminology regarding online consumer opinion data are introduced. Next, studies concerning information extraction from, and information utilization of, big consumer opinion data for product design are reviewed. Studies on information extraction are explained from various perspectives, including data acquisition, opinion target recognition, feature identification and sentiment analysis, and opinion summarization and sampling. Studies on information utilization are explored in terms of how to extract critical customer needs from big consumer opinion data, how to connect the voice of the customer with product design, how to make effective comparisons and reasonable rankings of similar products, and how to identify ever-evolving customer concerns efficiently. Furthermore, significant and practical research trends are highlighted for future studies. This survey will help researchers and practitioners understand the latest developments in studies and applications centered on how big consumer opinion data can be processed, analyzed, and exploited to aid product design.

    Analysis of Twitter Messages for Sentiment


    Methods for constructing an opinion network for politically controversial topics

    The US presidential race, the re-election of President Hugo Chavez, and the economic crisis in Greece and other European countries are some of the controversial topics played out in the news every day. To understand the landscape of opinions on political controversies, it would be helpful to know which politician or other stakeholder takes which position (support or opposition) on specific aspects of these topics. The work described in this thesis aims to automatically derive a map of the opinions-people network from news and other Web documents. The focus is on acquiring opinions held by various stakeholders on politically controversial topics. This opinions-people network serves as a knowledge base of opinions in the form of (opinion holder) (opinion) (topic) triples. Our system for building this knowledge base uses online news sources to extract opinions from text snippets. These sources come with a set of unique challenges. For example, processing text snippets involves not just identifying the topic and the opinion, but also attributing that opinion to a specific opinion holder. This requires deep parsing and analysis of the parse tree. Moreover, to ensure uniformity, both the topic and the opinion holder should be mapped to canonical strings, and the topics should also be organized into a hierarchy. Our system relies on two main components: i) acquiring opinions, which uses a combination of techniques to extract opinions from online news sources, and ii) organizing topics, which crawls and extracts debates from online sources and organizes these debates into a hierarchy of politically controversial topics. We present systematic evaluations of the different components of our system and show that they achieve high accuracy. We also present applications that require political analysis, such as identifying flip-floppers, political bias, and dissenters. Such applications can make use of the knowledge base of opinions.
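As a rough illustration of the (opinion holder) (opinion) (topic) triples described above, the sketch below stores triples in memory and flags possible flip-floppers. It assumes that an opinion is reduced to a "support" or "oppose" stance and that a flip-flopper is a holder with both stances on record for the same topic. The abstract gives neither definition, and the names `Opinion` and `find_flip_floppers` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Opinion(NamedTuple):
    holder: str  # canonical name of the opinion holder
    stance: str  # "support" or "oppose" (assumed encoding of the opinion)
    topic: str   # canonical topic string


def find_flip_floppers(triples):
    """Map each holder to the sorted topics on which both stances appear."""
    stances = defaultdict(set)
    for t in triples:
        stances[(t.holder, t.topic)].add(t.stance)

    result = defaultdict(list)
    for (holder, topic), seen in stances.items():
        if {"support", "oppose"} <= seen:
            result[holder].append(topic)
    return {holder: sorted(topics) for holder, topics in result.items()}


# Invented example data, not taken from the thesis.
kb = [
    Opinion("Politician A", "support", "tax reform"),
    Opinion("Politician A", "oppose", "tax reform"),
    Opinion("Politician B", "support", "tax reform"),
]
flip_floppers = find_flip_floppers(kb)
```

Keying on canonical holder and topic strings is what makes the grouping work, which is why the abstract stresses mapping both to canonical forms.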

    Explainable Recommendation: Theory and Applications

    Although personalized recommendation has been investigated for decades, the wide adoption of Latent Factor Models (LFM) has made the explainability of recommendations a critical issue for both the research community and the practical application of recommender systems. For example, in many practical systems the algorithm simply provides a personalized item recommendation list to users, without a persuasive personalized explanation of why one item is recommended while another is not. Unexplainable recommendations undermine the trustworthiness of recommender systems and thus reduce the effectiveness of recommendation engines. In this work, we investigate explainable recommendation in terms of data explainability, model explainability, and result explainability. The main contributions are as follows: 1. Data Explainability: We propose the Localized Matrix Factorization (LMF) framework based on Bordered Block Diagonal Form (BBDF) matrices, and further apply this technique to parallelized matrix factorization. 2. Model Explainability: We propose Explicit Factor Models (EFM) based on phrase-level sentiment analysis, as well as dynamic user preference modeling based on time series analysis. We extract product features and user opinions towards different features from large-scale user textual reviews using phrase-level sentiment analysis techniques, and introduce the EFM approach for explainable model learning and recommendation. 3. Economic Explainability: We propose the Total Surplus Maximization (TSM) framework for personalized recommendation, as well as the model specification for different types of online applications. Based on basic economic concepts, we provide definitions of utility, cost, and surplus in the application scenario of Web services, and propose a general framework for web total surplus calculation and maximization. Comment: 169 pages, in Chinese, 3 main research chapters.