Weakly-Supervised Joint Sentiment-Topic Detection from Text
Publication status: Accepted. Article. © 2012 IEEE. Sentiment analysis, or opinion mining, aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modelling framework called the joint sentiment-topic (JST) model, based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. A reparameterized version of the JST model, called Reverse-JST, obtained by reversing the sequence of sentiment and topic generation in the modelling process, is also studied. Although JST is equivalent to Reverse-JST without a hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. Furthermore, unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when shifted to other domains, the weakly-supervised nature of JST makes it highly portable to other domains. This is verified by experimental results on datasets from five different domains, where the JST model even outperforms existing semi-supervised approaches on some datasets despite using no labelled documents. Moreover, the topics and topic sentiment detected by JST are coherent and informative. We hypothesize that the JST model can readily meet the demand of large-scale sentiment analysis from the web in an open-ended fashion.
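The layered generative story the abstract describes (sentiment drawn per document, topic drawn conditioned on sentiment, word drawn from a sentiment-topic pair) can be sketched as follows. This is an illustrative forward sampler only, with hypothetical sizes and symmetric Dirichlet hyperparameters, not the paper's inference procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters (illustrative only).
V, T, S = 50, 5, 3            # vocabulary size, topics, sentiment labels
D, N = 10, 30                 # documents, words per document
alpha, beta, gamma = 0.1, 0.01, 1.0

# Per-(sentiment, topic) word distributions: phi[l, z] ~ Dirichlet(beta).
phi = rng.dirichlet([beta] * V, size=(S, T))

def generate_doc():
    pi = rng.dirichlet([gamma] * S)             # per-document sentiment mixture
    theta = rng.dirichlet([alpha] * T, size=S)  # topic mixture per sentiment label
    words = []
    for _ in range(N):
        l = rng.choice(S, p=pi)                 # draw a sentiment label
        z = rng.choice(T, p=theta[l])           # draw a topic given the sentiment
        w = rng.choice(V, p=phi[l, z])          # draw a word
        words.append((l, z, w))
    return words

docs = [generate_doc() for _ in range(D)]
```

Reverse-JST would swap the two conditional draws, sampling the topic first and the sentiment label conditioned on it.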
Probabilistic topic models for sentiment analysis on the Web
Sentiment analysis aims to use automated tools to detect subjective information such as
opinions, attitudes, and feelings expressed in text, and has attracted rapidly growing
interest in natural language processing in recent years. Probabilistic topic models, on
the other hand, are capable of discovering hidden thematic structure in large archives of
documents, and have been an active research area in the field of information retrieval.
The work in this thesis focuses on developing topic models for automatic sentiment
analysis of web data, by combining the ideas from both research domains.
One notable issue with most previous work in sentiment analysis is that the trained
classifier is domain-dependent, and the labelled corpora required for training can be
difficult to acquire in real-world applications. Another issue is that the dependencies
between sentiment/subjectivity and topics are not taken into consideration. The main
contribution of this thesis is therefore the introduction of three probabilistic topic
models, which address the above concerns by modelling sentiment/subjectivity and topic
simultaneously.
The first model is called the joint sentiment-topic (JST) model based on latent Dirichlet
allocation (LDA), which detects sentiment and topic simultaneously from text. Unlike
supervised approaches to sentiment classification which often fail to produce
satisfactory performance when applied to new domains, the weakly-supervised nature of JST
makes it highly portable to other domains, where the only supervision information
required is a domain-independent sentiment lexicon. Apart from document-level sentiment
classification results, JST can also extract sentiment-bearing topics automatically,
which is a distinctive feature compared to existing sentiment analysis approaches.
The second model is a dynamic version of JST called the dynamic joint sentiment-topic
(dJST) model. dJST respects the ordering of documents, and allows the analysis of topic
and sentiment evolution of document archives that are collected over a long time span. By
accounting for the historical dependencies of documents from past epochs in the
generative process, dJST gives a richer posterior topical structure than JST and can
better track shifts in topic prominence. We also derive online inference
procedures based on a stochastic EM algorithm for efficiently updating the model
parameters.
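The "historical dependencies" idea above can be illustrated in a few lines. In this sketch (epoch count, decay weights, and sizes are all hypothetical, not the thesis's actual parameterisation), the Dirichlet word prior at the current epoch mixes the word distributions of past epochs, so topics evolve smoothly rather than being re-estimated from scratch:

```python
import numpy as np

rng = np.random.default_rng(4)

V, T = 30, 4                                   # vocabulary size, topics
# Topic-word distributions from three past epochs (here just random stand-ins).
history = [rng.dirichlet([0.1] * V, size=T) for _ in range(3)]
mu = np.array([0.2, 0.3, 0.5])                 # decay weights, most recent epoch last

# Evolved Dirichlet prior: small symmetric base plus weighted history.
beta_t = 0.01 + sum(m * h for m, h in zip(mu, history))
phi_t_mean = beta_t / beta_t.sum(axis=1, keepdims=True)  # mean of Dirichlet(beta_t)
```

Words prominent in recent epochs thus receive larger prior mass in the current epoch's sentiment-topics.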
The third model is called the subjectivity detection LDA (subjLDA) model for
sentence-level subjectivity detection. Two sets of latent variables are introduced in
subjLDA: one is the subjectivity label for each sentence; the other is the sentiment
label for each word token. By viewing the subjectivity detection problem as weakly-supervised
generative model learning, subjLDA significantly outperforms the baseline and is
comparable to the supervised approach which relies on much larger amounts of data for
training.
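The two latent layers described above, a sentence-level subjectivity label and word-level sentiment labels conditioned on it, can be sketched as a forward sampler. Sizes and priors here are hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

SUBJ, SENT, V = 2, 3, 30         # subjective/objective, pos/neg/neutral, vocab size
pi = rng.dirichlet([1.0, 1.0])                   # document-level subjectivity mixture
theta = rng.dirichlet([0.5] * SENT, size=SUBJ)   # sentiment mix per subjectivity label
phi = rng.dirichlet([0.05] * V, size=SENT)       # word distribution per sentiment label

def generate_sentence(n_words=8):
    s = rng.choice(SUBJ, p=pi)                         # sentence subjectivity label
    labels = rng.choice(SENT, size=n_words, p=theta[s])  # word-level sentiment labels
    words = [rng.choice(V, p=phi[l]) for l in labels]
    return s, labels.tolist(), words
```

Inference would invert this process, recovering sentence and word labels from observed words under weak lexicon supervision.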
These models have been evaluated on real-world datasets, demonstrating that joint
sentiment-topic modelling is indeed an important and useful research area with much
to offer.
Latent sentiment model for weakly-supervised cross-lingual sentiment classification
In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation, where sentiment labels are treated as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM model learning, where preferences on the expected sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM model performs comparably to supervised classifiers such as Support Vector Machines, achieving an average accuracy of 81% over a total of 5,484 review documents. Moreover, starting from a generic sentiment lexicon, the LSM model is able to extract highly domain-specific polarity words from text.
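The core trick, sentiment labels as the "topics" of an LDA-style model with lexicon knowledge steering each label's word distribution, can be sketched as below. Note the paper's actual mechanism is generalized expectation criteria; for brevity this sketch folds the lexicon preference into an asymmetric Dirichlet prior instead, and the lexicon and sizes are hypothetical:

```python
import numpy as np

vocab = ["good", "great", "bad", "awful", "screen", "battery"]
lexicon = {"good": 0, "great": 0, "bad": 1, "awful": 1}   # word -> sentiment label
S, V = 2, len(vocab)                                      # positive/negative labels

beta = np.full((S, V), 0.01)               # weak symmetric base prior
for i, w in enumerate(vocab):
    if w in lexicon:
        beta[lexicon[w], i] += 1.0         # boost the word under its lexicon label

# Expected label-word distributions under the prior (mean of Dirichlet(beta)).
phi_mean = beta / beta.sum(axis=1, keepdims=True)
```

Non-lexicon words like "screen" start neutral and only acquire polarity from the corpus, which is how domain-specific polarity words emerge.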
Weakly-Supervised Neural Text Classification
Deep neural networks are gaining increasing popularity for the classic text
classification task, due to their strong expressive power and reduced need for
feature engineering. Despite such attractiveness, neural text classification
models suffer from a lack of training data in many real-world applications.
Although many semi-supervised and weakly-supervised text classification models
exist, they cannot be easily applied to deep neural models and support only
limited types of supervision. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves strong
performance without requiring excessive training data and significantly
outperforms baseline methods. (CIKM 2018 full paper)
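The two modules above can be illustrated with a toy pipeline. Everything here is a hypothetical stand-in for the paper's neural model: a bag-of-words nearest-centroid classifier replaces the network, seed keywords are expanded into pseudo-labeled documents for "pre-training", and a self-training loop then bootstraps on unlabeled documents it classifies confidently:

```python
import numpy as np

seeds = {0: ["goal", "match"], 1: ["stock", "market"]}     # class -> seed words
vocab = ["goal", "match", "team", "stock", "market", "profit"]
w2i = {w: i for i, w in enumerate(vocab)}

def bow(doc):
    v = np.zeros(len(vocab))
    for w in doc.split():
        if w in w2i:
            v[w2i[w]] += 1.0
    return v

# (1) Pseudo-document generation: repeat seed words into synthetic documents.
X = np.array([bow(" ".join(ws * 3)) for ws in seeds.values()])
y = np.array(list(seeds.keys()))
centroids = np.array([X[y == c].mean(0) for c in seeds])   # pre-trained "model"

def predict_proba(v):
    sims = centroids @ v / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(v) + 1e-9)
    e = np.exp(sims)                          # softmax over cosine similarities
    return e / e.sum()

# (2) Self-training: add confidently classified unlabeled docs and refit.
unlabeled = ["team goal", "market profit", "stock profit market"]
for _ in range(2):
    feats = [bow(d) for d in unlabeled]
    probs = [predict_proba(v) for v in feats]
    keep = [(v, int(p.argmax())) for v, p in zip(feats, probs) if p.max() > 0.55]
    allX = np.vstack([X] + [v[None, :] for v, _ in keep])
    ally = np.concatenate([y, [l for _, l in keep]])
    centroids = np.array([allX[ally == c].mean(0) for c in seeds])
```

The confidence threshold (here 0.55, chosen arbitrarily) controls how aggressively self-training absorbs unlabeled data.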
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
We combine multi-task learning and semi-supervised learning by inducing a
joint embedding space between disparate label spaces and learning transfer
functions between label embeddings, enabling us to jointly leverage unlabelled
data and auxiliary, annotated datasets. We evaluate our approach on a variety
of sequence classification tasks with disparate label spaces. We outperform
strong single and multi-task baselines and achieve a new state-of-the-art for
topic-based sentiment analysis. (To appear at NAACL 2018, long paper)
Latent Dirichlet Markov allocation for sentiment analysis
In recent years, probabilistic topic models have gained tremendous attention in data mining and natural language processing. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyse the content of documents. A topic model is a generative model for documents: it specifies a probabilistic procedure by which documents can be generated. All topic models share the idea that documents are mixtures of topics, where a topic is a probability distribution over words. In this paper we describe the Latent Dirichlet Markov Allocation (LDMA) model, a new generative probabilistic topic model based on Latent Dirichlet Allocation (LDA) and the Hidden Markov Model (HMM), which emphasizes extracting multi-word topics from text data. LDMA is a four-level hierarchical Bayesian model in which topics are associated with documents, words are associated with topics, and topics can be represented by single- or multi-word terms. To evaluate the performance of LDMA, we report results on aspect detection in sentiment analysis, comparing it to the basic LDA model.
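The LDA-plus-HMM combination can be illustrated with a minimal forward sampler. Sizes and the self-transition probability below are hypothetical; the point is that a Markov "continue" decision lets consecutive words share a topic, which is how multi-word terms can emerge:

```python
import numpy as np

rng = np.random.default_rng(2)

T, V, N = 4, 20, 25                          # topics, vocabulary size, words
stay = 0.6                                   # HMM self-transition probability
theta = rng.dirichlet([0.5] * T)             # document-topic mixture
phi = rng.dirichlet([0.1] * V, size=T)       # topic-word distributions

words, topics = [], []
z = int(rng.choice(T, p=theta))
for i in range(N):
    if i == 0 or rng.random() >= stay:
        z = int(rng.choice(T, p=theta))      # fresh topic draw from the mixture
    # else: keep the previous topic, so adjacent words form a multi-word term
    topics.append(z)
    words.append(int(rng.choice(V, p=phi[z])))
```

Runs of identical entries in `topics` mark candidate multi-word terms; plain LDA corresponds to `stay = 0`.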