Discovering conversational topics and emotions associated with Demonetization tweets in India
Social media platforms contain a great wealth of information, which gives us
opportunities to explore hidden patterns or unknown correlations and to understand
people's satisfaction with what they are discussing. As a showcase, in this
paper we summarize a data set of Twitter messages related to the recent
demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights
from Twitter's data. Our proposed system automatically extracts the popular
latent topics in conversations regarding demonetization discussed in Twitter
via the Latent Dirichlet Allocation (LDA) based topic model and also identifies
the correlated topics across different categories. Additionally, it
discovers the opinions people express in their tweets about the event
under consideration via an emotion analyzer. The system also employs an
intuitive and informative visualization to show the uncovered insights.
Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI),
to select the best LDA models. The obtained LDA results show that the tool can
be effectively used to extract discussion topics and summarize them for further
manual analysis.
Comment: 6 pages, 11 figures. arXiv admin note: substantial text overlap with
arXiv:1608.02519 by other authors; text overlap with arXiv:1705.08094 by
other author
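The abstract says NMI is used to select the best LDA models but does not state what the model output is compared against or which normalization is used. A minimal sketch, assuming each tweet is assigned its dominant LDA topic and that this assignment is compared with a reference labeling of the tweets (hypothetical), using the arithmetic-mean normalization NMI(U, V) = 2 I(U; V) / (H(U) + H(V)). The helper names `entropy`, `mutual_information`, `nmi` and `select_best_model` are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information I(A; B) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n * n_xy / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization: 2 I(A; B) / (H(A) + H(B))."""
    if len(a) != len(b) or not a:
        raise ValueError("labelings must be non-empty and of equal length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings put every item in a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


def select_best_model(
    candidates: Dict[str, Sequence[Hashable]], reference: Sequence[Hashable]
) -> str:
    """Hypothetical selection rule: keep the LDA model whose dominant-topic
    assignments agree best (highest NMI) with the reference labeling."""
    return max(candidates, key=lambda name: nmi(candidates[name], reference))


if __name__ == "__main__":
    reference = ["cash", "cash", "bank", "bank"]       # hypothetical tweet categories
    models = {"k=2": [0, 0, 1, 1], "k=3": [0, 1, 2, 2]}  # dominant topic per tweet
    print(select_best_model(models, reference))
```

NMI is invariant to how topic IDs are numbered, which is why it suits comparing LDA runs with different numbers of topics: here the two-topic model matches the reference exactly (NMI 1.0) while the three-topic model splits one category (NMI 0.8).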
Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014
Topic models are widely used in natural language processing, allowing
researchers to estimate the underlying themes in a collection of documents.
Most topic models use unsupervised methods and hence require the additional
step of attaching meaningful labels to estimated topics. This process of manual
labeling is not scalable and suffers from human bias. We present a
semi-automatic transfer topic labeling method that seeks to remedy these
problems. Domain-specific codebooks form the knowledge-base for automated topic
labeling. We demonstrate our approach with a dynamic topic model analysis of
the complete corpus of UK House of Commons speeches 1935-2014, using the coding
instructions of the Comparative Agendas Project to label topics. We show that
our method works well for a majority of the topics we estimate, but we also
find that institution-specific topics, in particular on subnational governance,
require manual input. We validate our results using human expert coding.
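The abstract does not spell out how codebook content is matched to estimated topics. As a rough illustration only, the sketch below swaps in a simple keyword-overlap rule: each topic gets the codebook category whose keywords carry the most probability mass among the topic's top words, and topics that no category covers well are returned as `None`, mirroring the manual-input fallback described above. The keyword sets in `HYPOTHETICAL_CODEBOOK`, the `min_score` cutoff and the function `label_topic` are all assumptions, not the Comparative Agendas Project codebook or the authors' method.

```python
from typing import Dict, Optional, Set

# Hypothetical keyword sets standing in for codebook categories; a real
# codebook supplies much richer category descriptions.
HYPOTHETICAL_CODEBOOK: Dict[str, Set[str]] = {
    "Health": {"nhs", "hospital", "patients", "doctors"},
    "Defence": {"army", "navy", "troops", "weapons"},
    "Agriculture": {"farmers", "crops", "livestock"},
}


def label_topic(
    topic_words: Dict[str, float],
    codebook: Dict[str, Set[str]],
    min_score: float = 0.1,  # hypothetical cutoff for accepting a label
) -> Optional[str]:
    """Return the category whose keywords carry the most probability mass among
    the topic's top words, or None when no category reaches min_score
    (i.e. the topic is left for manual labeling)."""
    best_label, best_score = None, 0.0
    for label, keywords in codebook.items():
        score = sum(p for word, p in topic_words.items() if word in keywords)
        if score > best_score:  # strict '>' keeps the first category on ties
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


if __name__ == "__main__":
    health_topic = {"nhs": 0.08, "hospital": 0.06, "patients": 0.05, "waiting": 0.03}
    regional_topic = {"scotland": 0.07, "council": 0.05, "devolution": 0.04}
    print(label_topic(health_topic, HYPOTHETICAL_CODEBOOK))
    print(label_topic(regional_topic, HYPOTHETICAL_CODEBOOK))
```

The second example topic, about regional government, matches no category and falls through to manual labeling, which is the kind of institution-specific case the abstract reports.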
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14-year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData
2014)
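The abstract defines a topic similarity network only as a graph whose nodes are latent topics and whose links represent similarity among topics. A minimal sketch, assuming each topic is a word-to-probability dictionary, cosine similarity as the similarity measure, a hypothetical fixed `threshold` for keeping a link, and a naive top-words node label; the paper's own building and labeling approaches may differ. The names `cosine`, `topic_similarity_network` and `node_label` are illustrative.

```python
import math
from typing import Dict, List, Tuple

Topic = Dict[str, float]  # word -> probability under one topic


def cosine(p: Topic, q: Topic) -> float:
    """Cosine similarity between two sparse topic-word vectors."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def topic_similarity_network(
    topics: List[Topic], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Nodes are topic indices; keep an undirected edge (i, j, sim) when
    the cosine similarity of the two topics is at least the threshold."""
    edges = []
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            sim = cosine(topics[i], topics[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


def node_label(topic: Topic, n: int = 3) -> str:
    """Label a node with its n most probable words."""
    return " ".join(sorted(topic, key=topic.get, reverse=True)[:n])


if __name__ == "__main__":
    topics = [
        {"gene": 0.5, "protein": 0.3, "cell": 0.2},
        {"protein": 0.4, "cell": 0.4, "membrane": 0.2},
        {"galaxy": 0.6, "star": 0.4},
    ]
    for i, j, sim in topic_similarity_network(topics):
        print(f"{node_label(topics[i])} -- {node_label(topics[j])} ({sim:.2f})")
```

The pairwise loop is quadratic in the number of topics, which is acceptable for the few hundred topics a typical model produces; the resulting edge list can be handed to any graph layout tool for visualization.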