Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability
Understanding the shopping motivations behind market baskets has high
commercial value in the grocery retail industry. Analyzing shopping
transactions demands techniques that can cope with the volume and
dimensionality of grocery transactional data while keeping interpretable
outcomes. Latent Dirichlet Allocation (LDA) provides a suitable framework to
process grocery transactions and to discover a broad representation of
customers' shopping motivations. However, summarizing the posterior
distribution of an LDA model is challenging, while individual LDA draws may not
be coherent and cannot capture topic uncertainty. Moreover, the evaluation of
LDA models is dominated by model-fit measures which may not adequately capture
the qualitative aspects such as interpretability and stability of topics.
In this paper, we introduce clustering methodology that post-processes
posterior LDA draws to summarise the entire posterior distribution and identify
semantic modes represented as recurrent topics. Our approach is an alternative
to standard label-switching techniques and provides a single posterior summary
set of topics, as well as associated measures of uncertainty. Furthermore, we
establish a more holistic definition for model evaluation, which assesses topic
models based not only on their likelihood but also on their coherence,
distinctiveness and stability. By means of a survey, we set thresholds for the
interpretation of topic coherence and topic similarity in the domain of grocery
retail data. We demonstrate that the selection of recurrent topics through our
clustering methodology not only improves model likelihood but also strengthens
the qualitative aspects of LDA, such as interpretability and stability. We
illustrate our methods on an example from a large UK supermarket chain.
Comment: 20 pages, 9 figures
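The post-processing idea described in this abstract (pooling topics from several posterior LDA draws, grouping similar ones, and keeping those that recur) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the greedy cosine-similarity clustering, the function name `recurrent_topics`, the thresholds `sim_threshold` and `min_recurrence`, and the use of recurrence across draws as an uncertainty measure are all assumptions made for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recurrent_topics(draws, sim_threshold=0.9, min_recurrence=0.5):
    """Greedily cluster topics pooled from several posterior draws.

    draws: list of draws; each draw is a list of topic vectors
           (word probabilities over a shared vocabulary).
    Returns a list of (centroid, recurrence) pairs, where recurrence is
    the fraction of draws contributing at least one topic to the cluster.
    Both thresholds are hypothetical tuning parameters.
    """
    clusters = []
    for draw_id, topics in enumerate(draws):
        for topic in topics:
            # Find the most similar existing cluster above the threshold.
            best, best_sim = None, sim_threshold
            for cluster in clusters:
                sim = cosine(topic, cluster["centroid"])
                if sim >= best_sim:
                    best, best_sim = cluster, sim
            if best is None:
                clusters.append(
                    {"members": [topic], "draws": {draw_id}, "centroid": list(topic)}
                )
            else:
                best["members"].append(topic)
                best["draws"].add(draw_id)
                n = len(best["members"])
                # Recompute the centroid as the mean of all member topics.
                best["centroid"] = [sum(col) / n for col in zip(*best["members"])]

    summary = []
    for cluster in clusters:
        recurrence = len(cluster["draws"]) / len(draws)
        if recurrence >= min_recurrence:
            summary.append((cluster["centroid"], recurrence))
    return summary
```

Each returned centroid plays the role of a single summary topic, and its recurrence gives a crude indication of how consistently that topic appears across draws.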
Ideological and Temporal Components of Network Polarization in Online Political Participatory Media
Political polarization is traditionally analyzed through the ideological
stances of groups and parties, but it also has a behavioral component that
manifests in the interactions between individuals. We present an empirical
analysis of the digital traces of politicians in politnetz.ch, a Swiss online
platform focused on political activity, in which politicians interact by
creating support links, comments, and likes. We analyze network polarization as
the level of intra-party cohesion with respect to inter-party connectivity,
finding that supports show a very strongly polarized structure with respect to
party alignment. The analysis of this multiplex network shows that each layer
of interaction contains relevant information, where comment groups follow
topics related to Swiss politics. Our analysis reveals that polarization in the
layer of likes evolves in time, increasing close to the federal elections of
2011. Furthermore, we analyze the internal social network of each party through
metrics related to hierarchical structures, information efficiency, and social
resilience. Our results suggest that the online social structure of a party is
related to its ideology, and reveal that the degree of connectivity across two
parties increases when they are close in the ideological space of a multi-party
system.
Comment: 35 pages, 11 figures, Internet, Policy & Politics Conference,
University of Oxford, Oxford, UK, 25-26 September 201
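One common way to quantify intra-party cohesion relative to inter-party connectivity is the modularity of the party partition. The sketch below is an illustration under that assumption; the abstract does not state which statistic the paper uses, and the function name `party_modularity` and its input format are hypothetical.

```python
from collections import defaultdict


def party_modularity(edges, party):
    """Newman modularity Q of the partition induced by party labels.

    edges: list of (u, v) pairs for an undirected, unweighted network.
    party: dict mapping each node to its party label.
    Q is high when links fall mostly within parties, and near zero when
    links are placed without regard to party.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints in the same party
    degree = defaultdict(int)  # total degree accumulated per party
    for u, v in edges:
        degree[party[u]] += 1
        degree[party[v]] += 1
        if party[u] == party[v]:
            intra[party[u]] += 1
    return sum(intra[p] / m - (degree[p] / (2 * m)) ** 2 for p in degree)
```

Computing this separately for each layer (supports, comments, likes) would give one polarization score per layer that can be compared across layers or over time.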
Enhanced Topic-Based Modeling for Twitter Sentiment Analysis
In this thesis, multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic-based mixture model and a newly proposed topic-based vector model, both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic-based vector model achieves higher accuracy, in terms of averaged F scores, than the other two models.
Dissertation/Thesis. Masters Thesis, Computer Science 201
Two to Five Truths in Non-Negative Matrix Factorization
In this paper, we explore the role of matrix scaling on a matrix of counts
when building a topic model using non-negative matrix factorization. We present
a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly
improve the quality of a non-negative matrix factorization. The results
parallel those in the spectral graph clustering work of \cite{Priebe:2019},
where the authors proved adjacency spectral embedding (ASE) spectral clustering
was more likely to discover core-periphery partitions and Laplacian Spectral
Embedding (LSE) was more likely to discover affinity partitions. In text
analysis non-negative matrix factorization (NMF) is typically used on a matrix
of co-occurrence "contexts" and "terms" counts. The matrix scaling inspired
by LSE gives significant improvement for text topic models in a variety of
datasets. We illustrate the dramatic difference that matrix scaling in NMF can
make to the quality of a topic model on three datasets where human
annotation is available. Using the adjusted Rand index (ARI), a measure of cluster
similarity, we see an increase of 50% for Twitter data and over 200% for a
newsgroup dataset versus using counts, which is the analogue of ASE. For clean
data, such as those from the Document Understanding Conference, NL gives over
40% improvement over ASE. We conclude with some analysis of this phenomenon
and some connections of this scaling with other matrix scaling methods.
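A scaling in the spirit of the normalized Laplacian can be sketched for a rectangular count matrix as follows. The abstract does not give the exact formula, so the form used here (dividing each count by the square root of its row sum times its column sum) is an assumption, and `nl_scale` is a hypothetical helper name. The scaled matrix stays non-negative, so it could be passed to any NMF routine in place of the raw counts.

```python
from math import sqrt


def nl_scale(counts):
    """Scale a non-negative count matrix by row and column sums.

    Each entry A[i][j] becomes A[i][j] / sqrt(r_i * c_j), where r_i and
    c_j are the i-th row sum and j-th column sum. Zero entries stay zero,
    which also avoids dividing by an all-zero row or column.
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    return [
        [
            x / sqrt(row_sums[i] * col_sums[j]) if x else 0.0
            for j, x in enumerate(row)
        ]
        for i, row in enumerate(counts)
    ]
```

Using the raw counts directly corresponds to the unscaled baseline that the abstract describes as the analogue of ASE.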