Efficient Correlated Topic Modeling with Topic Embedding
Correlated topic modeling has been limited to small model and problem sizes
due to its high computational cost and poor scaling. In this paper, we
propose a new model which learns compact topic embeddings and captures topic
correlations through the closeness between the topic vectors. Our method
enables efficient inference in the low-dimensional embedding space, reducing
previous cubic or quadratic time complexity to linear w.r.t. the topic size. We
further speed up variational inference with a fast sampler that exploits the
sparsity of topic occurrence. Extensive experiments show that our approach is capable of
handling model and data scales which are several orders of magnitude larger
than existing correlation results, without sacrificing modeling quality: it
provides competitive or superior performance in document classification and
retrieval.
Comment: KDD 2017 oral. The first two authors contributed equally.
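The idea of capturing topic correlations through the closeness of topic vectors can be illustrated with a minimal sketch. It assumes, for illustration only, that a document's topic weights are a softmax over dot products between a document vector and each topic vector; the function name `topic_weights` and the toy vectors are hypothetical and not taken from the paper. The cost is linear in the number of topics, and topics with nearby vectors receive similar weights.

```python
import math

def topic_weights(doc_vec, topic_vecs):
    """Softmax over document-topic dot products (illustrative assumption).

    Runs in time linear in the number of topics for a fixed embedding size.
    """
    scores = [sum(d * t for d, t in zip(doc_vec, tv)) for tv in topic_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-dimensional topic embeddings: topics 0 and 1 are close,
# topic 2 points the opposite way.
topic_vecs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
doc_vec = [2.0, 0.0]
weights = topic_weights(doc_vec, topic_vecs)
print(weights)
```

Because topics 0 and 1 have nearby embeddings, any document vector that favors one also tends to favor the other, which is how closeness in the embedding space can stand in for an explicit covariance matrix.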
Measuring Emotions in the COVID-19 Real World Worry Dataset
The COVID-19 pandemic is having a dramatic impact on societies and economies
around the world. With various measures of lockdowns and social distancing in
place, it becomes important to understand emotional responses on a large scale.
In this paper, we present the first ground truth dataset of emotional responses
to COVID-19. We asked participants to indicate their emotions and express these
in text. This resulted in the Real World Worry Dataset of 5,000 texts (2,500
short + 2,500 long texts). Our analyses suggest that emotional responses
correlated with linguistic measures. Topic modeling further revealed that
people in the UK worry about their family and the economic situation.
Tweet-sized texts functioned as a call for solidarity, while longer texts shed
light on worries and concerns. Using predictive modeling approaches, we were
able to approximate the emotional responses of participants from text within
14% of their actual value. We encourage others to use the dataset and improve
how we can use automated methods to learn about emotional responses and worries
about an urgent problem.
Comment: Accepted to ACL 2020 COVID-19 workshop
Inducing Language Networks from Continuous Space Word Representations
Recent advancements in unsupervised feature learning have developed powerful
latent representations of words. However, it is still not clear what makes one
representation better than another and how we can learn the ideal
representation. Understanding the structure of latent spaces attained is key to
any future advancement in unsupervised learning. In this work, we introduce a
new view of continuous space word representations as language networks. We
explore two techniques to create language networks from learned features by
inducing them for two popular word representation methods and examining the
properties of their resulting networks. We find that the induced networks
differ from other methods of creating language networks, and that they contain
meaningful community structure.
Comment: 14 pages
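As a rough illustration of turning word vectors into a language network, the sketch below links each word to its k most similar words by cosine similarity. The abstract does not name its two techniques, so k-nearest-neighbor linking is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_network(vectors, k):
    """Return undirected edges linking each word to its k nearest words."""
    edges = set()
    for word, vec in vectors.items():
        neighbors = sorted(
            ((cosine(vec, vectors[other]), other)
             for other in vectors if other != word),
            reverse=True,
        )
        for _, other in neighbors[:k]:
            edges.add(tuple(sorted((word, other))))
    return edges

# Hypothetical 2-dimensional word vectors, not learned from any corpus.
toy_vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.2, 0.9],
}
edges = knn_network(toy_vectors, k=1)
print(sorted(edges))
```

With these toy vectors the network splits into an animal pair and a vehicle pair, a small-scale analogue of the community structure the abstract describes.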
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a
Bayesian nonparametric prior for mixed membership models. DILN is a
generalization of the hierarchical Dirichlet process (HDP) that models
correlation structure between the weights of the atoms at the group level. We
derive a representation of DILN as a normalized collection of gamma-distributed
random variables, and study its statistical properties. We consider
applications to topic modeling and derive a variational inference algorithm for
approximate posterior inference. We study the empirical performance of the DILN
topic model on four corpora, comparing performance with the HDP and the
correlated topic model (CTM). To deal with large-scale data sets, we also
develop an online inference algorithm for DILN and compare with online HDP and
online LDA on Nature magazine, which contains approximately 350,000
articles.
Comment: This paper will appear in Bayesian Analysis. A shorter version of
this paper appeared at AISTATS 2011, Fort Lauderdale, FL, USA
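The representation of a distribution as a normalized collection of gamma-distributed random variables can be sketched in a few lines. The version below is a finite truncation under stated assumptions: each atom gets a gamma draw whose scale is modulated by a per-atom log-scale value, and the draws are normalized to sum to one. The function name, the truncation to three atoms, and the parameter values are hypothetical and do not reproduce the paper's exact construction.

```python
import math
import random

def normalized_gamma_weights(shapes, log_scales, rng):
    """Draw one gamma variable per atom and normalize to a probability vector.

    shapes: positive gamma shape parameters, one per atom.
    log_scales: per-atom log-scale values (assumed here to carry correlation).
    """
    draws = [
        rng.gammavariate(shape, math.exp(log_scale))
        for shape, log_scale in zip(shapes, log_scales)
    ]
    total = sum(draws)
    return [z / total for z in draws]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
shapes = [1.0, 1.0, 1.0]        # hypothetical shape parameters
log_scales = [0.5, 0.4, -1.0]   # hypothetical; nearby values give similar scales
weights = normalized_gamma_weights(shapes, log_scales, rng)
print(weights)
```

When all log-scale values are equal, the normalized draws follow a Dirichlet distribution; letting the log-scales vary together across atoms is one way correlation between atom weights can enter.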