Real-time Event Detection on Social Data Streams
Social networks are quickly becoming the primary medium for discussing what
is happening around real-world events. The information that is generated on
social platforms like Twitter can produce rich data streams for immediate
insights into ongoing matters and the conversations around them. To tackle the
problem of event detection, we model events as a list of clusters of trending
entities over time. We describe a real-time system for discovering events that
is modular in design and novel in scale and speed: it applies clustering on a
large stream with millions of entities per minute and produces a dynamically
updated set of events. In order to assess clustering methodologies, we build an
evaluation dataset derived from a snapshot of the full Twitter Firehose and
propose novel metrics for measuring clustering quality. Through experiments and
system profiling, we highlight key results from the offline and online
pipelines. Finally, we visualize a high profile event on Twitter to show the
importance of modeling the evolution of events, especially those detected from
social data streams.
Comment: Accepted as a full paper at KDD 2019 on April 29, 201
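The abstract describes an event as a list of clusters of trending entities over time. The sketch below shows one way such a structure could be represented and updated, minute by minute. It is an illustration only: the names `Cluster`, `Event`, and `link_clusters`, the Jaccard-overlap linking rule, and the 0.3 threshold are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cluster:
    """Trending entities grouped together in one time window (hypothetical type)."""
    minute: int
    entities: frozenset


@dataclass
class Event:
    """An event represented as a time-ordered list of clusters (hypothetical type)."""
    clusters: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_clusters(events, new_clusters, threshold=0.3):
    """Attach each new cluster to the best-overlapping existing event.

    Assumed rule, not the paper's: compare a new cluster with the latest
    cluster of every event that existed before this batch, and start a new
    event when no overlap reaches the threshold.
    """
    # Snapshot the tails so clusters from the same batch do not chain to each other.
    tails = [(event, event.clusters[-1].entities) for event in events]
    for cluster in new_clusters:
        best, best_score = None, 0.0
        for event, tail_entities in tails:
            score = jaccard(tail_entities, cluster.entities)
            if score > best_score:
                best, best_score = event, score
        if best is not None and best_score >= threshold:
            best.clusters.append(cluster)
        else:
            events.append(Event([cluster]))
    return events


events = []
link_clusters(events, [Cluster(0, frozenset({"#worldcup", "France", "Croatia"}))])
link_clusters(events, [
    Cluster(1, frozenset({"#worldcup", "France", "Mbappe"})),
    Cluster(1, frozenset({"#oscars", "Academy"})),
])
```

In this toy run the second World Cup cluster shares two of four distinct entities with the first, so it extends the existing event, while the unrelated cluster starts a new one. Keeping the full list of clusters per event, rather than only the latest, is what lets the evolution of an event be inspected afterwards.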
Discriminative Topic Modeling with Logistic LDA
Despite many years of research into latent Dirichlet allocation (LDA),
applying LDA to collections of non-categorical items is still challenging. Yet
many problems with much richer data share a similar structure and could benefit
from the vast literature on LDA. We propose logistic LDA, a novel
discriminative variant of latent Dirichlet allocation which is easy to apply to
arbitrary inputs. In particular, our model can easily be applied to groups of
images and arbitrary text embeddings, and it integrates well with deep neural
networks. Although it is a discriminative model, we show that logistic LDA can
learn from unlabeled data in an unsupervised manner by exploiting the group
structure present in the data. In contrast to other recent topic models
designed to handle arbitrary inputs, our model does not sacrifice the
interpretability and principled motivation of LDA.
Discriminative topic modeling with logistic LDA
Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging for practitioners. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images and arbitrary text embeddings, and it integrates well with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.