A Joint Model of Conversational Discourse and Latent Topics on Microblogs
Conventional topic models are ineffective for topic extraction from microblog
messages, because the data sparseness exhibited in short messages lacking
structure and contexts results in poor message-level word co-occurrence
patterns. To address this issue, we organize microblog messages as conversation
trees based on their reposting and replying relations, and propose an
unsupervised model that jointly learns word distributions to represent: 1)
different roles of conversational discourse, and 2) various latent topics
reflecting content information. By explicitly distinguishing the probabilities
that messages with varying discourse roles contain topical words, our model
is able to discover clusters of discourse words that are indicative of topical
content. In an automatic evaluation on large-scale microblog corpora, our joint
model yields topics with better coherence scores than competitive topic models
from previous studies. Qualitative analysis of model outputs indicates that our
model induces meaningful representations for both discourse and topics. We
further present an empirical study on microblog summarization based on the
outputs of our joint model. The results show that the jointly modeled discourse
and topic representations can effectively indicate summary-worthy content in
microblog conversations.
Comment: Accepted in Special Issue of the journal Computational Linguistics
on: Language in Social Media: Exploiting discourse and other contextual
informatio