Sparsemax and Relaxed Wasserstein for Topic Sparsity
Topic sparsity refers to the observation that individual documents usually
focus on a few salient topics rather than covering a wide variety of topics,
and that a real-world topic draws on a narrow range of terms rather than a
broad swath of the vocabulary. Understanding topic sparsity is especially
important for analyzing user-generated web content and social media, which
typically take the form of extremely short posts and discussions. As the
topic sparsity of individual documents in online social media increases, so
does the difficulty of analyzing these online text sources with traditional
methods.
In this paper, we propose two novel neural topic models that yield sparse
posterior distributions over topics via a Gaussian sparsemax construction,
enabling efficient training by stochastic backpropagation.
We construct an inference network conditioned on the input data and fit the
variational distribution under the relaxed Wasserstein (RW) divergence.
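The abstract does not spell out the form of the RW divergence. As a loose
illustration of a Wasserstein-style discrepancy between two discrete topic
distributions, the sketch below uses a standard entropy-regularized Sinkhorn
iteration; the function name, ground-cost matrix, and regularization strength
are all assumptions for illustration, not the authors' exact objective:

```python
def sinkhorn_divergence(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal-transport cost between distributions
    p and q over K topics, with a (K, K) ground-cost matrix `cost`.
    A generic Sinkhorn stand-in, not the paper's RW divergence."""
    gibbs = torch.exp(-cost / eps)                  # Gibbs kernel
    u = torch.ones_like(p)
    for _ in range(n_iters):
        v = q / (gibbs.t() @ u + 1e-9)              # match column marginals
        u = p / (gibbs @ v + 1e-9)                  # match row marginals
    plan = u.unsqueeze(1) * gibbs * v.unsqueeze(0)  # transport plan
    return (plan * cost).sum()
```

Unlike the KL divergence, such transport-based costs remain well defined
when p and q have different supports, which matters once sparsemax drives
some topic probabilities to exactly zero.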
Unlike existing approaches built on the Gaussian softmax construction and the
Kullback-Leibler (KL) divergence, our models identify latent topic sparsity
while retaining training stability, predictive performance, and topic
coherence. Experiments on large text corpora from different genres
demonstrate the effectiveness of our models, which outperform both
probabilistic and neural baselines.

Comment: 10 pages. To appear in WSDM 201