Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Mining a set of meaningful topics organized into a hierarchy is intuitively
appealing since topic correlations are ubiquitous in massive text corpora. To
account for potential hierarchical topic structures, hierarchical topic models
generalize flat topic models by incorporating latent topic hierarchies into
their generative modeling process. However, due to their purely unsupervised
nature, the learned topic hierarchy often deviates from users' particular needs
or interests. To guide the hierarchical topic discovery process with minimal
user supervision, we propose a new task, Hierarchical Topic Mining, which takes
a category tree described by category names only, and aims to mine a set of
representative terms for each category from a text corpus to help users
comprehend their topics of interest. We develop a novel joint tree and text
embedding method along with a principled optimization procedure that allows
simultaneous modeling of the category tree structure and the corpus generative
process in the spherical space for effective category-representative term
discovery. Our comprehensive experiments show that our model, named JoSH, mines
a high-quality set of hierarchical topics with high efficiency and benefits
weakly-supervised hierarchical text classification tasks.
Comment: KDD 2020 Research Track. Code: https://github.com/yumeng5/JoSH
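The abstract only names the term-discovery step at a high level. As a rough, hypothetical illustration (toy data and names below are not from JoSH), once terms and category names live on the unit sphere, category-representative terms can be ranked by cosine similarity, which for unit vectors is just the dot product:

```python
import numpy as np

# Hypothetical toy data: random unit-normalized term and category embeddings.
rng = np.random.default_rng(0)
vocab = ["nba", "playoffs", "senate", "ballot", "goalie", "tax"]
term_emb = rng.normal(size=(len(vocab), 8))
term_emb /= np.linalg.norm(term_emb, axis=1, keepdims=True)

categories = ["sports", "politics"]
cat_emb = rng.normal(size=(len(categories), 8))
cat_emb /= np.linalg.norm(cat_emb, axis=1, keepdims=True)

def top_terms(cat_idx, k=3):
    """Rank terms by cosine similarity to a category embedding.
    On the unit sphere, the dot product equals cosine similarity."""
    scores = term_emb @ cat_emb[cat_idx]
    return [vocab[i] for i in np.argsort(-scores)[:k]]

for idx, name in enumerate(categories):
    print(name, top_terms(idx))
```

The actual model additionally constrains the embeddings with the category tree structure and a corpus generative process, which this sketch omits.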
Embedding initial data for black hole collisions
We discuss isometric embedding diagrams for the visualization of initial data
for the problem of the head-on collision of two black holes. The problem of
constructing the embedding diagrams is explicitly presented for the best
studied initial data, the Misner geometry. We present a partial solution of the
embedding diagrams and discuss issues related to completing the solution.
Comment: 27 pp. text, 11 figures
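The abstract does not reproduce the construction. As a standard warm-up for the general technique (the single-black-hole case, not the Misner two-hole data treated in the paper), an isometric embedding diagram matches the intrinsic 2-metric of a slice to the induced metric of a surface $z(r)$ in flat space:

```latex
\[
  ds^2 = \Bigl(1 - \tfrac{2M}{r}\Bigr)^{-1} dr^2 + r^2\, d\phi^2
  \quad\longrightarrow\quad
  ds^2 = dz^2 + dr^2 + r^2\, d\phi^2 ,
\]
so matching the radial terms requires
\[
  1 + \Bigl(\frac{dz}{dr}\Bigr)^2 = \Bigl(1 - \frac{2M}{r}\Bigr)^{-1}
  \;\Longrightarrow\;
  \frac{dz}{dr} = \sqrt{\frac{2M}{r - 2M}}
  \;\Longrightarrow\;
  z(r) = \sqrt{8M\,(r - 2M)} .
\]
```

This yields the familiar Flamm paraboloid; for the Misner geometry the same matching problem lacks the symmetry that makes this closed-form solution possible, which is why the paper reports only a partial solution.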
Weakly-Supervised Neural Text Classification
Deep neural networks are gaining increasing popularity for the classic text
classification task, thanks to their strong expressive power and reduced need
for feature engineering. Despite this appeal, neural text
classification models suffer from the lack of training data in many real-world
applications. Although many semi-supervised and weakly-supervised text
classification models exist, they cannot be easily applied to deep neural
models, and they support only limited types of supervision. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves strong
performance without requiring excessive training data and significantly
outperforms baseline methods.
Comment: CIKM 2018 Full Paper