Automatic generation of topic labels
Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections, with many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms.
However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes a neural sequence-to-sequence approach that generates labels and therefore does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
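The second stage of the extractive baseline described above (re-ranking retrieved candidates by semantic similarity to the topic terms) can be sketched as follows. This is a minimal illustration, not the method of any particular paper: it assumes the topic is a list of (term, probability) pairs, represents both the topic and each candidate label by averaged word vectors, and scores them with cosine similarity. The toy 2-dimensional `VECTORS` table and all function names are hypothetical stand-ins for real pretrained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def embed_phrase(phrase, vectors):
    """Average the vectors of the phrase's known tokens; None if no token is known."""
    tokens = [t for t in phrase.lower().split() if t in vectors]
    if not tokens:
        return None
    dim = len(vectors[tokens[0]])
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]

def rerank_labels(topic_terms, candidates, vectors):
    """Stage 2: order stage-1 candidate labels by similarity to the topic.

    Assumes at least one topic term has a vector in `vectors`.
    """
    known = [(t, p) for t, p in topic_terms if t in vectors]
    dim = len(vectors[known[0][0]])
    # Topic vector = probability-weighted sum of its term vectors.
    topic_vec = [sum(p * vectors[t][i] for t, p in known) for i in range(dim)]
    scored = []
    for label in candidates:
        label_vec = embed_phrase(label, vectors)
        if label_vec is not None:  # skip labels with no known tokens
            scored.append((cosine(topic_vec, label_vec), label))
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored]

# Hypothetical toy embeddings standing in for real word vectors.
VECTORS = {
    "football": [1.0, 0.0], "goal": [0.9, 0.1], "league": [0.8, 0.2],
    "sport": [0.95, 0.05], "politics": [0.05, 0.95], "election": [0.0, 1.0],
}
topic = [("football", 0.5), ("goal", 0.3), ("league", 0.2)]
print(rerank_labels(topic, ["Politics", "Sport", "Election campaign"], VECTORS))
# → ['Sport', 'Politics', 'Election campaign']
```

The sketch also makes the stated limitation concrete: whatever the scoring function, the output can only be a permutation of the retrieved candidates, which is the restriction a generative labeller avoids.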
Multilingual Topic Labelling of News Topics using Ontological Mapping
The large volume of news produced daily makes topic modelling useful for analysing topical trends. A topic is usually represented by a ranked list of words, but this can be difficult and time-consuming for humans to interpret. Therefore, various methods have been proposed to generate labels that capture the semantic content of a topic. However, there has been no work so far on generating multilingual labels, which can be useful for exploring multilingual news collections. We propose an ontological mapping method that maps topics to concepts in a language-agnostic news ontology. We test our method on Finnish and English topics and show that it performs on par with state-of-the-art label generation methods, is able to produce multilingual labels, and can be applied to topics from languages that have not been seen during training without any modifications.
Peer reviewed
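To illustrate why mapping to language-agnostic concepts yields multilingual labels, here is a deliberately simplified sketch. It is not the authors' method (the abstract describes a trained mapping that generalises to unseen languages); instead it swaps in a hypothetical hand-written lexicon from (language, word) pairs to concept IDs, scores each concept by the summed probability of the topic words that map to it, and then reads the winning concept's label in any target language. `LEXICON`, `CONCEPT_LABELS`, and the concept IDs are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical lexicon: (language, word) -> language-agnostic concept ID.
LEXICON = {
    ("en", "election"): "C_ELECTIONS",
    ("en", "vote"): "C_ELECTIONS",
    ("fi", "vaalit"): "C_ELECTIONS",
    ("fi", "äänestys"): "C_ELECTIONS",
    ("en", "football"): "C_SPORT",
    ("fi", "jalkapallo"): "C_SPORT",
}
# Hypothetical per-language labels for each ontology concept.
CONCEPT_LABELS = {
    "C_ELECTIONS": {"en": "elections", "fi": "vaalit"},
    "C_SPORT": {"en": "sport", "fi": "urheilu"},
}

def map_topic_to_concept(topic_terms, lang):
    """Return the concept whose mapped words carry the most topic probability."""
    scores = defaultdict(float)
    for word, prob in topic_terms:
        concept = LEXICON.get((lang, word))
        if concept is not None:
            scores[concept] += prob
    if not scores:
        return None  # no topic word is covered by the lexicon
    return max(scores, key=scores.get)

def multilingual_labels(topic_terms, lang, target_langs):
    """Label a topic in every requested language via its mapped concept."""
    concept = map_topic_to_concept(topic_terms, lang)
    if concept is None:
        return {}
    labels = CONCEPT_LABELS[concept]
    return {t: labels[t] for t in target_langs if t in labels}

finnish_topic = [("vaalit", 0.4), ("äänestys", 0.3), ("jalkapallo", 0.1)]
print(multilingual_labels(finnish_topic, "fi", ["en", "fi"]))
# → {'en': 'elections', 'fi': 'vaalit'}
```

Because the label is attached to the concept rather than to the source language, a Finnish topic can receive an English label (and vice versa) with no extra step.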
Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model
Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that violate the moderation rules mostly share common linguistic and thematic features, their content varies across the different sections of the newspaper. We therefore make our models topic-aware, incorporating semantic features from a topic model into the classification decision. Our results show that topic information improves the performance of the model, increases its confidence in correct outputs, and helps us understand the model's outputs.
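One simple way to make a moderation classifier topic-aware, in the spirit described above, is to append a comment's topic proportions to its text features before classification. The sketch below assumes exactly that (concatenation plus plain logistic regression trained by stochastic gradient descent); it is an illustration, not the paper's architecture. The toy data are hypothetical: two comments share the same text feature but come from different topics, so only the topic features can separate the "blocked" comment from the "allowed" one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def topic_aware_features(text_features, topic_proportions):
    """Append the comment's topic proportions to its text features."""
    return list(text_features) + list(topic_proportions)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Plain stochastic gradient descent for binary logistic regression."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - target  # derivative of the log-loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict_proba(w, b, x):
    """Probability that a comment violates the (hypothetical) moderation rules."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: identical text feature, different section topics.
text = [1.0]
X = [topic_aware_features(text, [1.0, 0.0]),  # topic A: labelled blocked
     topic_aware_features(text, [0.0, 1.0])]  # topic B: labelled allowed
y = [1, 0]
w, b = train_logreg(X, y)
print(predict_proba(w, b, X[0]) > 0.5, predict_proba(w, b, X[1]) < 0.5)
# → True True
```

Without the two topic columns the inputs would be identical and no classifier could fit both labels, which mirrors the observation that rule-violating content varies across newspaper sections.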
Labeled Interactive Topic Models
Topic models are valuable for understanding large document collections, but they do not always identify the most relevant topics. Classical probabilistic and anchor-based topic models offer interactive versions that allow users to guide the models towards more pertinent topics. However, such interactive features have been lacking in neural topic models. To fill this gap, we introduce a user-friendly interaction for neural topic models. This interaction lets users assign a word label to a topic, which updates the topic model so that the words in the topic become closely aligned with the given label. Our approach covers two kinds of neural topic models: in the first, topic embeddings are trainable and evolve during training; in the second, topic embeddings are integrated after training, which offers a different route to topic refinement. To facilitate user interaction with these neural topic models, we have developed an interactive interface that lets users engage with and relabel topics as desired. We evaluate our method through a human study in which users can relabel topics to find relevant documents. With our method, user labeling improves document rank scores, helping users find more documents relevant to a given query than without user labeling.
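The effect of assigning a word label to a topic in an embedding-based neural topic model can be sketched as follows. This assumes that a topic's word distribution is a softmax over dot products between the topic embedding and word embeddings, and it models the user's relabelling as a simple interpolation of the topic embedding towards the label word's embedding. That interpolation (the `alpha` parameter and the `relabel_topic` function) is a hypothetical stand-in for illustration, not the update the authors use for either kind of model. The 2-dimensional `WORD_VECS` table is also invented.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def topic_word_dist(topic_vec, word_vecs):
    """Topic-word distribution: softmax over topic/word embedding dot products."""
    words = list(word_vecs)
    logits = [sum(a * b for a, b in zip(topic_vec, word_vecs[w])) for w in words]
    return dict(zip(words, softmax(logits)))

def relabel_topic(topic_vec, label, word_vecs, alpha=0.5):
    """Hypothetical update: move the topic embedding towards the label's embedding.

    Raises KeyError if the label word is not in the vocabulary.
    """
    target = word_vecs[label]
    return [(1 - alpha) * t + alpha * e for t, e in zip(topic_vec, target)]

def top_words(topic_vec, word_vecs, n=3):
    """The n most probable words under the topic."""
    dist = topic_word_dist(topic_vec, word_vecs)
    return sorted(dist, key=dist.get, reverse=True)[:n]

# Hypothetical toy word embeddings and topic embedding.
WORD_VECS = {"market": [1.0, 0.0], "stock": [0.9, 0.1], "bank": [0.6, 0.6],
             "river": [0.0, 1.0], "water": [0.1, 0.9]}
topic = [2.0, 1.0]
print(top_words(topic, WORD_VECS))       # → ['market', 'stock', 'bank']
relabelled = relabel_topic(topic, "river", WORD_VECS, alpha=0.8)
print(top_words(relabelled, WORD_VECS))  # → ['river', 'water', 'bank']
```

After the label "river" is applied, words close to the label rise to the top of the topic while "bank", which sits between both senses, stays in the list; a larger `alpha` pulls the topic more strongly towards the user's label.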