17 research outputs found
Topic Modeling Using Latent Dirichlet Allocation and Gibbs Sampling
Topic modeling is a tool for discovering latent topics in a collection of documents. In this study, topic modeling was carried out using Latent Dirichlet Allocation and Gibbs sampling. Six Indonesian-language news articles were collected from the detiknews news portal using a web scraper. The articles fall into two main categories: narcotics and COVID-19. The LDA model was evaluated using topic coherence measured by the UCI score, and the results indicate that five topics were optimal in both test configurations.
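As an illustration of the UCI coherence score mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It scores a topic by the mean pairwise pointwise mutual information (PMI) of its top words. Two simplifications are assumptions made here: co-occurrence is counted per document rather than in sliding windows over an external reference corpus, and the function name `uci_coherence`, the smoothing constant `eps`, and the toy documents are all hypothetical.

```python
import math
from itertools import combinations


def uci_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of a topic's top words.

    Assumption: probabilities are estimated from document-level
    co-occurrence in `documents` (a list of token lists).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2 = prob(w1), prob(w2)
        if p1 == 0 or p2 == 0:
            continue  # skip words that never occur in the corpus
        # eps keeps the logarithm finite when the pair never co-occurs.
        scores.append(math.log((prob(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy corpus (hypothetical): two COVID-19 documents, two narcotics documents.
toy_docs = [
    ["virus", "vaccine", "hospital"],
    ["virus", "vaccine"],
    ["drug", "police", "arrest"],
    ["drug", "police"],
]
coherent = uci_coherence(["virus", "vaccine"], toy_docs)
mixed = uci_coherence(["virus", "police"], toy_docs)
```

On this toy corpus the topic whose words always co-occur scores higher than the topic that mixes the two categories. Comparing such scores across models trained with different numbers of topics is one common way to choose a topic count.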
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
Topic models extract meaningful groups of words from documents, allowing for
a better understanding of data. However, the solutions are often not coherent
enough, and thus harder to interpret. Coherence can be improved by adding more
contextual knowledge to the model. Recently, neural topic models have become
available, while BERT-based representations have further pushed the state of
the art of neural models in general. We combine pre-trained representations and
neural topic models. Pre-trained BERT sentence embeddings indeed support the
generation of more meaningful and coherent topics than either standard LDA or
existing neural topic models. Results on four datasets show that our approach
effectively increases topic coherence.
SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining
Recent research in opinion mining proposed word embedding-based topic
modeling methods that provide superior coherence compared to traditional topic
modeling. In this paper, we demonstrate how these methods can be used to
display correlated topic models on social media texts using SocialVisTUM, our
proposed interactive visualization toolkit. It displays a graph with topics as
nodes and their correlations as edges. Further details are displayed
interactively to support the exploration of large text collections, e.g.,
representative words and sentences of topics, topic and sentiment
distributions, hierarchical topic clustering, and customizable, predefined
topic labels. The toolkit optimizes automatically on custom data for optimal
coherence. We show a working instance of the toolkit on data crawled from
English social media discussions about organic food consumption. The
visualization confirms findings of a qualitative consumer research study.
SocialVisTUM and its training procedures are accessible online.
Comment: Demo paper accepted for publication at RANLP 2021; 8 pages, 5 figures, 1 table
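The graph described in this abstract, with topics as nodes and their correlations as edges, can be sketched as follows. This is not SocialVisTUM's code. It assumes that topic correlation is measured as the Pearson correlation between topic proportions across documents and that weak correlations are dropped by a threshold. The function names, the threshold value, and the toy document-topic matrix are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def topic_graph(doc_topic, threshold=0.3):
    """Build (nodes, edges) from a document-topic proportion matrix.

    doc_topic[d][k] is the proportion of topic k in document d.
    An edge (i, j, r) is kept when |r| >= threshold (assumed rule).
    """
    n_topics = len(doc_topic[0])
    # One column of proportions per topic.
    columns = [[row[k] for row in doc_topic] for k in range(n_topics)]
    edges = []
    for i in range(n_topics):
        for j in range(i + 1, n_topics):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                edges.append((i, j, r))
    return list(range(n_topics)), edges


# Toy document-topic matrix (hypothetical): 4 documents, 3 topics.
toy_doc_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
nodes, edges = topic_graph(toy_doc_topic, threshold=0.3)
strong_nodes, strong_edges = topic_graph(toy_doc_topic, threshold=0.95)
```

The resulting node and edge lists could be passed to any graph-drawing library, with the sign and magnitude of `r` mapped to edge color and width.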
Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources
Given a controversial target such as "nuclear energy", argument mining aims
to identify the argumentative text from heterogeneous sources. Current
approaches focus on exploring better ways of integrating the target-associated
semantic information with the argumentative text. Despite their empirical
successes, two issues remain unsolved: (i) a target is represented by a word or
a phrase, which is insufficient to cover a diverse set of target-related
subtopics; (ii) the sentence-level topic information within an argument, which
we believe is crucial for argument mining, is ignored. To tackle the above
issues, we propose a novel explainable topic-enhanced argument mining approach.
Specifically, with the use of the neural topic model and the language model,
the target information is augmented by explainable topic representations.
Moreover, the sentence-level topic information within the argument is captured
by minimizing the distance between its latent topic distribution and its
semantic representation through mutual learning. Experiments have been
conducted on the benchmark dataset in both the in-target setting and the
cross-target setting. Results demonstrate the superiority of the proposed model
over the state-of-the-art baselines.
Comment: 10 pages, 3 figures
Topic Modelling Meets Deep Neural Networks: A Survey
Topic modelling has been a successful technique for text analysis for almost
twenty years. When topic modelling met deep neural networks, there emerged a
new and increasingly popular research area, neural topic models, with over a
hundred models developed and a wide range of applications in neural language
understanding such as text generation, summarisation and language models. There
is a need to summarise research developments and discuss open problems and
future directions. In this paper, we provide a focused yet comprehensive
overview of neural topic models for interested researchers in the AI community,
so as to help them navigate and innovate in this fast-growing research
area. To the best of our knowledge, ours is the first review focusing on this
specific topic.
Comment: A review on Neural Topic Model
Classification aware neural topic model and its application on a new COVID-19 disinformation corpus
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the largest currently available manually annotated COVID-19 disinformation category dataset; and 2) a classification-aware neural topic model (CANTM) that combines classification and topic modelling under a variational autoencoder framework. We demonstrate that CANTM efficiently improves classification performance with low resources, and is scalable. In addition, the classification-aware topics help researchers and end-users to better understand the classification results.