33 research outputs found
Topic Modelling Meets Deep Neural Networks: A Survey
Topic modelling has been a successful technique for text analysis for almost
twenty years. When topic modelling met deep neural networks, there emerged a
new and increasingly popular research area, neural topic models, with over a
hundred models developed and a wide range of applications in neural language
understanding such as text generation, summarisation and language models. There
is a need to summarise research developments and discuss open problems and
future directions. In this paper, we provide a focused yet comprehensive
overview of neural topic models for interested researchers in the AI community,
to help them navigate and innovate in this fast-growing research
area. To the best of our knowledge, ours is the first review focusing on this
specific topic.
Comment: A review on Neural Topic Model
Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions
Lung cancer is responsible for 21% of cancer deaths in the UK and five-year
survival rates are heavily influenced by the stage the cancer was identified
at. Recent studies have demonstrated the capability of AI methods for accurate
and early diagnosis of lung cancer from routine scans. However, this evidence
has not translated into clinical practice with one barrier being a lack of
interpretable models. This study investigates the application of Variational
Autoencoders (VAEs), a type of generative AI model, to lung cancer lesions.
Proposed models were trained on lesions extracted from 3D CT scans in the
LIDC-IDRI public dataset. Latent vector representations of 2D slices produced
by the VAEs were explored through clustering to assess their quality and used
in an MLP classifier for lung cancer diagnosis; the best model achieved
state-of-the-art metrics of AUC 0.98 and 93.1% accuracy. Cluster analysis shows
the VAE latent space separates the dataset of malignant and benign lesions
based on meaningful feature components including tumour size, shape, patient
and malignancy class. We also include a comparative analysis of the standard
Gaussian VAE (GVAE) and the more recent Dirichlet VAE (DirVAE), which replaces
the prior with a Dirichlet distribution to encourage a more explainable latent
space with disentangled feature representation. Finally, we demonstrate the
potential for latent space traversals corresponding to clinically meaningful
feature changes.
Comment: 10 pages (main paper), 5 pages (references), 5 figures, 2 tables,
work accepted for BMVC 202
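The difference between the two priors compared above can be illustrated with a minimal sketch. It only draws samples from each prior; it is not the training procedure of the GVAE or DirVAE, and the function names, the latent dimension of 5 and the concentration value 0.3 are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalising Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_gaussian(dim, rng):
    """Draw one point from a standard Gaussian prior N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)                     # fixed seed, for repeatability only
z_dir = sample_dirichlet([0.3] * 5, rng)   # hypothetical 5-dimensional latent
z_gauss = sample_gaussian(5, rng)

# A Dirichlet sample is non-negative and sums to 1, so each coordinate can be
# read as a proportion; a Gaussian sample is unconstrained in sign and scale.
print("Dirichlet:", [round(z, 3) for z in z_dir])
print("Gaussian: ", [round(z, 3) for z in z_gauss])
```

Concentration values below 1 tend to push most of the mass onto a few coordinates, which is one intuition for why a Dirichlet latent space may be easier to interpret.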
S2vNTM: Semi-supervised vMF Neural Topic Modeling
Language model based methods are powerful techniques for text classification.
However, these models have several shortcomings: (1) it is difficult to integrate
human knowledge such as keywords; (2) they need a lot of resources to train; and
(3) they rely on large text corpora for pretraining. In this paper, we propose
Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these
difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM
leverages the pattern of keywords to identify potential topics, as well as to
optimize the quality of each topic's keyword set. Across a variety of datasets,
S2vNTM outperforms existing semi-supervised topic modeling methods in
classification accuracy with limited keywords provided. S2vNTM is at least
twice as fast as baselines.
Comment: 17 pages, 9 figures, ICLR Workshop 2023. arXiv admin note: text
overlap with arXiv:2307.0122
Neural Sinkhorn Topic Model
In this paper, we present a new topic modelling approach via the theory of
optimal transport (OT). Specifically, we represent a document with two
distributions: a distribution over the words (doc-word distribution) and a
distribution over the topics (doc-topic distribution). For one document, the
doc-word distribution is the observed, sparse, low-level representation of the
content, while the doc-topic distribution is the latent, dense, high-level one
of the same content. Learning a topic model can then be viewed as a process of
minimising the transportation of the semantic information from one distribution
to the other. This new viewpoint leads to a novel OT-based topic modelling
framework, which enjoys appealing simplicity, effectiveness, and efficiency.
Extensive experiments show that our framework significantly outperforms several
state-of-the-art models in terms of both topic quality and document
representations.
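The transport view described above can be sketched with the standard Sinkhorn iteration for entropy-regularised OT. This is a minimal illustration, not the authors' implementation: the cost matrix is hand-written and hypothetical (the abstract does not say how it is built), and `sinkhorn`, `eps` and `n_iters` are names assumed for the example.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularised OT between histograms a (words) and b (topics).

    Returns the transport plan and its total transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


# Hypothetical 4-word vocabulary and 2 topics: words 0-1 are close to topic 0,
# words 2-3 are close to topic 1.
cost = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
doc_word = [0.5, 0.5, 0.0, 0.0]   # sparse, observed doc-word distribution

plan_close, cost_close = sinkhorn(doc_word, [0.9, 0.1], cost)
plan_far, cost_far = sinkhorn(doc_word, [0.1, 0.9], cost)

# The topic mixture that matches the document's words is cheaper to reach.
print(cost_close < cost_far)
```

In a topic model built on this idea, the doc-topic distribution would be produced by a learned encoder and the transport cost would serve as the training loss; here both distributions are fixed so that only the distance computation is shown.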
Hierarchical neural topic modeling with manifold regularization
Topic models have been widely used for learning latent, explainable representations of documents, but most existing approaches discover topics in a flat structure. In this study, we propose an effective hierarchical neural topic model with strong interpretability. Unlike previous neural topic models, we explicitly model the dependency between layers of a network, and then combine latent variables of different layers to reconstruct documents. Utilizing this network structure, our model can extract a tree-shaped topic hierarchy with low redundancy and good explainability by exploiting dependency matrices. Furthermore, we introduce manifold regularization into the proposed method to improve the robustness of topic modeling. Experiments on real-world datasets validate that our model outperforms other topic models in several widely used metrics at much lower computational cost.
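Manifold regularization is commonly written as a graph-smoothness penalty that asks neighbouring documents to have similar latent representations. The sketch below shows that generic form only; the abstract does not give the exact regularizer, so `manifold_penalty`, the neighbour weights and the example topic proportions are all assumptions.

```python
def manifold_penalty(theta, weights):
    """Sum over document pairs of w_ij * ||theta_i - theta_j||^2.

    theta:   list of per-document topic proportions (hypothetical values)
    weights: symmetric neighbour-graph weights, w_ij > 0 for similar documents
    """
    total = 0.0
    n = len(theta)
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((p - q) ** 2 for p, q in zip(theta[i], theta[j]))
            total += weights[i][j] * sq_dist
    return total


# Two documents that the neighbour graph marks as similar.
weights = [[0.0, 1.0], [1.0, 0.0]]
smooth = manifold_penalty([[0.8, 0.2], [0.8, 0.2]], weights)   # same topics
rough = manifold_penalty([[1.0, 0.0], [0.0, 1.0]], weights)    # opposite topics
print(smooth, rough)
```

Adding a penalty of this kind to the training loss discourages similar documents from receiving very different topic proportions, which is one way such a term can improve robustness.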