Search CORE

Topic Modelling Meets Deep Neural Networks: A Survey

Author: Buntine Wray
Du Lan
Huynh Viet
Jin Yuan
Phung Dinh
Zhao He
Publication date: 01/01/2021

Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, a new and increasingly popular research area emerged: neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, summarisation and language models. There is a need to summarise research developments and discuss open problems and future directions. In this paper, we provide a focused yet comprehensive overview of neural topic models for interested researchers in the AI community, to help them navigate and innovate in this fast-growing research area. To the best of our knowledge, ours is the first review focusing on this specific topic. Comment: A review of neural topic models

arXiv.org e-Print Archive

Monash University Research Portal

Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions

Author: Jayne David
Keel Benjamin
Quyn Aaron
Relton Samuel D.
Publication date: 27/11/2023

Lung cancer is responsible for 21% of cancer deaths in the UK, and five-year survival rates are heavily influenced by the stage at which the cancer is identified. Recent studies have demonstrated the capability of AI methods for accurate and early diagnosis of lung cancer from routine scans. However, this evidence has not translated into clinical practice, with one barrier being a lack of interpretable models. This study investigates the application of Variational Autoencoders (VAEs), a type of generative AI model, to lung cancer lesions. The proposed models were trained on lesions extracted from 3D CT scans in the LIDC-IDRI public dataset. Latent vector representations of 2D slices produced by the VAEs were explored through clustering to assess their quality and were used in an MLP classifier model for lung cancer diagnosis; the best model achieved state-of-the-art metrics of 0.98 AUC and 93.1% accuracy. Cluster analysis shows that the VAE latent space separates malignant and benign lesions based on meaningful feature components, including tumour size, shape, patient and malignancy class. We also include a comparative analysis of the standard Gaussian VAE (GVAE) and the more recent Dirichlet VAE (DirVAE), which replaces the prior with a Dirichlet distribution to encourage a more explainable latent space with disentangled feature representation. Finally, we demonstrate the potential for latent space traversals corresponding to clinically meaningful feature changes. Comment: 10 pages (main paper), 5 pages (references), 5 figures, 2 tables, work accepted for BMVC 202

arXiv.org e-Print Archive
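
To make the contrast between the two priors in the abstract above concrete, here is a minimal standard-library sketch of the latent-sampling step only. It is not the authors' implementation, and the function names are hypothetical. The Gaussian part shows the usual reparameterised sample and the closed-form KL term against a standard normal prior. The Dirichlet part simply draws a point on the probability simplex by normalising Gamma samples. That draw is not differentiable as written, so a trainable DirVAE would need a reparameterisable approximation instead. No encoder, decoder or CT data is involved.

```python
import math
import random


def gaussian_latent(mu, log_var, rng):
    """Reparameterised Gaussian sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), the GVAE prior term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def dirichlet_latent(alpha, rng):
    """Dirichlet(alpha) sample via normalised Gamma draws (illustration only).

    Every component is non-negative and the components sum to 1, so each latent
    dimension reads as a proportion. This sampler is NOT reparameterised.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
z_gauss = gaussian_latent([0.0, 1.0, -1.0], [0.0, -1.0, 0.5], rng)
z_dir = dirichlet_latent([0.5, 0.5, 0.5], rng)
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: posterior equals the prior
```

The KL term penalises any departure of the encoder's Gaussian from the standard normal prior. A Dirichlet prior instead keeps latents on the simplex, which is the property the abstract associates with a more explainable latent space.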

S2vNTM: Semi-supervised vMF Neural Topic Modeling

Author: Desai Jay
Iannacci Francis
Jiang Xiaoyu
Sengamedu Srinivasan
Xu Weijie
Publication date: 06/07/2023

Language-model-based methods are powerful techniques for text classification. However, these models have several shortcomings: (1) it is difficult to integrate human knowledge such as keywords; (2) they need substantial resources to train; and (3) they rely on large text corpora for pretraining. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM leverages the pattern of keywords to identify potential topics and optimizes the quality of each topic's keyword set. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy when only limited keywords are provided. S2vNTM is at least twice as fast as the baselines. Comment: 17 pages, 9 figures, ICLR Workshop 2023. arXiv admin note: text overlap with arXiv:2307.0122

arXiv.org e-Print Archive

Neural Sinkhorn Topic Model

Author: Buntine Wray
Huynh Viet
Le Trung
Phung Dinh
Zhao He
Publication date: 12/08/2020

In this paper, we present a new topic modelling approach based on the theory of optimal transport (OT). Specifically, we represent a document with two distributions: a distribution over words (doc-word distribution) and a distribution over topics (doc-topic distribution). For a given document, the doc-word distribution is the observed, sparse, low-level representation of the content, while the doc-topic distribution is the latent, dense, high-level representation of the same content. Learning a topic model can then be viewed as minimising the cost of transporting semantic information from one distribution to the other. This viewpoint leads to a novel OT-based topic modelling framework that offers simplicity, effectiveness and efficiency. Extensive experiments show that our framework significantly outperforms several state-of-the-art models in terms of both topic quality and document representations.

arXiv.org e-Print Archive

Monash University Research Portal
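
The abstract above frames learning as minimising the cost of transporting mass from a document's doc-word distribution to its doc-topic distribution. The model's name points to Sinkhorn's algorithm, the standard iterative method for entropy-regularised OT. The following is a minimal pure-Python sketch of those iterations for one toy document. The cost matrix, the regularisation weight `eps`, the iteration count and the toy distributions are illustrative assumptions rather than values from the paper. The neural encoder that would produce the doc-topic distribution is not shown.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularised OT between histograms a (length n) and b (length m).

    cost is an n x m matrix. Returns the transport plan P and the cost <P, cost>.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-cost_ij / eps); a smaller eps gives a sharper plan.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


# Toy document: a sparse distribution over 4 words and a dense one over 2 topics.
doc_word = [0.4, 0.1, 0.0, 0.5]
doc_topic = [0.5, 0.5]
# Hypothetical word-to-topic costs: words 0-1 sit near topic 0, words 2-3 near topic 1.
word_topic_cost = [[0.0, 1.0], [0.1, 0.9], [0.8, 0.2], [1.0, 0.0]]
plan, ot_cost = sinkhorn(doc_word, doc_topic, word_topic_cost)
print(round(ot_cost, 3))
```

Each iteration needs only kernel-vector products and element-wise divisions, which is why the entropic formulation is cheap compared with solving the exact OT linear program. Words with zero mass in the document, like word 2 here, receive a zero scaling factor and so carry no transport mass.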

Hierarchical neural topic modeling with manifold regularization

Author: Chen Ziye
Cheng Gary
Ding Cheng
Rao Yanghui
Tao Xiaohui
Wang Fu Lee
Xie Haoran
Publication venue: Springer Science and Business Media LLC
Publication date: 15/10/2021

Topic models have been widely used for learning latent, explainable representations of documents, but most existing approaches discover topics in a flat structure. In this study, we propose an effective hierarchical neural topic model with strong interpretability. Unlike previous neural topic models, we explicitly model the dependency between layers of a network and then combine latent variables of different layers to reconstruct documents. Using this network structure, our model can extract a tree-shaped topic hierarchy with low redundancy and good explainability by exploiting dependency matrices. Furthermore, we introduce manifold regularization into the proposed method to improve the robustness of topic modeling. Experiments on real-world datasets show that our model outperforms other topic models on several widely used metrics at a much lower computational cost.

University of Southern Queensland ePrints
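
Manifold regularization is commonly implemented as a graph-Laplacian penalty that pulls the latent representations of neighbouring documents together. The abstract above does not give the exact form used in this model. The following standard-library sketch therefore shows only that generic penalty, in both its pairwise and trace forms. It assumes a symmetric, non-negative document similarity matrix `W`, which is hypothetical here, and the function names are also hypothetical.

```python
def manifold_penalty(z, w):
    """Pairwise form: 0.5 * sum_ij W_ij * ||z_i - z_j||^2 over document latents z."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sq_dist = sum((zi - zj) ** 2 for zi, zj in zip(z[i], z[j]))
            total += w[i][j] * sq_dist
    return 0.5 * total


def laplacian_penalty(z, w):
    """Equivalent trace form tr(Z^T L Z) with graph Laplacian L = D - W (W symmetric)."""
    n, k = len(z), len(z[0])
    degree = [sum(row) for row in w]
    lap = [[(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    return sum(
        z[i][c] * lap[i][j] * z[j][c] for c in range(k) for i in range(n) for j in range(n)
    )


# Three documents with 2-d latents; only documents 0 and 1 are marked as neighbours.
latents = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(manifold_penalty(latents, similarity))
```

Adding this term to the reconstruction loss penalises latent differences only between documents that `W` marks as similar. Document 2 is far from the others here but contributes nothing. How the neighbour graph is built is a further design choice that the abstract does not specify.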