Lung cancer is responsible for 21% of cancer deaths in the UK and five-year
survival rates are heavily influenced by the stage the cancer was identified
at. Recent studies have demonstrated the capability of AI methods for accurate
and early diagnosis of lung cancer from routine scans. However, this evidence
has not translated into clinical practice with one barrier being a lack of
interpretable models. This study investigates the application Variational
Autoencoders (VAEs), a type of generative AI model, to lung cancer lesions.
Proposed models were trained on lesions extracted from 3D CT scans in the
LIDC-IDRI public dataset. Latent vector representations of 2D slices produced
by the VAEs were explored through clustering to justify their quality and used
in an MLP classifier model for lung cancer diagnosis, the best model achieved
state-of-the-art metrics of AUC 0.98 and 93.1% accuracy. Cluster analysis shows
the VAE latent space separates the dataset of malignant and benign lesions
based on meaningful feature components including tumour size, shape, patient
and malignancy class. We also include a comparative analysis of the standard
Gaussian VAE (GVAE) and the more recent Dirichlet VAE (DirVAE), which replaces
the prior with a Dirichlet distribution to encourage a more explainable latent
space with disentangled feature representation. Finally, we demonstrate the
potential for latent space traversals corresponding to clinically meaningful
feature changes.Comment: 10 pages (main paper), 5 pages (references), 5 figures, 2 tables,
work accepted for BMVC 202