Visualization and unsupervised clustering of emphysema progression using t-SNE analysis of longitudinal CT images and SNPs

Abstract

Chronic obstructive pulmonary disease (COPD) is predicted to become the third leading cause of death worldwide by 2030. Longitudinal CT studies of COPD are useful for assessing changes in structural abnormalities. In this study, we performed visualization and unsupervised clustering of emphysema progression using t-distributed stochastic neighbor embedding (t-SNE) analysis of longitudinal CT images, smoking history, and SNPs. The analysis proceeds in three steps: (1) automatic segmentation of lung lobes using a 3D U-Net, (2) quantitative image analysis of emphysema progression in the lung lobes, and (3) visualization and unsupervised clustering of emphysema progression using t-SNE. Nine explanatory variables were used for the clustering: genotypes at two SNPs (rs13180 and rs3923564), smoking history (smoking years, number of cigarettes per day, and pack-years), and LAV distribution (LAV size and density in the upper lobes, and LAV size and density in the lower lobes). The objective variable was emphysema progression, defined as the annual change in low attenuation volume (LAV%/year) estimated by linear regression. The nine-dimensional space was transformed to a two-dimensional space by t-SNE and divided into three clusters by a Gaussian mixture model. This method was applied to 37 smokers with 68.2 pack-years and 97 past smokers with 51.1 pack-years. The results demonstrated that this method could be effective for quantitative assessment of emphysema progression based on SNPs, smoking history, and imaging features.
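The regression and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature standardization, the t-SNE perplexity, the random seeds, and the synthetic data are all assumptions made for illustration, and it relies on NumPy and scikit-learn. It shows (a) estimating LAV%/year as the slope of a least-squares line through one subject's follow-up scans and (b) embedding a nine-column feature matrix into two dimensions with t-SNE before fitting a three-component Gaussian mixture model.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def annual_lav_change(years, lav_percent):
    """Slope (LAV%/year) of a least-squares line through one subject's scans."""
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float),
                                   np.asarray(lav_percent, dtype=float), 1)
    return float(slope)


def embed_and_cluster(features, n_clusters=3, seed=0):
    """Map an (n_subjects, 9) feature matrix to 2-D with t-SNE, then cluster.

    Standardizing the features and the perplexity value are assumptions of
    this sketch; the abstract does not state how the inputs were scaled.
    """
    scaled = StandardScaler().fit_transform(features)
    embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                     random_state=seed).fit_transform(scaled)
    labels = GaussianMixture(n_components=n_clusters,
                             random_state=seed).fit_predict(embedding)
    return embedding, labels


if __name__ == "__main__":
    # Hypothetical follow-up series for one subject: scan time (years), LAV%.
    print(annual_lav_change([0.0, 1.1, 2.0, 3.2], [12.0, 13.1, 14.4, 15.2]))

    # Synthetic stand-in for the 134 subjects x 9 explanatory variables.
    rng = np.random.default_rng(0)
    fake_features = rng.normal(size=(134, 9))
    embedding, labels = embed_and_cluster(fake_features)
    print(embedding.shape, np.bincount(labels, minlength=3))
```

In a sketch like this, the LAV%/year slope stays outside the feature matrix, matching its role as the objective variable; it would be compared across the resulting clusters afterwards, for example by coloring the two-dimensional scatter plot by slope.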
