Clustering with missing data: which equivalent for Rubin's rules?
Multiple imputation (MI) is a popular method for dealing with missing values.
However, the appropriate way to apply clustering after MI remains unclear: how
should partitions be pooled, and how should clustering instability be assessed
when data are incomplete? By answering both questions, this paper proposes a
complete view of clustering with missing data using MI. The problem of pooling
partitions is addressed here using consensus clustering, while, based on
bootstrap theory, we explain how to assess the instability related to both
observed and missing data.
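One generic way to pool partitions by consensus clustering is the co-association
(evidence accumulation) approach: count how often each pair of observations
falls in the same cluster across the M partitions, one per imputed data set,
then cut that matrix. The sketch below is not necessarily the consensus method
used in the paper. It assumes the M partitions are already computed, and the
helper names and the majority threshold of 0.5 are hypothetical choices. Because
only co-membership is counted, label switching between partitions does not matter.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of observations shares a cluster.

    `partitions` is a list of M label lists (hypothetically one per imputed
    data set), each labelling the same n observations.
    """
    m = len(partitions)
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1.0 / m
                co[j][i] += 1.0 / m
    for i in range(n):
        co[i][i] = 1.0
    return co


def consensus_partition(partitions, threshold=0.5):
    """Link i and j when they co-cluster in more than `threshold` of the
    partitions, then return the connected components as the pooled partition
    (a single-link cut of the co-association matrix)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three partitions of six observations, e.g. from clustering three imputed sets.
parts = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as above, labels switched
    [0, 0, 1, 1, 1, 1],
]
print(consensus_partition(parts))  # prints [0, 0, 0, 1, 1, 1]
```

Observation 2 is grouped with observations 0 and 1 in two of the three
partitions, so the majority rule keeps it in the first pooled cluster.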
The new rules for pooling partitions and assessing instability are
theoretically justified and extensively studied by simulation. Pooling
partitions improves accuracy, while measuring instability with missing data
broadens the possibilities for data analysis: it allows assessing how much the
clustering depends on the imputation model, and it provides a convenient way to
choose the number of clusters when data are incomplete, as illustrated on a real
data set.
Comment: 39 pages
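As a rough illustration of using instability to compare numbers of clusters,
the sketch below draws bootstrap resamples from each imputed data set, clusters
each resample with k-means, labels every original observation by its nearest
centroid, and averages the pairwise disagreement between the resulting
partitions. This is a generic bootstrap instability measure, not the paper's
exact rules. The helper names, the farthest-first k-means initialisation, and
the toy "imputed" data sets (small perturbations of one complete set) are all
assumptions made for illustration. Note that a single cluster is trivially
stable, so candidate values start at 2.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]  # keep centroid of an empty cluster
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def disagreement(a, b):
    """Share of observation pairs grouped together in one partition but not the other."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs) / len(pairs)


def instability(imputed_sets, k, n_boot=10, seed=0):
    """Mean pairwise disagreement between partitions from bootstrap resamples
    of every imputed data set (hypothetical helper)."""
    rng = random.Random(seed)
    partitions = []
    for data in imputed_sets:
        n = len(data)
        for _ in range(n_boot):
            sample = [data[rng.randrange(n)] for _ in range(n)]
            centroids = kmeans(sample, k)
            # Label every original observation so that partitions are comparable.
            partitions.append([nearest(p, centroids) for p in data])
    pairs = list(combinations(partitions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Toy usage: two well-separated groups, three stand-in "imputed" versions.
rng = random.Random(1)
base = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(15)]
base += [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(15)]
imputed = [[(x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05)) for x, y in base]
           for _ in range(3)]
for k in (2, 3, 4):
    print(k, instability(imputed, k))
```

Pooling resamples across imputed data sets makes the measure reflect both
sampling variability and imputation variability; comparing it with the
instability computed within a single imputed set is one informal way to gauge
how much the clustering depends on the imputation model.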