390,304 research outputs found
Variable selection for model-based clustering using the integrated complete-data likelihood
Variable selection in cluster analysis is important yet challenging. It can
be achieved by regularization methods, which realize a trade-off between the
clustering accuracy and the number of selected variables by using a lasso-type
penalty. However, the calibration of the penalty term can suffer from
criticisms. Model selection methods are an efficient alternative, yet they
require a difficult optimization of an information criterion which involves
combinatorial problems. First, most of these optimization algorithms are based
on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are
often greedy because they need multiple calls of EM algorithms. Here we propose
to use a new information criterion based on the integrated complete-data
likelihood. It does not require any estimate and its maximization is simple and
computationally efficient. The original contribution of our approach is to
perform the model selection without requiring any parameter estimation. Then,
parameter inference is needed only for the unique selected model. This approach
is used for the variable selection of a Gaussian mixture model with conditional
independence assumption. The numerical experiments on simulated and benchmark
datasets show that the proposed method often outperforms two classical
approaches for variable selection.Comment: submitted to Statistics and Computin
- …