
A comparison of Gap statistic definitions with and without logarithm function

Abstract

The Gap statistic is a standard method for determining the number of clusters in a set of data. The Gap statistic standardizes the graph of $\log(W_{k})$, where $W_{k}$ is the within-cluster dispersion, by comparing it to its expectation under an appropriate null reference distribution of the data. We suggest using $W_{k}$ instead of $\log(W_{k})$ and comparing it to the expectation of $W_{k}$ under a null reference distribution. In fact, whenever a number fulfills the original Gap statistic inequality, this number also fulfills the inequality of a Gap statistic using $W_{k}$, but not vice versa. The two definitions of the Gap function are evaluated on several simulated data sets and on a real data set of DCE-MR images.
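The two definitions can be illustrated with a minimal Python sketch. It assumes the dispersions have already been computed: `w[k]` holds $W_{k}$ for the observed data and `w_ref[b][k]` holds the dispersion of the b-th data set drawn from the null reference distribution, so the expectation is approximated by an average over the reference draws. The function names `gap_log` and `gap_plain` and the input values are hypothetical and serve only as illustration; the clustering step and the inequality used to select the number of clusters are not shown.

```python
import math
from statistics import mean


def gap_log(w, w_ref):
    """Gap with logarithm: average of log(W*_kb) over reference draws minus log(W_k)."""
    return [
        mean(math.log(draw[k]) for draw in w_ref) - math.log(w[k])
        for k in range(len(w))
    ]


def gap_plain(w, w_ref):
    """Gap without logarithm: average of W*_kb over reference draws minus W_k."""
    return [
        mean(draw[k] for draw in w_ref) - w[k]
        for k in range(len(w))
    ]


# Illustrative (made-up) dispersions for three candidate cluster counts
# and two reference draws.
w = [100.0, 40.0, 30.0]
w_ref = [
    [110.0, 80.0, 60.0],
    [90.0, 70.0, 50.0],
]

print(gap_log(w, w_ref))
print(gap_plain(w, w_ref))
```

Both functions return one Gap value per candidate number of clusters, so the two curves can be compared entry by entry for the same input.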
