Evaluation of Cluster Analysis Methods

Abstract

Cluster analysis comprises a range of methods and practices used primarily to classify objects, and it plays an important role in many fields. Because the assignment of objects to clusters can vary with the chosen method and its settings, the results obtained should be assessed. This paper proposes new coefficients for evaluating clustering results when objects are characterized by qualitative variables or by variables of different types. These coefficients can be used either to compare methods (in terms of which gives better results) or to find the optimal number of clusters. All of them are based on measures of variability, which are also used to measure the dissimilarity of objects and clusters. The newly proposed coefficients are applied to real data sets (of different sizes, with different numbers of variables, and including variables of different types), and their behavior under different conditions is examined. The data sets include both those with a known classification of objects into clusters and those without one. Based on the analysis carried out, the modified CHF coefficient can be considered the best coefficient for evaluating clustering results with variables of different types. The local maximum by which the clustering results are evaluated almost always exists. The analysis has shown that in most cases this value agrees with the known classification of objects into clusters. The existence of local extremes of the other coefficients depends on the specific data set and is not guaranteed.
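To make the idea of a variability-based coefficient concrete, the following is a minimal sketch rather than the paper's definition, which the abstract does not give. It assumes that CHF refers to a Calinski-Harabasz style pseudo-F index and that variability of a nominal variable is measured by Gini mutability (one minus the sum of squared category shares). The function names gini_mutability, within_variability and pseudo_f are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini_mutability(values):
    """Variability of one nominal variable: 1 - sum of squared category shares.

    Assumption: the abstract does not say which variability measure is used.
    """
    n = len(values)
    return 1.0 - sum((count / n) ** 2 for count in Counter(values).values())


def within_variability(rows, labels):
    """Size-weighted within-cluster variability, summed over all variables."""
    n = len(rows)
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = 0.0
    for members in clusters.values():
        # zip(*members) yields one tuple of values per variable (column).
        per_variable = sum(gini_mutability(column) for column in zip(*members))
        total += len(members) / n * per_variable
    return total


def pseudo_f(rows, labels):
    """Pseudo-F analogue: between-cluster over within-cluster variability.

    Assumed form: (n - k) * (V_total - V_within) / ((k - 1) * V_within).
    """
    n = len(rows)
    k = len(set(labels))
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than objects")
    v_total = within_variability(rows, [0] * n)  # all objects in one cluster
    v_within = within_variability(rows, labels)
    if v_within == 0.0:
        return float("inf")  # perfectly homogeneous clusters
    return (n - k) * (v_total - v_within) / ((k - 1) * v_within)
```

Under these assumptions, one would compute pseudo_f for each candidate number of clusters and look for a local maximum, and a partition whose clusters are more homogeneous in the categorical variables receives a higher value than one that mixes categories.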
