Categorical data clustering is an important part of
data mining, and its relevance has recently drawn the attention of several researchers.
As a step in data mining, however, clustering faces the
problem of processing large amounts of data. This article proposes a method that summarizes the database so that categorical clustering algorithms can handle high volumes of data. The summary is
built using a structure called CM-tree. To test our method, the KModes and Click clustering algorithms were applied to several databases.
Experiments demonstrate that the proposed summarization method reduces
execution time without losing clustering quality.
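For readers unfamiliar with KModes, one of the two algorithms used in the evaluation, the following is a minimal Python sketch of the standard k-modes procedure. It uses the simple-matching dissimilarity (number of attributes on which two records differ) and recomputes each cluster's mode as the most frequent value per attribute. The function names, the random initialization and the iteration cap are illustrative assumptions. The sketch clusters raw records and does not implement the CM-tree summarization.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    # Simple-matching dissimilarity: count attributes whose values differ.
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=100, seed=0):
    # Illustrative k-modes sketch (hypothetical helper, not the article's code).
    rng = random.Random(seed)
    # Assumption: initial modes are k records sampled at random.
    modes = [tuple(r) for r in rng.sample(records, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode (ties go to the lowest index).
        new_assignment = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # No record changed cluster: converged.
        assignment = new_assignment
        # Recompute each mode as the most frequent value of every attribute.
        for j in range(k):
            members = [r for r, c in zip(records, assignment) if c == j]
            if members:
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return modes, assignment


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("b", "w")]
    modes, labels = k_modes(data, k=2)
    print(modes, labels)
```

Each iteration scans every record, so its cost grows with the number of records; this is the kind of cost a database summary aims to reduce.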