Entropy-based criterion in categorical clustering

By Tao Li, Sheng Ma and Mitsunori Ogihara


The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. This is often the case in applications where data is described by a set of descriptive or binary attributes, many of which are not numerical. Examples include the country of origin and the color of eyes in demographic data. Entropy-type measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropy-based criterion in clustering categorical data. It first shows that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and establishes the connection between the criterion and the approach based on dissimilarity coefficients. An iterative Monte-Carlo procedure is then presented to search for partitions that minimize the criterion. Experiments are conducted to show the effectiveness of the proposed procedure.
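As a rough illustration of the idea summarized above, the following sketch scores a partition of categorical records by a size-weighted sum of per-attribute Shannon entropies and then tries random single-record reassignments, keeping those that do not increase the score. The exact form of the criterion, the acceptance rule, and all function names (`cluster_entropy`, `entropy_criterion`, `monte_carlo_search`) are assumptions made for this example; the abstract does not specify them, and this should not be read as the authors' procedure.

```python
import math
import random
from collections import Counter


def cluster_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) over the rows of one cluster."""
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate over attributes
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def entropy_criterion(data, labels, k):
    """Size-weighted average of cluster entropies (assumed form of the criterion).

    Lower values mean more homogeneous clusters.
    """
    n = len(data)
    clusters = [[] for _ in range(k)]
    for row, label in zip(data, labels):
        clusters[label].append(row)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


def monte_carlo_search(data, k, iterations=2000, seed=0):
    """Random local search: move one record at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    best = entropy_criterion(data, labels, k)
    for _ in range(iterations):
        i = rng.randrange(len(data))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        score = entropy_criterion(data, labels, k)
        if score <= best:
            best = score      # accept the move
        else:
            labels[i] = old   # revert the move
    return labels, best


if __name__ == "__main__":
    # Toy demographic records: (eye color, country of origin).
    records = [("blue", "NO"), ("blue", "NO"), ("brown", "JP"), ("brown", "JP")]
    labels, score = monte_carlo_search(records, k=2)
    print(labels, score)
```

Recomputing the criterion from scratch after every move keeps the sketch short; a practical implementation would more likely update the per-cluster value counts incrementally.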

Year: 2004
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX