Location of Repository

Research Track Paper Constraint-Driven Clustering

By Rong Ge, Martin Ester, Wen Jin and Ian Davidson

Abstract

Clustering methods can be either data-driven or need-driven. Data-driven methods intend to discover the true structure of the underlying data while need-driven methods aims at organizing the true structure to meet certain application requirements. Thus, need-driven (e.g. constrained) clustering is able to find more useful and actionable clusters in applications such as energy aware sensor networks, privacy preservation, and market segmentation. However, the existing methods of constrained clustering require users to provide the number of clusters, which is often unknown in advance, but has a crucial impact on the clustering result. In this paper, we argue that a more natural way to generate actionable clusters is to let the application-specific constraints decide the number of clusters. For this purpose, we introduce a novel cluster model, Constraint-Driven Clustering (CDC), which finds an a priori unspecified number of compact clusters that satisfy all user-provided constraints. Two general types of constraints are considered, i.e. minimum significance constraints and minimum variance constraints, as well as combinations of these two types. We prove the NP-hardness of the CDC problem with different constraints. We propose a novel dynamic data structure, the CD-Tree, which organizes data points in leaf nodes such that each leaf node approximately satisfies the CDC constraints and minimizes the objective function. Based on CD-Trees, we develop an efficient algorithm to solve the new clustering problem. Our experimental evaluation on synthetic and real datasets demonstrates the quality of the generated clusters and the scalability of the algorithm

Topics: Categories and Subject Descriptors H.2.8 [Database Applications, Data mining General Terms Algorithms
Year: 2014
OAI identifier: oai:CiteSeerX.psu:10.1.1.409.969
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.cs.ucdavis.edu/~dav... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.