
Active learning of constraints for semi-supervised clustering



Graduation date: 2013

Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner, where in each iteration queries are selected based on the current clustering solution and the existing constraint set. We apply a general framework that builds on the concept of neighborhoods, where neighborhoods contain "labeled examples" of different clusters according to the pairwise constraints. Our active learning method expands the neighborhoods by selecting informative points and querying their relationship with the neighborhoods. Under this framework, we build on the classic uncertainty-based principle and present a novel approach for computing the uncertainty associated with each data point. We further introduce a selection criterion that trades off the amount of uncertainty of each data point against the expected number of queries (the cost) required to resolve this uncertainty. This allows us to select queries that have the highest information rate. We evaluate the proposed method on benchmark datasets, and the results demonstrate consistent and substantial improvements over the current state of the art.
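The selection criterion described above can be illustrated with a small sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: uncertainty is taken to be the Shannon entropy of a point's neighborhood-membership probabilities, the cost is the expected number of pairwise queries when neighborhoods are queried in decreasing order of probability until a must-link is returned, and the function names (`entropy`, `expected_queries`, `information_rate`, `select_query_point`) are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (base 2) of a membership distribution (assumed uncertainty measure)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_queries(probs):
    """Expected number of pairwise queries needed to place a point.

    Assumption: neighborhoods are queried in decreasing order of membership
    probability, and querying stops at the first must-link answer, so the
    neighborhood at rank r costs r queries with probability p_r.
    """
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


def information_rate(probs):
    """Uncertainty resolved per expected query (assumes probs sum to 1)."""
    return entropy(probs) / expected_queries(probs)


def select_query_point(membership):
    """Return the point id with the highest information rate.

    membership: dict mapping a point id to its list of neighborhood
    membership probabilities (hypothetical input format).
    """
    return max(membership, key=lambda point: information_rate(membership[point]))


# Point "b" is more uncertain (uniform over 8 neighborhoods) but costs more
# queries to resolve; point "a" (uniform over 4) has the higher rate.
membership = {"a": [0.25] * 4, "b": [0.125] * 8}
best = select_query_point(membership)
```

In this toy input, a purely uncertainty-based rule would pick "b" (entropy 3 bits versus 2 bits), while the rate-based rule picks "a", because resolving "b" is expected to take 4.5 queries against 2.5 for "a". This is the kind of trade-off the abstract describes, shown under the stated assumptions rather than as the authors' exact formulation.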

Topics: Active Learning, Semi-supervised Clustering
Year: 2013
OAI identifier:
Provided by: ScholarsArchive@OSU

