Skip to main content
Article thumbnail
Location of Repository

Semi-supervised Clustering Using Bayesian Regularization

By Zuobing Xu and Ram Akella

Abstract

Text clustering is most commonly treated as a fully automated task without user supervision. However, we can improve clustering performance using supervision in the form of pairwise (must-link and cannot-link) constraints. This paper introduces a rigorous Bayesian framework for semi-supervised clustering which incorporates human supervision in the form of pairwise constraints both in the expectation step and maximization step of the EM algorithm. During the expectation step, we model the pairwise constraints as random variables, which enable us to capture the uncertainty in constraints in a principled manner. During the maximization step, we treat the constraint documents as prior information, and adjust the probability mass of model distribution to emphasize words occurring in constraint documents by using Bayesian regularization. Bayesian conjugate prior modeling makes the maximization step more efficient than gradient search methods in the traditional distance learning. Experimental results on several text datasets demonstrate significant advantages over existing algorithms.

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.5963
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.soe.ucsc.edu/~zbxu/... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.