1,018 research outputs found
Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications
This paper presents a novel pairwise constraint propagation approach by
decomposing the challenging constraint propagation problem into a set of
independent semi-supervised learning subproblems which can be solved in
quadratic time using label propagation based on k-nearest neighbor graphs.
Considering that this time cost is proportional to the number of all possible
pairwise constraints, our approach actually provides an efficient solution for
exhaustively propagating pairwise constraints throughout the entire dataset.
The resulting exhaustive set of propagated pairwise constraints are further
used to adjust the similarity matrix for constrained spectral clustering. Other
than the traditional constraint propagation on single-source data, our approach
is also extended to more challenging constraint propagation on multi-source
data where each pairwise constraint is defined over a pair of data points from
different sources. This multi-source constraint propagation has an important
application to cross-modal multimedia retrieval. Extensive results have shown
the superior performance of our approach.Comment: The short version of this paper appears as oral paper in ECCV 201
Clustering documents with active learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated to a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields comparable performance to previous attempts; and adding constraints improves clustering performance further by up to 20%
- ā¦