983 research outputs found
Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia
In this paper we investigate the nature and structure of the relation between
imposed classifications and real clustering in a particular case of a
scale-free network given by the on-line encyclopedia Wikipedia. We find a
statistical similarity in the distributions of community sizes both by using
the top-down approach of the categories division present in the archive and in
the bottom-up procedure of community detection given by an algorithm based on
the spectral properties of the graph. Regardless of the statistically similar
behaviour the two methods provide a rather different division of the articles,
thereby signaling that the nature and presence of power laws is a general
feature for these systems and cannot be used as a benchmark to evaluate the
suitability of a clustering method.
Comment: 5 pages, 3 figures, epl2 styl
Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications
This paper presents a novel pairwise constraint propagation approach by
decomposing the challenging constraint propagation problem into a set of
independent semi-supervised learning subproblems which can be solved in
quadratic time using label propagation based on k-nearest neighbor graphs.
Considering that this time cost is proportional to the number of all possible
pairwise constraints, our approach actually provides an efficient solution for
exhaustively propagating pairwise constraints throughout the entire dataset.
The resulting exhaustive set of propagated pairwise constraints is further
used to adjust the similarity matrix for constrained spectral clustering. Beyond
traditional constraint propagation on single-source data, our approach
is also extended to more challenging constraint propagation on multi-source
data where each pairwise constraint is defined over a pair of data points from
different sources. This multi-source constraint propagation has an important
application to cross-modal multimedia retrieval. Extensive results have shown
the superior performance of our approach.
Comment: The short version of this paper appears as an oral paper in ECCV 201
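
The propagation idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the helper names (`knn_graph`, `propagate_constraints`, `adjust_similarity`), the Gaussian edge weights, the parameter `alpha`, and the similarity adjustment rule. The sketch treats each column of an initial constraint matrix `Z` (+1 for must-link, -1 for cannot-link) as a label vector, spreads it over a k-nearest-neighbor graph, and then repeats the step along the rows. It uses a dense linear solve for brevity, so it does not reproduce the quadratic time cost the abstract reports.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph with Gaussian weights (illustrative choice)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]          # skip column 0 (the point itself)
    scale = d2[np.arange(n)[:, None], idx].mean()     # bandwidth from neighbor distances
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i]] = np.exp(-d2[i, idx[i]] / scale)
    return np.maximum(W, W.T)                         # symmetrize

def propagate_constraints(W, Z, alpha=0.8):
    """Spread pairwise constraints along columns, then along rows, of Z."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                   # symmetrically normalized weights
    A = np.eye(len(W)) - alpha * S
    F_cols = (1 - alpha) * np.linalg.solve(A, Z)      # one label-propagation problem per column
    return (1 - alpha) * np.linalg.solve(A, F_cols.T).T  # same step applied to each row

def adjust_similarity(W, F):
    """Hypothetical rule: raise similarity for must-link scores, lower it for cannot-link."""
    F = np.clip(F, -1.0, 1.0)
    return np.where(F >= 0, 1 - (1 - F) * (1 - W), (1 + F) * W)

# Two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
W = knn_graph(X, k=2)

Z = np.zeros((8, 8))
Z[0, 3] = Z[3, 0] = 1.0    # must-link: points 0 and 3
Z[0, 4] = Z[4, 0] = -1.0   # cannot-link: points 0 and 4

F = propagate_constraints(W, Z)
W_adj = adjust_similarity(W, F)
```

In this toy example the single must-link and cannot-link constraints spread to pairs that were never constrained directly, such as points 1 and 2 (same group) and points 1 and 6 (different groups).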
Generalized Optimization Framework for Graph-based Semi-supervised Learning
We develop a generalized optimization framework for graph-based
semi-supervised learning. The framework gives as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods. We also
provide a new probabilistic interpretation based on random walks and
characterize the limiting behaviour of the methods. The random walk based
interpretation allows us to explain differences between the performances of
methods with different smoothing kernels. It appears that the PageRank based
method is robust with respect to the choice of the regularization parameter and
the labelled data. We illustrate our theoretical results with two realistic
datasets, characterizing different challenges: the Les Miserables character social
network and the Wikipedia hyperlink graph. The graph-based semi-supervised
learning classifies the Wikipedia articles with very good precision and perfect
recall employing only the information about the hypertext links.
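
A minimal sketch of how one family of classifiers might cover the three methods named in this abstract. The abstract does not state the formula, so the parameterization is an assumption: a single exponent `sigma` that selects the normalization of the weight matrix `W` (1 for Standard Laplacian, 0.5 for Normalized Laplacian, 0 for PageRank based) and a regularization-related parameter `alpha` between 0 and 1. The function name `graph_ssl` and the toy graph are hypothetical.

```python
import numpy as np

def graph_ssl(W, Y, sigma, alpha=0.85):
    """Classification scores F = (1 - alpha) * (I - alpha * D^-sigma W D^(sigma-1))^-1 Y.

    W: symmetric nonnegative weight matrix; Y: one-hot rows for labelled nodes,
    zero rows for unlabelled nodes. Both sigma and alpha are assumed parameters.
    """
    d = W.sum(axis=1)
    M = np.diag(d ** -sigma) @ W @ np.diag(d ** (sigma - 1.0))
    n = W.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * M, Y)

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

# One labelled node per class: node 0 is class 0, node 5 is class 1.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

methods = {"standard_laplacian": 1.0, "normalized_laplacian": 0.5, "pagerank": 0.0}
predictions = {name: graph_ssl(W, Y, s).argmax(axis=1) for name, s in methods.items()}
```

With `sigma = 0` each column of the result is a personalized PageRank vector seeded at the labelled nodes of that class, which is the random walk reading the abstract refers to. On this small graph all three settings assign each triangle to the class of its labelled node.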