Search CORE

983 research outputs found

Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

Author: Caldarelli G.
Capocci A.
Rao F.
Publication venue: 'IOP Publishing'
Publication date: 16/10/2007

In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes obtained both by the top-down approach of the category division present in the archive and by the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Despite this statistically similar behaviour, the two methods provide rather different divisions of the articles, signaling that the presence of power laws is a general feature of these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method. Comment: 5 pages, 3 figures, epl2 style

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

IMT Institutional Repository

Spectral Clustering Wikipedia Keyword-Based Search Results

Author: Julian Szymański
Tomasz Dziubich
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017

Frontiers - Publisher Connector

Approaches for enriching and improving textual knowledge bases

Author: Fetahu Besnik
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2017

[no abstract]

arXiv.org e-Print Archive

Institutionelles Repositorium der Leibniz Universität Hannover

Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications

Author: A Ng
A Oliva
B Ghanem
C Carson
C Snoek
D Blei
D Zhou
E Bruno
G Chen
H Hotelling
J Li
J Shi
L Hubert
N Rasiwasia
P Lancaster
R Bartels
S Yu
U Luxburg von
V Ordonez
Yuxin Peng
Z Lu
Zhiwu Lu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/09/2011

This paper presents a novel pairwise constraint propagation approach that decomposes the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems, which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs. Considering that this time cost is proportional to the number of all possible pairwise constraints, our approach actually provides an efficient solution for exhaustively propagating pairwise constraints throughout the entire dataset. The resulting exhaustive set of propagated pairwise constraints is further used to adjust the similarity matrix for constrained spectral clustering. Beyond the traditional constraint propagation on single-source data, our approach is also extended to the more challenging constraint propagation on multi-source data, where each pairwise constraint is defined over a pair of data points from different sources. This multi-source constraint propagation has an important application to cross-modal multimedia retrieval. Extensive results have shown the superior performance of our approach. Comment: The short version of this paper appears as an oral paper in ECCV 201
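The decomposition described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a standard iterative label propagation with a symmetrically normalised graph, applied first to the columns and then to the rows of a constraint matrix, and an illustrative rule for adjusting similarities. The function names, the `alpha` value, the clamping step, and the toy graph are all assumptions made for illustration.

```python
def label_propagation(S, Y, alpha=0.8, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, one independent problem per column of Y."""
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(m)] for i in range(n)]
    return F


def transpose(M):
    return [list(col) for col in zip(*M)]


def propagate_constraints(W, Z, alpha=0.8):
    """Spread sparse pairwise constraints Z (+1 must-link, -1 cannot-link, 0 unknown)."""
    n = len(W)
    deg = [sum(row) for row in W]  # assumes every node has at least one neighbour
    S = [[W[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    F_cols = label_propagation(S, Z, alpha)                 # propagate along columns
    F_rows = label_propagation(S, transpose(F_cols), alpha) # then along rows
    return transpose(F_rows)


def adjust_similarity(W, F):
    """Raise similarities where F > 0 and lower them where F < 0 (weights assumed in [0, 1])."""
    n = len(W)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            f = max(-1.0, min(1.0, F[i][j]))  # clamp as a safety measure (assumption)
            w = W[i][j]
            out[i][j] = 1 - (1 - f) * (1 - w) if f >= 0 else (1 + f) * w
    return out


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
Z = [[0.0] * 6 for _ in range(6)]
Z[0][5] = Z[5][0] = -1.0  # a single cannot-link constraint between nodes 0 and 5

F = propagate_constraints(W, Z)
W_adj = adjust_similarity(W, F)
```

In this toy case the single cannot-link constraint spreads to every pair, so the weight of the bridging edge 2-3 in `W_adj` drops below its original value. The adjusted matrix could then be passed to any spectral clustering routine.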

arXiv.org e-Print Archive

Crossref

Generalized Optimization Framework for Graph-based Semi-supervised Learning

Author: Avrachenkov Konstantin
Gonçalves Paulo
Mishenin Alexey
Sokol Marina
Publication date: 19/10/2011

We develop a generalized optimization framework for graph-based semi-supervised learning. The framework gives as particular cases the Standard Laplacian, Normalized Laplacian and PageRank based methods. We also provide a new probabilistic interpretation based on random walks and characterize the limiting behaviour of the methods. The random walk based interpretation allows us to explain differences between the performances of methods with different smoothing kernels. It appears that the PageRank based method is robust with respect to the choice of the regularization parameter and the labelled data. We illustrate our theoretical results with two realistic datasets, characterizing different challenges: the Les Miserables characters social network and the Wikipedia hyper-link graph. The graph-based semi-supervised learning classifies the Wikipedia articles with very good precision and perfect recall employing only the information about the hyper-text links.
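As a rough illustration of how one parameter can interpolate between the three methods named in this abstract, the sketch below iterates F <- alpha * D^(-sigma) W D^(sigma-1) F + (1 - alpha) * Y. The parameterisation is an assumption based on a common way of writing such frameworks (sigma = 1 for the Standard Laplacian, sigma = 0.5 for the Normalized Laplacian, sigma = 0 for the PageRank based method); the function names, the `alpha` value, and the toy graph are hypothetical.

```python
def propagate_labels(W, Y, sigma=0.0, alpha=0.85, iters=200):
    """Fixed-point iteration for F = (1 - alpha) * (I - alpha * D^-sigma W D^(sigma-1))^-1 Y."""
    n, k = len(W), len(Y[0])
    deg = [sum(row) for row in W]  # assumes no isolated nodes (degree > 0)
    # P[i][j] = d_i^(-sigma) * w_ij * d_j^(sigma - 1)
    P = [[deg[i] ** (-sigma) * W[i][j] * deg[j] ** (sigma - 1.0) for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(P[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(k)] for i in range(n)]
    return F


def classify(F):
    """Assign each node to the class with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
# One labelled node per class: node 0 -> class 0, node 5 -> class 1.
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

labels = {sigma: classify(propagate_labels(W, Y, sigma=sigma)) for sigma in (0.0, 0.5, 1.0)}
```

On this symmetric toy graph all three settings of `sigma` give the same labelling, with each triangle assigned to the class of its labelled node. On real graphs with uneven degrees the three variants can disagree, which is the kind of difference the abstract attributes to the choice of smoothing kernel.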

arXiv.org e-Print Archive

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot