2 research outputs found

Path-based methods on categorical structures for conceptual representation of wikipedia articles

Publication venue: Springer

Using a Wikipedia-based Semantic Relatedness Measure for Document Clustering

Authors: Andrei Popescu-Belis, Majid Yazdani
Publication date: 01/01/2011

A graph-based distance between Wikipedia articles is defined using a random walk model, which estimates the visiting probability (VP) between articles using two types of links: hyperlinks and lexical similarity relations. The VP to and from a set of articles is then computed, and approximations are proposed to make the computation of semantic relatedness between any two texts in a large data set tractable. The model is applied to document clustering on the 20 Newsgroups data set. Precision and recall improve in comparison with previous textual distance algorithms.
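
To make the idea of a visiting probability concrete, here is a minimal sketch and not the authors' implementation. It assumes VP means the probability that a random walk starting at one article reaches another at least once within a fixed number of steps. The function name `visiting_probability`, the `max_steps` truncation, and the three-node toy graph are hypothetical choices for illustration. The abstract does not specify how hyperlinks and lexical similarity links are weighted, so the transition probabilities below are simply given.

```python
def visiting_probability(transitions, source, target, max_steps):
    """Probability that a walk from `source` reaches `target` within `max_steps` steps.

    `transitions` maps each node to a dict {neighbor: probability}; each
    node's outgoing probabilities are assumed to sum to 1.
    """
    if source == target:
        return 1.0
    # mass[node] = probability of being at `node` without having reached `target` yet
    mass = {source: 1.0}
    visited = 0.0
    for _ in range(max_steps):
        next_mass = {}
        for node, p in mass.items():
            for neighbor, weight in transitions.get(node, {}).items():
                if neighbor == target:
                    # The walk is absorbed the first time it reaches the target.
                    visited += p * weight
                else:
                    next_mass[neighbor] = next_mass.get(neighbor, 0.0) + p * weight
        mass = next_mass
    return visited


# Hypothetical toy graph: article A links to B and C, B links back to A,
# and C links only to itself.
graph = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"A": 1.0},
    "C": {"C": 1.0},
}

vp_a_to_b = visiting_probability(graph, "A", "B", max_steps=3)
vp_b_to_c = visiting_probability(graph, "B", "C", max_steps=4)
```

In this sketch the value is not symmetric: a walk from A can be trapped in C before reaching B, whereas a walk from B eventually drifts to C. That asymmetry is one reason to consider the probability both to and from an article.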

CiteSeerX