
WebSim: A Novel Term Similarity Metric based on a Web Search Technology

By Seokkyung Chung, Jongeun Jun and Dennis McLeod


Abstract. Because pairwise similarity computations are essential in ontology learning and data mining, we propose WebSim (Web-based term Similarity metric), whose feature extraction and similarity model are based on a conventional Web search engine. Using a Web search engine offers two main benefits. First, we can obtain the freshest content for each term, which represents up-to-date knowledge about that term. This is particularly useful for dynamic ontology management, since ontologies must evolve over time as new concepts or terms appear. Second, compared with approaches that use a fixed set of crawled Web documents as a corpus, our method is less sensitive to data sparseness because we access as much content as possible through the search engine. At the core of WebSim, we present two methodologies for similarity computation: a mutual-information-based metric and a feature-based metric. Moreover, we show how WebSim can be used to modify existing ontologies. Finally, we demonstrate the characteristics of WebSim by coupling it with WordNet. Experimental results show that WebSim can uncover topical relations between terms that are not captured in conventional concept-based ontologies.
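
The abstract does not give the formula behind the mutual-information-based metric, so the following is only an illustrative sketch of one common way to score term relatedness from search-engine hit counts: pointwise mutual information (PMI). It is not the authors' definition. The function name `pmi_from_hits`, its parameters, and the counts in the example are all hypothetical; a real system would obtain the counts from a search engine.

```python
import math


def pmi_from_hits(hits_a, hits_b, hits_ab, total_pages):
    """Pointwise mutual information estimated from page counts.

    hits_a, hits_b : pages matching each term alone (assumed inputs)
    hits_ab        : pages matching both terms together
    total_pages    : assumed size of the search engine's index
    """
    if hits_a <= 0 or hits_b <= 0 or total_pages <= 0:
        raise ValueError("marginal counts and index size must be positive")
    if hits_ab == 0:
        # The terms never co-occur: PMI is negative infinity.
        return float("-inf")

    # Turn counts into probability estimates.
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    p_ab = hits_ab / total_pages

    # PMI = log2( P(a, b) / (P(a) * P(b)) )
    return math.log2(p_ab / (p_a * p_b))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    score = pmi_from_hits(hits_a=12_000, hits_b=8_000,
                          hits_ab=3_000, total_pages=1_000_000)
    print(f"PMI: {score:.3f}")
```

A positive score means the two terms co-occur more often than independence would predict, a score near zero means they look independent, and a negative score means they co-occur less often than expected.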

Year: 2009
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX