
The Structure of Broad Topics on the Web

By Soumen Chakrabarti, Mukul M. Joshi, Kunal Punera and David M. Pennock

Abstract

The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone, and do not consider textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure and not on page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web, and analyze the capability of recent random walk algorithms to draw samples which follow such distributions. In addition, we can measure the probability that a page about one broad topic will link to another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.
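The notion of topic mixing distance in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a made-up three-topic link matrix, treats the walk as a Markov chain over topics (a simplification of a walk over pages), and takes the uniform distribution as the background. The topic names, the numbers, the tolerance, and the helper names are all hypothetical.

```python
# Hypothetical topic-to-topic link matrix (invented numbers, not measured data):
# P[i][j] = probability that a page about topic i links to a page about topic j.
TOPICS = ["Arts", "Science", "Sports"]
P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Assumed background distribution. For this symmetric matrix the uniform
# distribution is stationary, so it stands in for the background.
BACKGROUND = [1.0 / 3, 1.0 / 3, 1.0 / 3]


def step(dist, matrix):
    """Advance a topic distribution by one link-following step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def total_variation(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_distance(start, matrix, background, tol=0.05, max_steps=1000):
    """Number of steps until a walk started on topic `start` is within
    `tol` (total variation) of the background, or None if never reached."""
    dist = [0.0] * len(matrix)
    dist[start] = 1.0
    for k in range(max_steps + 1):
        if total_variation(dist, background) <= tol:
            return k
        dist = step(dist, matrix)
    return None


if __name__ == "__main__":
    for i, name in enumerate(TOPICS):
        print(name, mixing_distance(i, P, BACKGROUND))
```

In this toy setting, a smaller value means topic context is lost quickly; stronger within-topic linking (larger diagonal entries) would lengthen it.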

Topics: Group and Organization Interfaces, Theory and models, H.1.0 [Information systems], Models and principles. General terms, Measurements, experimentation. Keywords, Social network analysis, Web bibliometry
Year: 2002
OAI identifier: oai:CiteSeerX.psu:10.1.1.18.9048
Provided by: CiteSeerX
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.neci.nec.com/homepa... (external link)