
WebBase: A repository of web pages

By Jun Hirai, Sriram Raghavan, Hector Garcia-Molina and Andreas Paepcke


In this paper, we study the problem of constructing and maintaining a large shared repository of web pages. We discuss the unique characteristics of such a repository, propose an architecture, and identify its functional modules. We focus on the storage manager module and illustrate how traditional techniques for storage and indexing can be tailored to meet the requirements of a web repository. To evaluate design alternatives, we also present experimental results from a prototype repository called WebBase, which is currently being developed at Stanford University.

Keywords: Repository, WebBase, Architecture, Storage management

1 Introduction

A number of important applications require local access to substantial portions of the web. Examples include traditional text search engines [Google] [Avista], related page services [Google] [Alexa], and topic-based search and categorization services [Yahoo]. Such applications typically access, mine or index a local cache or repository of web..

Topics: Repository, WebBase, Architecture, Storage management
Year: 1999
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX