
Inverted Index Compression via Online Document Routing

By Gal Lavee, Ronny Lempel, Edo Liberty and Oren Somekh

Abstract

Modern search engines are expected to make documents searchable shortly after they appear on the ever-changing Web. To satisfy this requirement, the Web is crawled frequently. Due to the sheer size of their indexes, search engines distribute the crawled documents among thousands of servers in a scheme called local index-partitioning, such that each server indexes only several million pages. To ensure that documents from the same host (e.g., www.nytimes.com) are distributed uniformly over the servers for load-balancing purposes, documents are commonly routed to servers at random. To shorten the time between crawling a document and making it searchable, documents may simply be appended to the existing index partitions. However, indexing by merely appending documents results in larger index sizes since documen
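The abstract's explanation is cut off, so what follows is a hedged illustration of the standard argument, not the authors' method. Inverted indexes usually store each posting list as gaps between increasing docIDs and compress those gaps with variable-length codes, so small gaps cost fewer bits. If documents are routed at random and appended in crawl order, pages from one host end up scattered across each partition and their gaps grow. The sketch assumes a hypothetical toy crawl (each host has its own small vocabulary), Elias-gamma gap coding, and URL sorting as a reference ordering; the function names and parameters are illustrative and do not come from the paper.

```python
import random
from collections import defaultdict

def gamma_encode(g: int) -> str:
    """Elias-gamma code of a positive integer as a '0'/'1' string."""
    if g < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(g)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary  # unary length prefix + binary

def gamma_decode(bits: str) -> list[int]:
    """Decode a concatenation of Elias-gamma codes."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out

def postings_bits(docs: list[tuple[str, set[str]]]) -> int:
    """Total gamma-coded size, in bits, of all docID-gap lists.

    docIDs are 1-based positions in `docs`, i.e. the order in which
    documents were added to this partition.
    """
    postings = defaultdict(list)
    for doc_id, (_url, terms) in enumerate(docs, start=1):
        for t in terms:
            postings[t].append(doc_id)       # docIDs arrive in increasing order
    total = 0
    for ids in postings.values():
        prev = 0
        for d in ids:
            total += len(gamma_encode(d - prev))  # encode the gap, not the docID
            prev = d
    return total

def make_crawl(num_hosts=20, pages_per_host=200, seed=0):
    """Hypothetical toy crawl: each host has its own small vocabulary."""
    rng = random.Random(seed)
    docs = []
    for h in range(num_hosts):
        vocab = [f"h{h}_t{k}" for k in range(8)]
        for p in range(pages_per_host):
            url = f"http://host{h}.example/page{p}"
            docs.append((url, set(rng.sample(vocab, 3))))
    rng.shuffle(docs)                        # crawl order interleaves hosts
    return docs

def route_randomly(docs, num_servers=4, seed=1):
    """Random routing: each document goes to a uniformly chosen server."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(num_servers)]
    for doc in docs:
        # Appending keeps crawl order within each partition.
        partitions[rng.randrange(num_servers)].append(doc)
    return partitions

crawl = make_crawl()
parts = route_randomly(crawl)
appended = sum(postings_bits(p) for p in parts)
url_sorted = sum(postings_bits(sorted(p, key=lambda d: d[0])) for p in parts)
print(f"append order: {appended} bits, URL-sorted order: {url_sorted} bits")
```

Under these assumptions the URL-sorted partitions come out smaller because pages from the same host receive consecutive docIDs, so their gaps are short. The numbers depend entirely on the toy data and say nothing about the paper's reported results.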

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.188.5208
Provided by: CiteSeerX
Full text available at:
  • http://citeseerx.ist.psu.edu/v...
  • http://www.cs.yale.edu/homes/e...