Harvest: A Scalable, Customizable Discovery and Access System

Abstract

Rapid growth in data volume, user base, and data diversity render Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides an integrated set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with WWW clients and with HTTP,FTP, Gopher, and NetNews information resources. We discuss the design and implementation of Harvest and its subsystems, give examples of its uses, and provide measurements indicating that Harvest can significantly reduce server load, network traffic, and space requirements when building indexes, compared with previous systems. We also discuss several popular indexes wehave built using Harvest, underscoring the customizability and scalability of the system

    Similar works

    Full text

    thumbnail-image

    Available Versions