Project GRACE: A Grid-based search tool for the global digital library

Abstract

This paper reports on the progress of an ongoing EU project, GRACE (Grid Search and Categorization Engine, http://www.grace-ist.org). The project participants are CERN, Sheffield Hallam University, Stockholm University, Stuttgart University, GL 2006 and Telecom Italia. The project started in 2002 and will finish in 2005, delivering a Grid-based search engine that searches across a variety of content sources, including a number of electronic thesis and dissertation (ETD) repositories. The Open Archives Initiative (OAI) is expanding and is clearly of interest to a community that advocates open access to ETDs. However, the OAI approach alone may not scale well enough to achieve a truly global ETD digital library. Many universities simply publish their collections through their local web services without belonging to any federated archiving system. Even dissertations that do carry OAI-compliant metadata will not necessarily be harvested by a centralized OAI service provider, because the collection may not be officially registered as an OAI data provider.
GRACE attempts to apply an innovative Grid-based solution to the challenge of searching a global, heterogeneous collection of documents. The goal of the project is to build a distributed search and categorization engine that will run on the European Data Grid (EDG) and its successor, Enabling Grids for E-science in Europe (EGEE). The main difference between GRACE and existing search engines is that GRACE has no centralized index. Instead, it relies on local indexes or search interfaces dispersed across web services around the world. These local sources may use different protocols, including HTTP, OAI-PMH and Z39.50. To include even document collections that offer no local search facility at all, GRACE will index them with its own native search engine based on Lucene.
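
To make the "no centralized index" idea concrete, the following is a minimal Python sketch, not GRACE's actual implementation, of a query being sent to several independent sources in parallel and the answers merged. The source callables, the `LocalIndex` class (a toy inverted index standing in for the Lucene-based native engine) and the per-source score normalization are all illustrative assumptions; real adapters would speak HTTP, OAI-PMH or Z39.50, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A source maps a query to (doc_id, raw_score) pairs using its own index.
# Hypothetical: real adapters would wrap HTTP, OAI-PMH or Z39.50 endpoints.
Source = Callable[[str], List[Tuple[str, float]]]


@dataclass
class Hit:
    source: str
    doc_id: str
    score: float  # normalized to [0, 1] within its source


class LocalIndex:
    """Toy inverted index, a hypothetical stand-in for the Lucene-based
    engine used for collections that have no search interface of their own."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.postings: Dict[str, Set[str]] = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query: str) -> List[Tuple[str, float]]:
        terms = query.lower().split()
        counts: Dict[str, int] = {}
        for term in terms:
            for doc_id in self.postings.get(term, set()):
                counts[doc_id] = counts.get(doc_id, 0) + 1
        # Score = fraction of the query terms that the document contains.
        return [(doc_id, n / len(terms)) for doc_id, n in counts.items()]


def federated_search(query: str, sources: Dict[str, Source],
                     timeout: float = 5.0) -> List[Hit]:
    """Send the query to every source in parallel and merge the answers.
    No central index exists: each source answers from its own index."""
    hits: List[Hit] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results = future.result(timeout=timeout)
            except Exception:
                continue  # a failing or timed-out source contributes no hits
            if not results:
                continue
            # Scale by the source's best score so rankings are comparable.
            best = max(score for _, score in results) or 1.0
            hits.extend(Hit(name, doc_id, score / best) for doc_id, score in results)
    return sorted(hits, key=lambda h: (-h.score, h.source, h.doc_id))


# Usage with two hypothetical sources and one that is offline.
def remote_repository(query: str) -> List[Tuple[str, float]]:
    return [("etd-17", 8.0), ("etd-42", 2.0)] if "grid" in query.lower() else []


def offline_repository(query: str) -> List[Tuple[str, float]]:
    raise ConnectionError("repository unreachable")


local = LocalIndex({"t1": "grid computing thesis", "t2": "digital library metadata"})
sources = {"remote": remote_repository, "local": local.search,
           "offline": offline_repository}
for hit in federated_search("grid thesis", sources):
    print(hit.source, hit.doc_id, hit.score)
```

Scaling each source's scores by its own best hit is only one simple way to make rankings from unrelated engines comparable; it is an assumption of this sketch, not something the abstract specifies. The unreachable source is simply dropped, so one offline repository does not break the whole search.
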
This decentralized approach, together with the scalable processing power provided by the Grid, will give users the following advantages: advanced search capabilities, flexible enough to offer the broadest set of features that the selected content sources support; more current information and indexes; on-the-fly categorization, in which the engine categorizes documents dynamically but can also use existing metadata and thesauri when desired; searching and result presentation in multiple languages (starting with English, Italian, Swedish and German); support for both anonymous and registered users; and collaboration, with documents or collections shared by registered users or groups. The list of content sources that the GRACE engine will search is still being developed, but thesis collections in Germany, Sweden and Switzerland are already included, and the list will be expanded to other sources as soon as the tool is up and running. This paper describes the search tool and is also an invitation to collaborate.
