Technical documentation of the infrastructure supporting the E-ARK Faceted Query Interface and Application Programming Interface (API)

Abstract

E-ARK Work Package 6 (WP6), Archival Storage, Services, and Integration, is developing a scalable open-source reference implementation for ingesting, searching, and accessing E-ARK information packages. A major task in this context is the development of a faceted query interface for searching archived content, which can be used by end users as well as by external software components. The reference implementation aims to provide an archiving and search prototype that is flexible with regard to the type and volume of the ingested payloads. It is designed to scale from a single host out to a cluster deployment by employing technologies such as Apache Hadoop, Solr, and the Lily repository, and it supports input data ranging from text-based files and structured records to office documents and binary content. This report provides technical documentation of the infrastructure supporting the E-ARK Faceted Query Interface and Application Programming Interface (API). It describes the underlying software components used to develop the search functionality of the E-ARK reference implementation and discusses the interactions among them that are required for them to work as an integrated solution. Furthermore, it provides technical documentation of the developed software and the system configuration. The document also describes methods for customizing the faceted query interface and provides examples of its use.
