Data first: turning the digital library "inside out"

Abstract

At Caltech a variety of systems (Invenio, EPrints, Islandora, ArchivesSpace) are used to manage the components of what would ideally be an integrated digital library. Given limited resources and very demanding partners, we focus on data first and on integration and manipulation tools second, and we de-emphasize development within any of our repository environments. Instead, data is harvested nightly from all our repositories and made available through data feeds, web services, and structured data files, as well as through traditional web environments. Our aim is to maintain separate systems insofar as they are useful, but to present a wide range of services that are data- and user-centric rather than bound to a specific system’s feature set. We treat data as the central concern and think of nightly metadata harvesting and normalization as “continuous migration.” Our second focus is tool development. The tools are typically lightweight, usually command-line programs that enable and encourage our users to engage with the data in ways specific to their needs. Where traditional web-based access is needed, we are moving to provide it through generic content management systems rather than our underlying repositories. The presentation includes examples of strategies and services that illustrate the benefits of this approach, including a workflow for an international research group that integrates the group’s own processes with normalizing, publishing, and preserving data through the Library, and strategies for automatically enriching locally managed data from external sources.
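
The normalization step behind "continuous migration" can be pictured with a minimal sketch. It is illustrative only and is not the tooling described in the presentation: the field mappings, the `normalize` and `nightly_snapshot` helpers, and the sample records are all hypothetical, and real exports from each repository will be shaped differently. The idea is simply that records harvested from separate systems are mapped onto one system-neutral structure that can then be published as a data file.

```python
import json

# Hypothetical per-system field mappings; actual exports will differ.
FIELD_MAPS = {
    "eprints": {"id": "eprintid", "title": "title", "date": "date"},
    "archivesspace": {"id": "uri", "title": "display_string", "date": "create_time"},
}


def normalize(source, record):
    """Map one harvested record onto a shared, system-neutral shape."""
    mapping = FIELD_MAPS[source]
    return {
        "source": source,
        "id": str(record[mapping["id"]]),
        "title": record.get(mapping["title"], "").strip(),
        # Keep only the YYYY-MM-DD part of whatever date string was harvested.
        "date": str(record.get(mapping["date"], ""))[:10],
    }


def nightly_snapshot(harvested):
    """Normalize every harvested record and return them in a stable order."""
    rows = [
        normalize(source, record)
        for source, records in harvested.items()
        for record in records
    ]
    return sorted(rows, key=lambda row: (row["source"], row["id"]))


# Made-up sample input standing in for one night's harvest.
harvested = {
    "eprints": [
        {"eprintid": 42, "title": " A thesis ", "date": "2017-03-01"},
    ],
    "archivesspace": [
        {
            "uri": "/repositories/2/resources/7",
            "display_string": "Papers",
            "create_time": "2016-11-05T10:00:00Z",
        },
    ],
}

snapshot = nightly_snapshot(harvested)
print(json.dumps(snapshot, indent=2))
```

Because the snapshot is plain structured data, a lightweight command-line tool or a generic content management system could read it without any knowledge of the repository it came from.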
