The CERN Digital Memory Platform: Building a CERN-Scale, OAIS-Compliant Archival Service

Abstract

CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, CERN has a responsibility to preserve its current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that enables researchers to conveniently preserve their valuable research. In this thesis we investigate a possible strategy for building a CERN-wide archiving service using an existing preservation tool, Archivematica. Building an archival service at CERN scale poses at least three challenges. 1) The amount of data: CERN currently stores more than 300 PB of data. 2) Preservation of versioned data: research is often a series of small but important changes, and this history needs to be preserved without duplicating very large datasets. 3) The variety of systems and workflows: with more than 17,500 researchers, the preservation platform needs to integrate with many different workflows and content delivery systems. The main objective of this research is to evaluate whether Archivematica can be used as the main component of a digital archiving service at CERN. We discuss how we created a distributed deployment of Archivematica and increased our video processing capacity from 2.5 terabytes per month to approximately 15 terabytes per month, roughly a sixfold increase. We present a strategy for preserving versioned research data without creating duplicate artefacts. Finally, we evaluate three methods for integrating Archivematica with digital repositories and other digital workflows.
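The abstract does not describe how the thesis avoids duplicating large datasets across versions. One common general technique for this problem is content-addressed storage: each file body is stored once under a hash of its content, and each version is just a manifest mapping paths to hashes, so an unchanged multi-terabyte file costs nothing extra in a new version. The sketch below illustrates that idea only, as an assumption; it is not the thesis’s strategy or Archivematica’s behaviour, and the class `VersionedStore` and its methods are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical content-addressed store (illustration only, not Archivematica).

    Each unique file body is kept once, keyed by its SHA-256 digest;
    each version is a manifest mapping file paths to digests.
    """
    blobs: dict[str, bytes] = field(default_factory=dict)          # digest -> content
    versions: list[dict[str, str]] = field(default_factory=list)   # one manifest per version

    def commit(self, files: dict[str, bytes]) -> int:
        """Record a new version; only contents not seen before are stored."""
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # setdefault leaves an existing blob untouched, so identical
            # content shared by several versions is stored exactly once.
            self.blobs.setdefault(digest, content)
            manifest[path] = digest
        self.versions.append(manifest)
        return len(self.versions) - 1

    def checkout(self, version: int) -> dict[str, bytes]:
        """Reassemble the complete file set of a version from the blob pool."""
        return {path: self.blobs[d] for path, d in self.versions[version].items()}

    def stored_bytes(self) -> int:
        """Total bytes actually held, after deduplication."""
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = VersionedStore()
    big = b"x" * 1_000_000  # stands in for a large dataset that does not change
    v0 = store.commit({"data.bin": big, "analysis.py": b"print(1)\n"})
    v1 = store.commit({"data.bin": big, "analysis.py": b"print(2)\n"})
    # The large file is stored once; only the small changed script is added.
    print(store.stored_bytes())  # 1000018
    print(store.checkout(v0)["analysis.py"], store.checkout(v1)["analysis.py"])
```

Both versions remain fully retrievable, while storage grows only by the size of what actually changed between them. A real archive would also need fixity checks and preservation metadata, which this sketch omits.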
