3 research outputs found

    The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service

    No full text
    CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that enables researchers to conveniently preserve their valuable research. In this thesis we investigate a possible strategy for building a CERN-wide archiving service using an existing preservation tool, Archivematica. Building an archival service at CERN scale poses at least three challenges. 1) The amount of data: CERN currently stores more than 300PB of data. 2) Preservation of versioned data: research is often a series of small but important changes. This history needs to be preserved without duplicating very large datasets. 3) The variety of systems and workflows: with more than 17,500 researchers, the preservation platform needs to integrate with many different workflows and content delivery systems. The main objective of this research is to evaluate whether Archivematica can be used as the main component of a digital archiving service at CERN. We discuss how we created a distributed deployment of Archivematica and increased our video processing capacity from 2.5 terabytes per month to approximately 15 terabytes per month. We present a strategy for preserving versioned research data without creating duplicate artefacts. Finally, we evaluate three methods for integrating Archivematica with digital repositories and other digital workflows.

    EOSC-IF / Interoperability Guideline: Research Product Deposition

    No full text
    Bardi A, Manghi P, Gonzalez Lopez JB, et al. EOSC-IF / Interoperability Guideline: Research Product Deposition. Open Science calls for researchers to publish any type of research product as soon as possible, in such a way that their research activity can be transparently assessed, reviewed, reproduced, and rewarded in all its aspects. However, the publishing process has become more and more of a burden for scientists, who must, most of the time, spend time publishing their articles, data, software, and other products in the many institutional or thematic repositories of reference. Scenarios include first-time publishing of new resource products or double-publishing of research products, to satisfy institutional mandates and community practices. Such tedious work is often incomplete, with some products ending up unpublished and others showing incomplete or imprecise metadata. Some communities have investigated and realised the integration of their research performing services, from research infrastructures and clusters, with repositories for research product deposition. The integration ensures that outcomes of such services are deposited automatically, subject to prior authorization by the users, into a given repository, giving life to an end-to-end scientific workflow, from experimentation to publishing. The limit of existing approaches is that they are bound to a specific repository API and format; introducing multiple repositories as potential deposition targets for the service multiplies the problem, as bilateral interactions with each repository API must be established. For example, the Zenodo deposition API and the B2SHARE API are similar but different in many ways; a service willing to automate publishing into either repository would need to implement and maintain two different workflows.
    For the EOSC to act as an enabler of Open Science practices, its Interoperability Framework should guide services of research infrastructures and clusters of the EOSC on how to implement (semi-)automated workflows for the deposition and consumption of research products. To support different integration options, these guidelines support two modalities: SWORD protocol v3 for push mode and a combination of COAR Notify and Signposting for pull mode. The EOSC guidelines for research product onboarding are suggested as the metadata exchange format.

    EOSC IF Interoperability Guideline: Access to content via PID

    No full text
    Bardi A, Manghi P, Gonzalez Lopez JB, et al. EOSC IF Interoperability Guideline: Access to content via PID. An important aspect of Open Science is the possibility to re-use existing research products (e.g. research data) deposited in repositories and accessible via their persistent identifiers (e.g. Handle, DOI, ARK). However, there is no standard way for a service to access the actual content behind persistent identifiers, as these typically resolve to the landing pages of the research products. The lack of a standard for accessing the actual content identified by persistent identifiers makes the automatic consumption of research products hard to implement and, when possible, limited to the persistent identifiers issued by a specific repository (e.g. the first prototype of the EGI Data Transfer Service integrated in the EOSC EXPLORE portal supported only DOIs from Zenodo). The EOSC Future Working Group on Research Product Publishing proposes the adoption of the Publication Boundary Pattern of the Signposting protocol and recommends it for inclusion as an interoperability guideline in the EOSC IF.
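To make the landing-page problem concrete, here is a minimal sketch of how a consuming service might find content URLs once a landing page exposes Signposting-style typed links in its HTTP `Link` header. It is an illustration under stated assumptions, not the guideline's reference implementation: the relation types `item`, `cite-as` and `describedby` follow common Signposting usage, the helper names `parse_link_header` and `targets_for` are hypothetical, and all URLs are placeholders. The parser is deliberately simple and does not cover every edge case of the Web Linking syntax.

```python
import re

# One link-value: <target> followed by everything up to the next "<".
_LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: ;name="quoted value" or ;name=token
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Parse an HTTP Link header value into a list of dicts (simplified)."""
    links = []
    for match in _LINK_RE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        link = {"target": target}
        for param in _PARAM_RE.finditer(raw_params):
            name = param.group(1).lower()
            quoted, token = param.group(2), param.group(3)
            link[name] = quoted if quoted is not None else token
        links.append(link)
    return links


def targets_for(links, rel):
    """Return targets whose rel attribute contains the given relation type."""
    # A rel attribute may carry several space-separated relation types.
    return [l["target"] for l in links if rel in l.get("rel", "").split()]


# Hypothetical Link header as a landing page might return it (placeholder URLs).
EXAMPLE_HEADER = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://repo.example.org/record/1/files/data.csv>; rel="item"; type="text/csv", '
    '<https://repo.example.org/record/1/metadata.json>; rel="describedby"; '
    'type="application/json"'
)

if __name__ == "__main__":
    parsed = parse_link_header(EXAMPLE_HEADER)
    print("content:", targets_for(parsed, "item"))
    print("metadata:", targets_for(parsed, "describedby"))
    print("identifier:", targets_for(parsed, "cite-as"))
```

In a real client the header value would come from an HTTP HEAD or GET request on the URL the persistent identifier resolves to; that request is left out here so the sketch runs without network access.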