28 research outputs found

    DePICT : a conceptual model for digital preservation

    Digital Preservation addresses a significant threat to our cultural and economic foundation: the loss of access to valuable and, sometimes, unique information that is captured in digital form through obsolescence, deterioration or loss of information of how to access the contents. Digital Preservation has been defined as “The series of managed activities necessary to ensure continued access to digital materials for as long as necessary” (Jones, Beagrie, 2001/2008). This thesis develops a conceptual model of the core concepts and constraints that appear in digital preservation - DePICT (Digital PreservatIon ConceptualisaTion). This includes a conceptual model of the digital preservation domain, a top-level vocabulary for the concepts in the model, an in-depth analysis of the role of digital object properties, characteristics, and the constraints that guide digital preservation processes, and of how properties, characteristics and constraints affect the interaction of digital preservation services. In addition, it presents a machine-interpretable XML representation of this conceptual model to support automated digital preservation tools. Previous preservation models have focused on preserving technical properties of digital files. Such an approach limits the choices of preservation actions and does not fully reflect preservation activities in practice. Organisations consider properties that go beyond technical aspects and that encompass a wide range of factors that influence and guide preservation processes, including organisational, legal, and financial ones. Consequently, it is necessary to be able to handle ‘digital’ objects in a very wide sense, including abstract objects, such as intellectual entities and collections, in addition to the files and sets of files that create renditions of logical objects that are normally considered. 
In addition, we find that not only the digital objects' properties, but also the properties of the environments in which they exist, guide digital preservation processes. Furthermore, organisations use risk-based analysis for their preservation strategies, policies and preservation planning. They combine information about risks with an understanding of actions that are expected to mitigate the risks. Risk and action specifications can be dependent on properties of the actions, as well as on properties of objects or environments which form the input and output of those actions. The model presented here supports this view explicitly. It links risks with the actions that mitigate them and expresses them in stakeholder-specific constraints. Risks, actions and constraints are top-level entities in this model. In addition, digital objects and environments are top-level entities on an equal level. Models that do not have this property limit the choice of preservation actions to ones that transform a file in order to mitigate a risk. Establishing environments as top-level entities enables us to treat risks to objects, environments, or a combination of both. The DePICT model is the first conceptual model in the Digital Preservation domain that supports a comprehensive, whole life-cycle approach for dynamic, interacting preservation processes, rather than taking the customary and more limited view that is concerned with the management of digital objects once they are stored in a long-term repository.
    EThOS - Electronic Theses Online Service, GB

    Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media

    Many memory institutions hold large collections of hand-held media, which can comprise hundreds of terabytes of data spread over many thousands of data-carriers. Many of these carriers are at risk of significant physical degradation over time, depending on their composition. Unfortunately, handling them manually is enormously time consuming and so a full and frequent evaluation of their condition is extremely expensive. It is, therefore, important to develop scalable processes for stabilizing them onto backed-up online storage where they can be subject to high-quality digital preservation management. This goes hand in hand with the need to establish efficient, standardized ways of recording metadata and to deal with defective data-carriers. This paper discusses processing approaches, workflows, technical set-up, software solutions and touches on staffing needs for the stabilization process. We have experimented with different disk copying robots, defined our metadata, and addressed storage issues to scale stabilization to the vast quantities of digital objects on hand-held data-carriers that need to be preserved. Working closely with the content curators, we have been able to build a robust data migration workflow and have stabilized over 16 terabytes of data in a scalable and economical manner.
    Comment: 11 pages, presented at iPres 2011. Also publishing in corresponding conference proceedings.

    Implementing Metadata that Guide Digital Preservation Services

    Effective digital preservation depends on a set of preservation services that work together to ensure that digital objects can be preserved for the long term. These services need digital preservation metadata, in particular descriptions of the properties that digital objects may have and descriptions of the requirements that guide digital preservation services. This paper analyzes how these services interact and use these metadata, and develops a data dictionary to support them.

    Modelling Organizational Preservation Goals to Guide Digital Preservation

    This paper is an extended and updated version of the work reported at iPres 2008. Digital preservation activities can only succeed if they go beyond the technical properties of digital objects. They must consider the strategy, policy, goals, and constraints of the institution that undertakes them and take into account the cultural and institutional framework in which data, documents and records are preserved. Furthermore, because organizations differ in many ways, a one-size-fits-all approach cannot be appropriate. Fortunately, organizations involved in digital preservation have created documents describing their policies, strategies, workflows, plans, and goals to provide guidance. They also have skilled staff who are aware of considerations that are sometimes unwritten. Within Planets (Farquhar & Hockx-Yu, 2007), a four-year project co-funded by the European Union to address core digital preservation challenges, we have analyzed preservation guiding documents and interviewed staff from libraries, archives, and data centres that are actively engaged in digital preservation. This paper introduces a conceptual model for expressing the core concepts and requirements that appear in preservation guiding documents. It defines a specific vocabulary that institutions can reuse for expressing their own policies and strategies. In addition to providing a conceptual framework, the model and vocabulary support automated preservation planning tools through an XML representation.

    Services that Enable Integration and Cross-Linking Across Different Types of Identifiers and Data Types

    This report summarises progress on disciplinary cross-linking of identifier systems and the results obtained from the perspective of each THOR project partner organisation, in particular disciplinary data repositories. We describe requirements, results, and challenges informed by implementations in the life sciences, earth and environmental sciences, and high-energy physics.

    SIP Draft Specification

    A Submission Information Package (SIP) is defined in the OAIS standard as an Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more AIPs and/or the associated Descriptive Information. Many different SIP formats are used all over the world and unfortunately there is currently no central format for a SIP which would cover all individual national and business needs identified in the E-ARK Report on Available Best Practices. Therefore, the main objective of this report is to describe a draft SIP specification for the E-ARK project: to give an overview of the structure and main metadata elements of the E-ARK SIP and to provide initial input for the technical implementations of E-ARK ingest tools. The target group of this work comprises E-ARK project partners as well as all other archival institutions and software providers creating or updating their SIP format specifications. This report provides an overview of:
    • The general structure for submission information packages. This report explains how the E-ARK SIP is constructed by following the common rules for all other (archival, dissemination) information packages.
    • The SIP METS Profile. We provide a detailed overview of metadata sections and the metadata elements in these sections. The table with all metadata elements could be of interest to technical stakeholders who wish to continue with the more detailed work of the E-ARK SIP implementation later.
    Two examples with different kinds of content (MoReq2010, SIARD-E) following the common structure for the E-ARK submission information package can be found in the appendixes to this report.

    Records export, transfer and ingest recommendations and SIP Creation Tools

    This report describes a software deliverable as it delivers a number of E-ARK tools:
    • ERMS Export Module (a tool for exporting records and their metadata from ERMS in a controlled manner);
    • Database Preservation Toolkit (a tool for exporting relational databases as SIARD 2.0 or other formats);
    • ESSArch Tools for Producer (a tool for SIP creation);
    • ESSArch Tools for Archive (a tool for SIP ingestion);
    • RODA-in (a tool for SIP creation);
    • Universal Archiving Module (a tool for SIP creation).
    In addition, an overview of Pre-Ingest and Ingest processes will be provided by this report which will help to understand the tools and their use.

    SMURF (Semantically Marked Up Record Format) Profile

    The purpose of this report is to describe the SMURF (semantically marked up record format) profile, which includes ERMS (electronic records management systems) and SFSB (simple file-system based) records as described below. When extracting information from a producer’s system one has the choice of two generic options:
    1. Extracting data in a relational database structure. Extracting data from a relational database into a long-term preservation format (SIARD) that preserves the properties of the relational database so that the data can be imported into a relational database management system (RDBMS) on access. Access can happen via database queries or via a search field. The main access use cases are:
        a. The producer wishes to retrieve their data for business purposes and/or re-use.
        b. The consumer wishes to consult the data for purposes of research.
        c. The archivist wishes to retrieve the data for professional treatment: to check and, if necessary, perform preservation actions, etc.
       More information about this option can be read in the SIARD 2.0 Profile Specification.
    2. Extracting data and metadata as records. Extract the records and normalise them to a standard E-ARK XML format. This means that the records are semantically marked up using metadata. Being technically valid and complying with this specification makes them directly accessible for validation, data management, indexing and searching. Their structured semantic metadata description is explicit rather than hidden inside an RDBMS. The representation of descriptive metadata inside the archive can be in the E-ARK SMURF AIP format and/or another native archive format. The main advantages over the RDBMS representation are that:
        o Records from different sources can be merged.
        o Search and access is possible across all records from all sources.
        o Records can be managed and accessed uniformly.
        o The original database / records system software does not need to be licensed and preserved.

    Deal with conflict, capture the relationship: the case of digital object properties

    Properties of digital objects play a central role in digital preservation. All key preservation services are linked via a common understanding of the properties which describe the digital objects in a repository's care. Unfortunately, different services sometimes deal with properties at different levels of description. While, for example, a preservation characterization service may extract the fontSize of a string, the preservation planning service may require the preservation of the text’s formatting. Additionally, a value for the same property may be obtained in various ways, sometimes resulting in different observed values. Furthermore, properties are not always equally applicable across different file formats. This report investigates where in these three situations relationships between properties need to be defined to overcome possible misalignments. The analysis was based on observations gained during a case study of the nature of the properties that are captured in different institutions’ preservation requirements and those of use in Planets preservation services.

    DEAL WITH CONFLICT, CAPTURE THE RELATIONSHIP: THE CASE OF DIGITAL OBJECT PROPERTIES: Paper - iPres 2010 - Vienna

    Properties of digital objects play a central role in digital preservation. All key preservation services are linked via a common understanding of the properties which describe the digital objects in a repository's care. Unfortunately, different services sometimes deal with properties at different levels of description. While, for example, a preservation characterization service may extract the fontSize of a string, the preservation planning service may require the preservation of the text’s formatting. Additionally, a value for the same property may be obtained in various ways, sometimes resulting in different observed values. Furthermore, properties are not always equally applicable across different file formats. This report investigates where in these three situations relationships between properties need to be defined to overcome possible misalignments. The analysis was based on observations gained during a case study of the nature of the properties that are captured in different institutions’ preservation requirements and those of use in Planets preservation services.