923 research outputs found

    Digital Preservation Services : State of the Art Analysis

    Get PDF
    Research report funded by the DC-NET project.An overview of the state of the art in service provision for digital preservation and curation. Its focus is on the areas where bridging the gaps is needed between e-Infrastructures and efficient and forward-looking digital preservation services. Based on a desktop study and a rapid analysis of some 190 currently available tools and services for digital preservation, the deliverable provides a high-level view on the range of instruments currently on offer to support various functions within a preservation system.European Commission, FP7peer-reviewe

    Automating Metadata Extraction: Genre Classification

    Get PDF
    A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions; as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylo-metric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.

    The LIFE2 final project report

    Get PDF
    Executive summary: The first phase of LIFE (Lifecycle Information For E-Literature) made a major contribution to understanding the long-term costs of digital preservation; an essential step in helping institutions plan for the future. The LIFE work models the digital lifecycle and calculates the costs of preserving digital information for future years. Organisations can apply this process in order to understand costs and plan effectively for the preservation of their digital collections The second phase of the LIFE Project, LIFE2, has refined the LIFE Model adding three new exemplar Case Studies to further build upon LIFE1. LIFE2 is an 18-month JISC-funded project between UCL (University College London) and The British Library (BL), supported by the LIBER Access and Preservation Divisions. LIFE2 began in March 2007, and completed in August 2008. The LIFE approach has been validated by a full independent economic review and has successfully produced an updated lifecycle costing model (LIFE Model v2) and digital preservation costing model (GPM v1.1). The LIFE Model has been tested with three further Case Studies including institutional repositories (SHERPA-LEAP), digital preservation services (SHERPA DP) and a comparison of analogue and digital collections (British Library Newspapers). These Case Studies were useful for scenario building and have fed back into both the LIFE Model and the LIFE Methodology. The experiences of implementing the Case Studies indicated that enhancements made to the LIFE Methodology, Model and associated tools have simplified the costing process. Mapping a specific lifecycle to the LIFE Model isn’t always a straightforward process. The revised and more detailed Model has reduced ambiguity. The costing templates, which were refined throughout the process of developing the Case Studies, ensure clear articulation of both working and cost figures, and facilitate comparative analysis between different lifecycles. The LIFE work has been successfully disseminated throughout the digital preservation and HE communities. Early adopters of the work include the Royal Danish Library, State Archives and the State and University Library, Denmark as well as the LIFE2 Project partners. Furthermore, interest in the LIFE work has not been limited to these sectors, with interest in LIFE expressed by local government, records offices, and private industry. LIFE has also provided input into the LC-JISC Blue Ribbon Task Force on the Economic Sustainability of Digital Preservation. Moving forward our ability to cost the digital preservation lifecycle will require further investment in costing tools and models. Developments in estimative models will be needed to support planning activities, both at a collection management level and at a later preservation planning level once a collection has been acquired. In order to support these developments a greater volume of raw cost data will be required to inform and test new cost models. This volume of data cannot be supported via the Case Study approach, and the LIFE team would suggest that a software tool would provide the volume of costing data necessary to provide a truly accurate predictive model

    Lifecycle information for e-literature: full report from the LIFE project

    Get PDF
    This Report is a record of the LIFE Project. The Project has been run for one year and its aim is to deliver crucial information about the cost and management of digital material. This information should then in turn be able to be applied to any institution that has an interest in preserving and providing access to electronic collections. The Project is a joint venture between The British Library and UCL Library Services. The Project is funded by JISC under programme area (i) as listed in paragraph 16 of the JISC 4/04 circular- Institutional Management Support and Collaboration and as such has set requirements and outcomes which must be met and the Project has done its best to do so. Where the Project has been unable to answer specific questions, strong recommendations have been made for future Project work to do so. The outcomes of this Project are expected to be a practical set of guidelines and a framework within which costs can be applied to digital collections in order to answer the following questions: • What is the long term cost of preserving digital material; • Who is going to do it; • What are the long term costs for a library in HE/FE to partner with another institution to carry out long term archiving; • What are the comparative long-term costs of a paper and digital copy of the same publication; • At what point will there be sufficient confidence in the stability and maturity of digital preservation to switch from paper for publications available in parallel formats; • What are the relative risks of digital versus paper archiving. The Project has attempted to answer these questions by using a developing lifecycle methodology and three diverse collections of digital content. The LIFE Project team chose UCL e-journals, BL Web Archiving and the BL VDEP digital collections to provide a strong challenge to the methodology as well as to help reach the key Project aim of attributing long term cost to digital collections. The results from the Case Studies and the Project findings are both surprising and illuminating

    Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation

    Get PDF
    Digital archiving and preservation are important areas for research and development, but there is no agreed upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that there are some areas where researchers are reinventing the wheel while other areas are neglected. Digital archiving and preservation is an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The WG aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Some of the potential areas for research include repository architectures and inter-operability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There can also be opportunities for development of commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.

    Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries

    Get PDF
    Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet, research into their long‐term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, ‘after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited.’ Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges – theoretical, methodological and technological – that need attention by researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries are to remain sustainable, authentic, accessible and understandable over time. Building on this work and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long‐term viability at the centre of the global information society.

    Detecting Family Resemblance: Automated Genre Classification.

    Get PDF
    This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targetted material for improving research. The current paper compares the role of visual layout, stylistic features and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
    • …
    corecore