175 research outputs found

    Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries

    Get PDF
    Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet, research into their long‐term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, ‘after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited.’ Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges – theoretical, methodological and technological – that need attention by researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries are to remain sustainable, authentic, accessible and understandable over time. Building on this work and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long‐term viability at the centre of the global information society.

    Changing Trains at Wigan: Digital Preservation and the Future of Scholarship

    Get PDF
    This paper examines the impact of the emerging digital landscape on long term access to material created in digital form and its use for research; it examines challenges, risks and expectations.

    Establishing a community-based approach to electronic journal archiving: the UK LOCKSS Pilot Programme

    Get PDF
    Lots of Copies Keep Stuff Safe (LOCKSS ) represents a sophisticated combination of technical and business-aware elements that can be deployed to ensure the long-term accessibility to electronic journal content even if the publisher ceases to exist, a subscription is terminated, or the already acquired content becomes damaged. Given the potential benefits of LOCKSS to the UK community, and in consideration of the implications of the NESLi2 licences, the Joint Information Systems Committee and the Consortium of University Research Libraries (JISC/CURL) co-funded a UK LOCKSS Pilot Programme to explore issues associated with the practical implementation of LOCKSS in UK Higher Education institutions. The pilot launched in March 2006 and concluded in July 2008. Following on from our experiences throughout the UK LOCKSS Pilot Programme, this paper discusses the organizational attributes of the LOCKSS approach that we expect to further develop in the UK, describes the types of journal content that the current generation of LOCKSS seems best suited to handle and as a result how LOCKSS may fit into the broader journal archiving environment, and it describes the steps we are taking to ensure both the LOCKSS software and Technical Support Service grow effectively to support library use and information management

    The Wiltshire Wills Feasibility Study

    Get PDF
    The Wiltshire and Swindon Record Office has nearly ninety thousand wills in its care. These records are neither adequately catalogued nor secured against loss by facsimile microfilm copies. With support from the Heritage Lottery Fund the Record Office has begun to produce suitable finding aids for the material. Beginning with this feasibility study the Record Office is developing a strategy to ensure the that facsimiles to protect the collection against risk of loss or damage and to improve public access are created.<p></p> This feasibility study explores the different methodologies that can be used to assist the preservation and conservation of the collection and improve public access to it. The study aims to produce a strategy that will enable the Record Office to create digital facsimiles of the Wills in its care for access purposes and to also create preservation quality microfilms. The strategy aims to seek the most cost effective and time efficient approach to the problem and identifies ways to optimise the processes by drawing on the experience of other similar projects. This report provides a set of guidelines and recommendations to ensure the best use of the resources available for to provide the most robust preservation strategy and to ensure that future access to the Wills as an information resource can be flexible, both local and remote, and sustainable

    Audit and Certification of Digital Repositories: Creating a Mandate for the Digital Curation Centre (DCC)

    Get PDF
    The article examines the issues surrounding the audit and certification of digital repositories in light of the work that the RLG/NARA Task Force did to draw up guidelines and the need for these guidelines to be validated.

    Detecting Family Resemblance: Automated Genre Classification.

    Get PDF
    This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targetted material for improving research. The current paper compares the role of visual layout, stylistic features and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.

    Automating Metadata Extraction: Genre Classification

    Get PDF
    A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions; as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylo-metric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.

    Examining Variations of Prominent Features in Genre Classification.

    Get PDF
    This paper investigates the correlation between features of three types (visual, stylistic and topical types) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes to show that the strongest features for making one classification is not necessarily the best features for carrying out another classification.

    The Role of Evidence in Establishing Trust in Repositories

    Get PDF
    This article arises from work by the Digital Curation Centre (DCC) Working Group examining mechanisms to roll out audit and certification services for digital repositories in the United Kingdom. Our attempt to develop a program for applying audit and certification processes and tools took as its starting point the RLG-NARA Audit Checklist for Certifying Digital Repositories. Our intention was to appraise critically the checklist and conceive a means of applying its mechanics within a diverse range of repository environments. We were struck by the realization that while a great deal of effort has been invested in determining the characteristics of a 'trusted digital repository', far less effort has concentrated on the ways in which the presence of the attributes can be demonstrated and their qualities measured. With this in mind we sought to explore the role of evidence within the certification process, and to identify examples of the types of evidence (e.g., documentary, observational, and testimonial) that might be desirable during the course of a repository audit.

    Feature Type Analysis in Automated Genre Classification

    Get PDF
    In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.
    corecore