
    TSDF: A simple yet comprehensive, unified data storage and exchange format standard for digital biosensor data in health applications

    Digital sensors are increasingly used to monitor how physiological processes change over time in health and disease, often via wearable devices. This generates very large amounts of digital sensor data, for which a consensus on a common storage, exchange, and archival data format standard has yet to be reached. To address this gap, we propose Time Series Data Format (TSDF): a unified, standardized format for storing all types of physiological sensor data across diverse disease areas. We pose a series of format design criteria and review current storage and exchange formats in detail. Judged against these criteria, we find the current formats lacking, and propose a very simple, intuitive standard for both numerical sensor data and metadata, based on raw binary files for sensor measurements and timestamps, and JSON-format text files for metadata. By focusing on the common characteristics of diverse biosensor data, we define a set of necessary and sufficient metadata fields for storing, processing, exchanging, archiving, and reliably interpreting multi-channel biological time series data. Our aim is for this standardized format to increase the interpretability and exchangeability of data, thereby contributing to scientific reproducibility in studies where digital biosensor data forms a key evidence base.
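    The raw-binary-plus-JSON-sidecar pairing described in the abstract can be sketched in a few lines of Python. Note that the field names below (`channels`, `data_type`, `rows`, and so on) are illustrative assumptions for this sketch, not the actual TSDF specification: the point is only that the binary file holds nothing but sample values, while the JSON metadata carries everything needed to interpret them.

```python
import json
import os
import struct
import tempfile

# One channel of sensor values, stored as raw little-endian float32.
samples = [0.0, 0.5, 1.0, 0.5]

# Hypothetical metadata sidecar; field names are assumptions for
# illustration, not the real TSDF field set.
meta = {
    "study_id": "example-study",
    "channels": ["accel_x"],
    "data_type": "float32",
    "rows": len(samples),
}

tmpdir = tempfile.mkdtemp()
bin_path = os.path.join(tmpdir, "accel.bin")
meta_path = os.path.join(tmpdir, "accel.json")

with open(bin_path, "wb") as f:          # raw binary: sample values only
    f.write(struct.pack("<%df" % len(samples), *samples))
with open(meta_path, "w") as f:          # human-readable JSON metadata
    json.dump(meta, f, indent=2)

# Reading back relies solely on the metadata to interpret the binary.
with open(meta_path) as f:
    m = json.load(f)
with open(bin_path, "rb") as f:
    values = list(struct.unpack("<%df" % m["rows"], f.read()))
print(values)  # [0.0, 0.5, 1.0, 0.5]
```

    A design consequence worth noting: because the binary file is opaque without its sidecar, the JSON metadata is what makes the data self-describing for archiving and exchange.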

    Digital forensics formats: seeking a digital preservation storage format for web archiving

    In this paper we discuss archival storage formats from the point of view of digital curation and preservation. Taking established approaches to data management as our jumping-off point, we selected seven format attributes that are core to the long-term accessibility of digital materials; these we have labeled core preservation attributes. These attributes are then used as evaluation criteria to compare file formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (e.g. partimage, dd raw images), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. aff, the Advanced Forensic Format). We present a general discussion of the file format landscape in terms of these attributes, and make a direct comparison between the three most promising archival formats: tar, WARC, and aff. We conclude by suggesting next steps to take the research forward and to validate the observations we have made.
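    The "packing combined with a compression algorithm" category (tar+gzip) can be demonstrated directly with Python's standard library, which applies both steps in a single call. This is only a minimal sketch of the category the paper evaluates, using invented file names:

```python
import os
import tarfile
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "record.txt")
with open(src, "w") as f:
    f.write("archived content\n")

# "w:gz" packs (tar) and compresses (gzip) in one step.
archive = os.path.join(tmpdir, "bundle.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="record.txt")

# Member names remain listable without extracting the payload,
# one of the access properties relevant to long-term preservation.
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
print(names)  # ['record.txt']
```

    The fact that member listings survive compression is one reason combined pack-and-compress formats remain candidates for archival storage at all.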

    Technical alignment

    This essay discusses the importance of infrastructure and testing in helping digital preservation services demonstrate reliability, transparency, and accountability. It encourages practitioners to build a strong culture in which transparency and collaboration across technical frameworks are highly valued. It also argues for devising and applying agreed-upon metrics that will enable the systematic analysis of preservation infrastructure. The essay begins by defining technical infrastructure and testing in the digital preservation context, provides case studies that exemplify both progress and challenges for technical alignment in both areas, and concludes with suggestions for achieving greater degrees of technical alignment going forward.

    Changing Trains at Wigan: Digital Preservation and the Future of Scholarship

    This paper examines the impact of the emerging digital landscape on long term access to material created in digital form and its use for research; it examines challenges, risks and expectations.

    Curating E-Mails: A life-cycle approach to the management and preservation of e-mail messages

    E-mail forms the backbone of communications in many modern institutions and organisations, and is a valuable type of organisational, cultural, and historical record. Successful management and preservation of valuable e-mail messages and collections is therefore vital if organisational accountability is to be achieved and historical or cultural memory retained for the future. This requires attention from all stakeholders across the entire life-cycle of the e-mail records. This instalment of the Digital Curation Manual reports on the issues involved in managing and curating e-mail messages for both current and future use. Although there is no 'one-size-fits-all' solution, this instalment outlines a generic framework for e-mail curation and preservation, provides a summary of current approaches, and addresses the technical, organisational, and cultural challenges to successful e-mail management and longer-term curation.

    Transformative Effects of NDIIPP: The Case of the Henry A. Murray Archive

    This article comprises reflections on the changes to the Henry A. Murray Research Archive catalyzed by involvement with the National Digital Information Infrastructure and Preservation Program (NDIIPP) partnership and the accompanying introduction of next-generation digital library software. Founded in 1976 at Radcliffe, the Henry A. Murray Research Archive is the endowed, permanent repository for quantitative and qualitative research data at the Institute for Quantitative Social Science at Harvard University. The Murray preserves in perpetuity all types of data of interest to the research community, including numerical data, video, audio, interview notes, and other types. The center is unique among data archives in the United States in the extent of its holdings in quantitative, qualitative, and mixed quantitative-qualitative research. The Murray took part in an NDIIPP-funded collaboration with four other archival partners, Data-PASS, for the purpose of identifying and acquiring data at risk, and jointly developing best practices with respect to shared stewardship, preservation, and exchange of these data. During this time, the Dataverse Network (DVN) software was introduced, facilitating the creation of virtual archives. The combination of institutional collaboration and new technology led the Murray to re-engineer its entire acquisition process; completely rewrite its ingest, dissemination, and other licensing agreements; and adopt a new model for ingest, discovery, access, and presentation of its collections. Through the Data-PASS project, the Murray has acquired a number of important data collections. The resulting changes within the Murray have been dramatic, including a fourfold increase in its overall rate of acquisitions and far more rapid dissemination of acquisitions. Furthermore, the new licensing and processing procedures allow a previously undreamed-of level of interoperability and collaboration with partner archives, facilitating integrated discovery and presentation services and joint stewardship of collections.

    Economics and Engineering for Preserving Digital Content

    Progress towards practical long-term preservation seems to be stalled. Preservationists cannot afford specially developed technology, but must exploit what is created for the marketplace. Economic and technical facts suggest that most preservation work should be shifted from repository institutions to information producers and consumers. Prior publications describe solutions for all known conceptual challenges of preserving a single digital object, but do not deal with software development or scaling to large collections. Much of the document handling software needed is available. It has, however, not yet been selected, adapted, integrated, or deployed for digital preservation. The daily tools of both information producers and information consumers can be extended to embed preservation packaging without much burdening these users. We describe a practical strategy for detailed design and implementation. Document handling is intrinsically complicated because of human sensitivity to communication nuances. Our engineering section therefore starts by discussing how project managers can master the many pertinent details.

    Digital Preservation Services: State of the Art Analysis

    Research report funded by the DC-NET project. An overview of the state of the art in service provision for digital preservation and curation. Its focus is on the areas where gaps need to be bridged between e-Infrastructures and efficient, forward-looking digital preservation services. Based on a desktop study and a rapid analysis of some 190 currently available tools and services for digital preservation, the deliverable provides a high-level view of the range of instruments currently on offer to support various functions within a preservation system.

    Grid Databases for Shared Image Analysis in the MammoGrid Project

    The MammoGrid project aims to prove that Grid infrastructures can be used for collaborative clinical analysis of database-resident but geographically distributed medical images. This requires: a) the provision of a clinician-facing front-end workstation and b) the ability to service real-world clinician queries across a distributed and federated database. The MammoGrid project will prove the viability of the Grid by harnessing its power to enable radiologists from geographically dispersed hospitals to share standardized mammograms, to compare diagnoses (with and without computer-aided detection of tumours), and to perform sophisticated epidemiological studies across national boundaries. This paper outlines the approach taken in MammoGrid to seamlessly connect radiologist workstations across a Grid using an "information infrastructure" and a DICOM-compliant object model residing in multiple distributed data stores in Italy and the UK.
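    The core idea of servicing one clinician query across a federated database, where each site holds its own records and partial results are merged for the front-end, can be sketched in plain Python. This is not MammoGrid code; the site names, record fields, and query shape below are invented purely to illustrate the federation pattern:

```python
# Two hypothetical sites, each holding its own records locally.
stores = {
    "site_a": [{"id": 1, "finding": "normal"},
               {"id": 2, "finding": "mass"}],
    "site_b": [{"id": 3, "finding": "mass"}],
}


def federated_query(stores, predicate):
    """Run the same predicate at every site and merge the partial
    results, tagging each hit with its site of origin."""
    results = []
    for site, records in stores.items():
        for rec in records:
            if predicate(rec):
                results.append({"site": site, **rec})
    return results


# A single clinician-style query is answered from all sites at once.
hits = federated_query(stores, lambda r: r["finding"] == "mass")
print([h["id"] for h in hits])  # [2, 3]
```

    In a real Grid deployment each per-site loop would be a remote call against that site's data store, but the merge-at-the-workstation structure is the same.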