51,021 research outputs found
TSDF: A simple yet comprehensive, unified data storage and exchange format standard for digital biosensor data in health applications
Digital sensors are increasingly being used to monitor, over time, physiological processes in health and disease, often via wearable devices. This generates very large amounts of digital sensor data, for which a consensus on a common storage, exchange and archival data format standard has yet to be reached. To address this gap, we propose the Time Series Data Format (TSDF): a unified, standardized format for storing all types of physiological sensor data across diverse disease areas. We pose a series of format design criteria and review current storage and exchange formats in detail. Judged against these criteria, we find the current formats lacking, and propose a very simple, intuitive standard for both numerical sensor data and metadata, based on raw binary files for sensor measurements and timestamps, and JSON-format text files for metadata. By focusing on the common characteristics of diverse biosensor data, we define a set of necessary and sufficient metadata fields for storing, processing, exchanging, archiving and reliably interpreting multi-channel biological time series data. Our aim is for this standardized format to increase the interpretability and exchangeability of data, thereby contributing to scientific reproducibility in studies where digital biosensor data form a key evidence base.
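The binary-plus-JSON split the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the official TSDF schema: the metadata field names (`file_name`, `rows`, `endianness`, `sampling_rate_hz`) and the helper functions are invented for this example.

```python
import json
import pathlib
import struct

# Sketch of the raw-binary-plus-JSON-sidecar pattern: sample values stored as
# little-endian float32 in a .bin file, describing metadata in a JSON file.
# All field names are illustrative, not the official TSDF schema.

def write_channel(stem, samples, meta):
    bin_path = pathlib.Path(f"{stem}.bin")
    bin_path.write_bytes(struct.pack(f"<{len(samples)}f", *samples))
    full_meta = dict(meta, file_name=bin_path.name, data_type="float32",
                     endianness="little", rows=len(samples))
    pathlib.Path(f"{stem}_meta.json").write_text(json.dumps(full_meta, indent=2))

def read_channel(stem):
    # The JSON sidecar tells us how to interpret the raw bytes.
    meta = json.loads(pathlib.Path(f"{stem}_meta.json").read_text())
    raw = pathlib.Path(f"{stem}.bin").read_bytes()
    return list(struct.unpack(f"<{meta['rows']}f", raw)), meta

write_channel("accel_x", [0.0, 9.81, 9.79],
              {"channel": "accel_x", "unit": "m/s^2", "sampling_rate_hz": 100})
samples, meta = read_channel("accel_x")
```

The point of the split is that the binary file stays a dumb, compact sample stream, while everything needed to interpret it (type, endianness, length, units) lives in the human-readable sidecar.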
Digital forensics formats: seeking a digital preservation storage format for web archiving
In this paper we discuss archival storage formats from the point of view of digital curation and
preservation. Taking established approaches to data management as our jumping-off point, we
selected seven format attributes which are core to the long term accessibility of digital materials.
These we have labeled core preservation attributes. These attributes are then used as evaluation
criteria to compare file formats belonging to five common categories: formats for archiving selected
content (e.g. tar, WARC), disk image formats that capture data for recovery or installation
(partimage, dd raw image), these two types combined with a selected compression algorithm (e.g.
tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for
data analysis in criminal investigations (e.g. AFF, the Advanced Forensic Format). We present a
general discussion of the file format landscape in terms of the attributes we discuss, and make a
direct comparison between the three most promising archival formats: tar, WARC, and aff. We
conclude by suggesting next steps to take the research forward and to validate the observations
we have made.
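The "archive format plus compression" category the paper evaluates can be illustrated with Python's standard library, where tar and gzip are composed in a single call. The file name and contents below are invented for the example.

```python
import pathlib
import tarfile
import tempfile

# Illustrative sketch of the tar+gzip category: the tar layer preserves file
# names, paths and timestamps, while gzip compresses the resulting stream.
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "page.html").write_text("<html>archived page</html>")

archive = workdir / "crawl.tar.gz"
with tarfile.open(archive, "w:gz") as tar:  # "w:gz" = write tar through gzip
    tar.add(workdir / "page.html", arcname="page.html")

# Reading back: the archive still knows the member names.
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
```

This separation of packing (tar) from compression (gzip) is exactly why the combination scores differently from single-layer formats like 7-zip in an evaluation of long-term accessibility: each layer can be replaced or recovered independently.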
Technical alignment
This essay discusses the importance of technical infrastructure
and testing in helping digital preservation services
demonstrate reliability, transparency, and accountability. It
encourages practitioners to build a strong culture in which
transparency and collaborations between technical frameworks
are valued highly. It also argues for devising and applying
agreed-upon metrics that will enable the systematic analysis of
preservation infrastructure. The essay begins by defining
technical infrastructure and testing in the digital preservation
context, provides case studies that exemplify both progress and
challenges for technical alignment in both areas, and concludes
with suggestions for achieving greater degrees of technical
alignment going forward.
Changing Trains at Wigan: Digital Preservation and the Future of Scholarship
This paper examines the impact of the emerging digital landscape on long term access to material created in digital form and its use for research; it examines challenges, risks and expectations.
Curating E-Mails: A life-cycle approach to the management and preservation of e-mail messages
E-mail forms the backbone of communications in many modern institutions and organisations and is a valuable type of organisational, cultural, and historical record. Successful management and preservation of valuable e-mail messages and collections is therefore vital if organisational accountability is to be achieved and historical or cultural memory retained for the future. This requires attention by all stakeholders across the entire life-cycle of the e-mail records.
This instalment of the Digital Curation Manual reports on the issues involved in managing and curating e-mail messages for both current and future use. Although there is no 'one-size-fits-all' solution, it outlines a generic framework for e-mail curation and preservation, summarizes current approaches, and addresses the technical, organisational and cultural challenges to successful e-mail management and longer-term curation.
Transformative Effects of NDIIPP, the Case of the Henry A. Murray Archive
This article comprises reflections on the changes to the Henry A.
Murray Research Archive, catalyzed by involvement with the National
Digital Information Infrastructure and Preservation Program
(NDIIPP) partnership, and the accompanying introduction of next
generation digital library software.
Founded in 1976 at Radcliffe, the Henry A. Murray Research
Archive is the endowed, permanent repository for quantitative and
qualitative research data at the Institute for Quantitative Social Science
at Harvard University. The Murray preserves in perpetuity all
types of data of interest to the research community, including numerical,
video, audio, interview notes, and other types. The center
is unique among data archives in the United States in the extent
of its holdings in quantitative, qualitative, and mixed quantitative-qualitative
research.
The Murray took part in an NDIIPP-funded collaboration
with four other archival partners, Data-PASS, for the purpose of
the identification and acquisition of data at risk, and the joint development
of best practices with respect to shared stewardship,
preservation, and exchange of these data. During this time, the
Dataverse Network (DVN) software was introduced, facilitating
the creation of virtual archives. The combination of institutional
collaboration and new technology led the Murray to re-engineer
its entire acquisition process; completely rewrite its ingest,
dissemination, and other licensing agreements; and adopt a new
model for ingest, discovery, access, and presentation of its collections.
Through the Data-PASS project, the Murray has acquired a
number of important data collections. The resulting changes
within the Murray have been dramatic, including increasing its
overall rate of acquisitions by fourfold; and disseminating acquisitions
far more rapidly. Furthermore, the new licensing and
processing procedures allow a previously undreamed of level of
interoperability and collaboration with partner archives, facilitating
integrated discovery and presentation services, and joint
stewardship of collections.
Economics and Engineering for Preserving Digital Content
Progress towards practical long-term preservation seems to be stalled. Preservationists cannot afford specially developed technology, but must exploit what is created for the marketplace.
Economic and technical facts suggest that most preservation work should be shifted from repository institutions to information producers and consumers. Prior publications describe solutions for all known conceptual challenges of preserving a single digital object, but do not deal with software development or scaling to large collections. Much of the document handling software needed is available. It has, however, not yet been selected, adapted, integrated, or
deployed for digital preservation. The daily tools of both information producers and information consumers can be extended to embed preservation packaging without much burdening these users.
We describe a practical strategy for detailed design and implementation. Document handling is intrinsically complicated because of human sensitivity to communication nuances. Our engineering section therefore starts by discussing how project managers can master the many pertinent details.
Digital Preservation Services : State of the Art Analysis
Research report funded by the DC-NET project. An overview of the state of the art in service provision for digital preservation and curation, focusing on the areas where gaps need to be bridged between e-Infrastructures and efficient, forward-looking digital preservation services. Based on a desktop study and a rapid analysis of some 190 currently available tools and services for digital preservation, the deliverable provides a high-level view of the range of instruments currently on offer to support various functions within a preservation system. (European Commission, FP7; peer-reviewed.)
Grid Databases for Shared Image Analysis in the MammoGrid Project
The MammoGrid project aims to prove that Grid infrastructures can be used for
collaborative clinical analysis of database-resident but geographically
distributed medical images. This requires: a) the provision of a
clinician-facing front-end workstation and b) the ability to service real-world
clinician queries across a distributed and federated database. The MammoGrid
project will prove the viability of the Grid by harnessing its power to enable
radiologists from geographically dispersed hospitals to share standardized
mammograms, to compare diagnoses (with and without computer aided detection of
tumours) and to perform sophisticated epidemiological studies across national
boundaries. This paper outlines the approach taken in MammoGrid to seamlessly
connect radiologist workstations across a Grid using an "information
infrastructure" and a DICOM-compliant object model residing in multiple
distributed data stores in Italy and the UK.
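The core idea of servicing one clinician query across federated stores can be sketched as a fan-out-and-merge. This is a toy illustration only: the site names and record fields are invented, and the real MammoGrid system queries DICOM-compliant object stores across a Grid rather than in-memory dictionaries.

```python
# Toy sketch of a federated query: the same predicate is evaluated against
# every site's store and the hits are merged, tagged with their origin.
# Sites and records are invented for illustration.
SITES = {
    "udine":  [{"patient": "IT-01", "cad_tumour_flag": True},
               {"patient": "IT-02", "cad_tumour_flag": False}],
    "oxford": [{"patient": "UK-07", "cad_tumour_flag": True}],
}

def federated_query(predicate):
    """Run one predicate against every site's store and merge the hits."""
    hits = []
    for site, records in SITES.items():  # in MammoGrid this crosses the Grid
        for record in records:
            if predicate(record):
                hits.append(dict(record, site=site))
    return hits

# Example query: all cases flagged by computer-aided detection, at any site.
flagged = federated_query(lambda r: r["cad_tumour_flag"])
```

The design point is that the clinician poses one query and never sees the site boundaries; the federation layer handles distribution and result merging.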