11,290 research outputs found
Digital forensics formats: seeking a digital preservation storage format for web archiving
In this paper we discuss archival storage formats from the point of view of digital curation and
preservation. Considering established approaches to data management as our jumping off point, we
selected seven format attributes which are core to the long term accessibility of digital materials.
These we have labeled core preservation attributes. These attributes are then used as evaluation
criteria to compare file formats belonging to five common categories: formats for archiving selected
content (e.g. tar, WARC), disk image formats that capture data for recovery or installation
(partimage, dd raw image), these two types combined with a selected compression algorithm (e.g.
tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for
data analysis in criminal investigations (e.g. aff, Advanced Forensic File format). We present a
general discussion of the file format landscape in terms of the attributes we discuss, and make a
direct comparison between the three most promising archival formats: tar, WARC, and aff. We
conclude by suggesting the next steps to take the research forward and to validate the observations
we have made
Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation
Digital archiving and preservation are important areas for research and development, but there is no agreed upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that there are some areas where researchers are reinventing the wheel while other areas are neglected.
Digital archiving and preservation is an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The WG aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Some of the potential areas for research include repository architectures and inter-operability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There can also be opportunities for development of commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.
BlogForever: D3.1 Preservation Strategy Report
This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design
BlogForever D5.1: Design and Specification of Case Studies
This document presents the specification and design of six case studies for testing the BlogForever platform implementation process. The report explains the data collection plan where users of the repository will provide usability feedback through questionnaires as well as details of scalability analysis through the creation of specific log files analytics. The case studies will investigate the sustainability of the platform, that it meets potential usersâ needs and that is has an important long term impact
Digital curation and the cloud
Digital curation involves a wide range of activities, many of which could benefit from cloud
deployment to a greater or lesser extent. These range from infrequent, resource-intensive tasks
which benefit from the ability to rapidly provision resources to day-to-day collaborative activities
which can be facilitated by networked cloud services. Associated benefits are offset by risks
such as loss of data or service level, legal and governance incompatibilities and transfer
bottlenecks. There is considerable variability across both risks and benefits according to the
service and deployment models being adopted and the context in which activities are
performed. Some risks, such as legal liabilities, are mitigated by the use of alternative, e.g.,
private cloud models, but this is typically at the expense of benefits such as resource elasticity
and economies of scale. Infrastructure as a Service model may provide a basis on which more
specialised software services may be provided.
There is considerable work to be done in helping institutions understand the cloud and its
associated costs, risks and benefits, and how these compare to their current working methods,
in order that the most beneficial uses of cloud technologies may be identified. Specific
proposals, echoing recent work coordinated by EPSRC and JISC are the development of
advisory, costing and brokering services to facilitate appropriate cloud deployments, the
exploration of opportunities for certifying or accrediting cloud preservation providers, and
the targeted publicity of outputs from pilot studies to the full range of stakeholders within the
curation lifecycle, including data creators and owners, repositories, institutional IT support
professionals and senior manager
BlogForever D5.2: Implementation of Case Studies
This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects obtained within the testing processes. The case studies provide relevant feedback for the sustainability of the platform in terms of potential usersâ needs and relevant information on the possible long term impact
Big Data Privacy Context: Literature Effects On Secure Informational Assets
This article's objective is the identification of research opportunities in
the current big data privacy domain, evaluating literature effects on secure
informational assets. Until now, no study has analyzed such relation. Its
results can foster science, technologies and businesses. To achieve these
objectives, a big data privacy Systematic Literature Review (SLR) is performed
on the main scientific peer reviewed journals in Scopus database. Bibliometrics
and text mining analysis complement the SLR. This study provides support to big
data privacy researchers on: most and least researched themes, research
novelty, most cited works and authors, themes evolution through time and many
others. In addition, TOPSIS and VIKOR ranks were developed to evaluate
literature effects versus informational assets indicators. Secure Internet
Servers (SIS) was chosen as decision criteria. Results show that big data
privacy literature is strongly focused on computational aspects. However,
individuals, societies, organizations and governments face a technological
change that has just started to be investigated, with growing concerns on law
and regulation aspects. TOPSIS and VIKOR Ranks differed in several positions
and the only consistent country between literature and SIS adoption is the
United States. Countries in the lowest ranking positions represent future
research opportunities.Comment: 21 pages, 9 figure
CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference
The third and last CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the domain. An exhibition of 13 stands presented 16 research projects currently ongoing around the
world
- âŠ