88 research outputs found

    A Framework for Verifying the Fixity of Archived Web Resources

    The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web resources have not been altered, they should have access to fixity information associated with these resources. However, most web archives do not provide access to fixity information and, more importantly, even if fixity information is available, it is provided by the same archive delivering the resource, not by an independent archive or service. In this research, we present a framework for establishing and checking the fixity on the playback of archived resources, or mementos. The framework defines an archive-aware hashing function that consists of several guidelines for generating repeatable fixity information on the playback of mementos. These guidelines are the results of our 14-month study identifying and quantifying changes in replayed mementos over time that affect generating repeatable fixity information. Changes on the playback of mementos may be caused by JavaScript, transient errors, inconsistency in the availability of mementos over time, and archive-specific resources. Changes are also caused by transformations in the content of archived resources applied by web archives to appropriately replay these resources in a user’s browser. The study also shows that only 11.55% of mementos always produce the same fixity information after each replay, while about 16.06% of mementos always produce different fixity information after each replay.
The remaining 72.39% of mementos produce two or more distinct fixity values across replays. We also find that mementos may disappear when web archives move to different domains or archives. In addition to defining multiple guidelines for generating fixity information, the framework introduces two approaches, Atomic and Block, that can be used to disseminate fixity information to web archives. The main difference between the two approaches is that, in the Atomic approach, the fixity information of each archived web page is stored in a separate file before being disseminated to several on-demand web archives, while in the Block approach, we batch together the fixity information of multiple archived pages into a single binary-searchable file before disseminating it to archives. The framework defines the structure of URLs used to publish fixity information on the web and retrieve archived fixity information from web archives. Our framework does not require changes in the current web archiving infrastructure, and it is built on well-known web archiving standards, such as the Memento protocol. The proposed framework will allow users to generate fixity information on any archived page at any time, preserve the fixity information independently from the archive delivering the archived page, and verify the fixity of the archived page at any time in the future.
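The Block approach described above (batching fixity records for many archived pages into one binary-searchable file) can be sketched roughly as follows. This is a minimal illustration only: the record format, the choice of SHA-256, and the function names are assumptions for the example, not the framework's actual specification.

```python
import bisect
import hashlib

def make_record(urim, content):
    """One hypothetical fixity record: the memento URI (URI-M) plus a
    SHA-256 digest of its replayed content, separated by a space."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{urim} {digest}"

def write_block(records, path):
    """Batch many records into a single file, sorted by URI-M so the
    file is binary-searchable (the Block approach, as sketched here)."""
    with open(path, "w") as f:
        for line in sorted(records):
            f.write(line + "\n")

def lookup(path, urim):
    """Binary-search the sorted block file for one memento's record."""
    with open(path) as f:
        lines = f.read().splitlines()
    keys = [line.split(" ", 1)[0] for line in lines]
    i = bisect.bisect_left(keys, urim)
    if i < len(keys) and keys[i] == urim:
        return lines[i].split(" ", 1)[1]  # the stored digest
    return None
```

Because the block file is sorted once at write time, a verifier can locate any single page's fixity record in O(log n) comparisons without downloading or parsing every record, which is the practical motivation for batching over the one-file-per-page Atomic approach.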

    Hashes Are Not Suitable to Verify Fixity of the Public Archived Web

    Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present-day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the hash values generated on the same resource are identical, then the fixity of the resource is verified. We tested this process by conducting a study on 16,627 mementos from 17 public web archives. We replayed and downloaded the mementos 39 times using a headless browser over a period of 442 days and generated a hash for each memento after each download, resulting in 39 hashes per memento. The hash is calculated by including not only the content of the base HTML of a memento but also all embedded resources, such as images and style sheets. We expected to always observe the same hash for a memento regardless of the number of downloads. However, our results indicate that 88.45% of mementos produce more than one unique hash value, and about 16% (or one in six) of those mementos always produce different hash values. We identify and quantify the types of changes that cause the same memento to produce different hashes. These results point to the need for defining an archive-aware hashing function, as conventional hashing functions are not suitable for replayed archived web pages.
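The per-memento hashing process described above (one hash covering the base HTML plus all embedded resources) can be sketched roughly as follows. The use of SHA-256 and the sorted-by-URI ordering are illustrative assumptions, not the study's exact procedure.

```python
import hashlib

def memento_hash(base_html: bytes, embedded: dict) -> str:
    """Sketch of one aggregate hash per memento: the base HTML plus
    every embedded resource (images, style sheets, ...) feeds a single
    digest. `embedded` maps resource URI (str) to its bytes."""
    h = hashlib.sha256()
    h.update(base_html)
    # Sort resources by URI so the arrival order of network responses
    # cannot change the final digest between replays.
    for uri in sorted(embedded):
        h.update(uri.encode("utf-8"))
        h.update(embedded[uri])
    return h.hexdigest()
```

Even with deterministic ordering, the digest changes whenever any replayed byte changes (e.g., via JavaScript or transient errors), which is why the study observes different hashes for the same memento across replays.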

    STFC Centre for Environmental Data Archival (CEDA) Annual Report 2013 (April 2012-March 2013)

    The mission of the Centre for Environmental Data Archival (CEDA) is to deliver long-term curation of scientifically important environmental data while facilitating the use of that data by the environmental science community. CEDA was established by amalgamating the activities of two of the Natural Environment Research Council (NERC) designated data centres: the British Atmospheric Data Centre and the NERC Earth Observation Data Centre. We are pleased to present our fourth annual report, covering activities for the 2013 year (April 2012 to March 2013). The report consists of two sections and appendices: the first provides a summary of activities and statistics, with short descriptions of significant activities; the second introduces exemplar projects and activities. The report concludes with additional details of activities such as publications and software maintained.

    The Future(s) of Web Archive Research Across Ireland

    The central aim of this thesis is to investigate the current state of web archive research in Ireland in line with international developments. Integrating desk research, survey studies, and case studies, and using a combination of qualitative and quantitative research methods drawn from disciplines across the humanities and information sciences, this thesis focuses on bridging the gaps between the creation of web archives and the use of archived web materials for current and future research in an Irish context. The thesis describes web archive research as representative of the web archiving life cycle model (Bragg & Hanna, 2013), which encompasses appraisal, selection, capture, storage, quality assurance, preservation and maintenance, replay/playback, access, use, and reuse. Through a synthesis of relevant literature, the thesis examines the causes of the loss of digital heritage and how this relates to Ireland, and explores the challenges for participation in web archive research from creation to end use. A survey study is used to explore the challenges for the creation and use of web archives, and the overlaps and intersections of such challenges across communities of practice within web archive research. A qualitative survey is used to provide an overview of the availability and accessibility of web archives based in Ireland, and their usefulness as resources for conducting research on Irish topics. It further discusses the influence of copyright and legal deposit legislation, or the lack thereof, on their ability to preserve digital heritage for future generations. An online survey is used to investigate awareness of, and engagement/non-engagement with, web archives as resources for research in Irish academic institutions. Overall, the findings show that due to advances in internet, web, and software technologies, there is a need for the continual evaluation of skills, tools, and methods associated with the full web archiving life cycle.
As technologies keep evolving, so too will the challenges. The findings also highlight the need for creators and users/researchers to keep moving forward as collaborators to guide the next generation of web archive research. At the same time, there is also a need for the continual evaluation of legal deposit legislation in line with the fragility of born-digital heritage and the technological advances in publishing and communication technologies.

    Settling for limited privacy: how much does it help?

    This thesis explores practical and theoretical aspects of several privacy-providing technologies, including tools for anonymous web browsing, verifiable electronic voting schemes, and private information retrieval from databases. State-of-the-art privacy-providing schemes are frequently impractical for implementational reasons or for sheer information-theoretic reasons, due to the amount of information that needs to be transmitted. We have been researching whether relaxing the requirements on such schemes, in particular settling for privacy that is imperfect but sufficient in real-world situations, as opposed to perfect privacy, may help produce more practical or more efficient schemes. This thesis presents three results. The first is the introduction of caching as a technique for providing anonymous web browsing, at the cost of sacrificing some functionality provided by anonymizing systems that do not use caching. The second is a coercion-resistant electronic voting scheme with nearly perfect privacy and nearly perfect voter verifiability. The third consists of lower bounds and simple upper bounds on the amount of communication in nearly private information retrieval schemes; our work is the first in-depth exploration of private information retrieval schemes with imperfect privacy.

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities, and another research institution in Germany joined to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life cycle Labs, data experts performed joint R&D together with scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.

    Proceedings of the NSSDC Conference on Mass Storage Systems and Technologies for Space and Earth Science Applications

    The proceedings of the National Space Science Data Center Conference on Mass Storage Systems and Technologies for Space and Earth Science Applications, held July 23 through 25, 1991, at the NASA/Goddard Space Flight Center, are presented. The program includes a keynote address, invited technical papers, and selected technical presentations to provide a broad forum for the discussion of a number of important issues in the field of mass storage systems. Topics include magnetic disk and tape technologies, optical disk and tape, software storage and file management systems, and experiences with the use of a large, distributed storage system. The technical presentations describe integrated mass storage systems that are expected to be available commercially. Also included is a series of presentations from Federal Government organizations and research institutions covering their mass storage requirements for the 1990s.

    Motion picture film as a government record: framing films within archival theory and preparing for the digital future

    Governments have created and used motion picture films since soon after their invention, but government archivists have an uneasy relationship with films. Historically, the traditional archival literature has overlooked films in favor of a focus on textual records, while the film archive literature is unconcerned with the archival concept of the record. To define the scope of the problem, this thesis demonstrates the paucity of archival literature addressing motion picture film as a government record. Moving forward, motion pictures are examined through a lens of archival theory and set in their rightful place among other formats of government records. It is concluded that while films must be read differently than textual records, they provide evidence, information, memory, and other affordances as found in other record formats. Finally, the thesis explores the properties that must be maintained as a film record is migrated to digital formats in order to ensure that it remains a valid record. It is argued that failing to create an authentic digital record during preservation digitization of a film is the same as deaccessioning the film record from the archival collection. Government archivists have a responsibility to carry out a thorough and documented reappraisal process before such actions may be taken.