4 research outputs found

    Smart Objects and Open Archives

    Within the context of digital libraries (DLs), we are making information objects first-class citizens. We decouple information objects from the systems used for their storage and retrieval, allowing the technology for both DLs and information content to progress independently. We believe dismantling the stovepipe of DL-archive-content is the first step in building richer DL experiences for users and ensuring the long-term survivability of digital information. To demonstrate this partitioning between DLs, archives and information content, we introduce buckets: aggregative, intelligent, object-oriented constructs for publishing in digital libraries. Buckets exist within the Smart Object, Dumb Archive (SODA) DL model, which promotes the importance and responsibility of individual information objects and reduces the role of traditional archives and database systems. The goal is to have smart objects be independent of and more resilient to the transient nature of information systems. The SODA model fits well with the emerging Open Archives Initiative (OAI), which promotes DL interoperability through the use of simple archives. This paper examines the motivation for buckets, SODA and the OAI, and initial experiences using them in various DL testbeds.

    Archive Ingest and Handling Test

    The Archive Ingest and Handling Test (AIHT) was a research project sponsored by the Library of Congress (LC) and administered by Information Systems and Support Inc. (ISS). The project featured five participants: Old Dominion University Computer Science Department; Harvard University Library; Johns Hopkins University Library; Stanford University Library; and the Library of Congress. All five participants received identical disk drives containing copies of the 911.gmu.edu web site, a collection of 9/11 materials maintained by George Mason University (GMU). The purpose of the AIHT experiment was to perform archival forensics to determine the nature of the archive, ingest it, simulate at least one of the file formats going out of scope, export a copy of the archive, and import another version of the archive. The AIHT is further described in Shirky (2005).

    CRATE: A Simple Model for Self-Describing Web Resources

    If not for the Internet Archive's efforts to store periodic snapshots of the web, many sites would not have any preservation prospects at all. The barrier to entry is too high for everyday web sites, which may have skilled webmasters managing them, but which lack skilled archivists to preserve them. Digital preservation is not easy. One problem is the complexity of preservation models, which have specific meta-data and structural requirements. Another problem is the time and effort it takes to properly prepare digital resources for preservation in the chosen model. In this paper, we propose a simple preservation model called a CRATE, a complex-object consisting of undifferentiated metadata and the resource byte stream. We describe the CRATE complex object and compare it with other complex-object models. Our target is the everyday, personal, departmental, or community web site where a long-term preservation strategy does not yet exist.

    A Framework for Web Object Self-Preservation

    We propose and develop a framework based on emergent behavior principles for the long-term preservation of digital data using the web infrastructure. We present the development of the framework, called unsupervised small-world (USW), which is at the nexus of emergent behavior, graph theory, and digital preservation. The USW algorithm creates graph-based structures on the Web used for preservation of web objects (WOs). Emergent behavior activities, based on Craig Reynolds' "boids" concept, are used to preserve WOs without the need for a central archiving authority. Graph theory is extended by developing an algorithm that incrementally creates small-world graphs. Graph theory provides a foundation to discuss the vulnerability of graphs to different types of failures and attack profiles. Investigation into the robustness and resilience of USW graphs led to the development of a metric to quantify the effect of damage inflicted on a graph. The metric remains valid whether the graph is connected or not. Different USW preservation policies are explored within a simulation environment where preservation copies have to be spread across hosts. Spreading the copies across hosts helps to ensure that copies will remain available even when there is a concerted effort to remove all copies of a USW component. A moderately aggressive preservation policy is the most effective at making the best use of host and network resources. Our efforts are directed at answering the following research questions: 1. Can web objects (WOs) be constructed to outlive the people and institutions that created them? We have developed, analyzed, tested through simulations, and developed a reference implementation of the unsupervised small-world (USW) algorithm that we believe will create a connected network of WOs based on the web infrastructure (WI) that will outlive the people and institutions that created the WOs. The USW graph will outlive its creators by being robust and continuing to operate when some of its WOs are lost, and it is resilient and will recover when some of its WOs are lost. 2. Can we leverage aspects of naturally occurring networks and group behavior for preservation? We used Reynolds' tenets for "boids" to guide our analysis and development of the USW algorithm. The USW algorithm allows a WO to "explore" a portion of the USW graph before making connections to members of the graph and before making preservation copies across the "discovered" graph. Analysis and simulation show that the USW graph has an average path length (L(G)) and clustering coefficient (C(G)) values comparable to small-world graphs. A high C(G) is important because it reflects how likely it is that a WO will be able to spread copies to other domains, thereby increasing its likelihood of long-term survival. A short L(G) is important because it means that a WO will not have to look too far to identify new candidate preservation domains, if needed. Small-world graphs occur in nature and are thus believed to be robust and resilient. The USW algorithms use these small-world graph characteristics to spread preservation copies across as many hosts as needed and possible. USW graph creation, damage, repair and preservation have been developed and tested in a simulation and reference implementation.
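
As a rough illustration of the two graph measures named in the abstract above, the following Python sketch computes an average clustering coefficient C(G) and an average shortest-path length L(G) for a small undirected graph. It does not reproduce the USW algorithm; the function names and the ring-lattice test graph are assumptions introduced here for illustration, and the path-length average is taken over reachable pairs only, which is one possible convention among several.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph):
    """Average local clustering coefficient C(G) of an undirected graph.

    `graph` maps each node to the set of its neighbors.
    """
    total = 0.0
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count the edges that exist between pairs of this node's neighbors.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length L(G) over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in graph:
        # Breadth-first search gives hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def ring_lattice(n, k):
    """Hypothetical test graph: n nodes in a ring, each linked to its k
    nearest neighbors on either side."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


if __name__ == "__main__":
    g = ring_lattice(10, 2)
    print("C(G) =", clustering_coefficient(g))
    print("L(G) =", average_path_length(g))
```

A ring lattice has high clustering but a comparatively long path length; small-world graphs are usually characterized as keeping clustering high while keeping L(G) short, which is the combination the abstract attributes to USW graphs.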