
    A Framework for Web Object Self-Preservation

    We propose and develop a framework based on emergent behavior principles for the long-term preservation of digital data using the web infrastructure. We present the development of the framework, called unsupervised small-world (USW), which sits at the nexus of emergent behavior, graph theory, and digital preservation. The USW algorithm creates graph-based structures on the Web that are used to preserve web objects (WOs). Emergent behavior activities, based on Craig Reynolds’ “boids” concept, are used to preserve WOs without the need for a central archiving authority. Graph theory is extended by developing an algorithm that incrementally creates small-world graphs, and it provides a foundation for discussing the vulnerability of graphs to different types of failures and attack profiles. Investigation into the robustness and resilience of USW graphs led to the development of a metric that quantifies the effect of damage inflicted on a graph and remains valid whether or not the graph is connected. Different USW preservation policies are explored within a simulation environment where preservation copies have to be spread across hosts. Spreading the copies across hosts helps to ensure that copies remain available even when there is a concerted effort to remove all copies of a USW component. A moderately aggressive preservation policy is the most effective at making the best use of host and network resources.

    Our efforts are directed at answering the following research questions:

    1. Can web objects (WOs) be constructed to outlive the people and institutions that created them? We have developed, analyzed, and tested through simulations the unsupervised small-world (USW) algorithm, and built a reference implementation, which we believe will create a connected network of WOs on the web infrastructure (WI) that outlives the people and institutions that created the WOs. The USW graph outlives its creators by being robust, continuing to operate when some of its WOs are lost, and resilient, recovering when some of its WOs are lost.

    2. Can we leverage aspects of naturally occurring networks and group behavior for preservation? We used Reynolds’ tenets for “boids” to guide our analysis and development of the USW algorithm. The USW algorithm allows a WO to “explore” a portion of the USW graph before making connections to members of the graph and before making preservation copies across the “discovered” graph (see the sketch below). Analysis and simulation show that the USW graph has average path length (L(G)) and clustering coefficient (C(G)) values comparable to small-world graphs. A high C(G) is important because it reflects how likely it is that a WO will be able to spread copies to other domains, thereby increasing its likelihood of long-term survival. A short L(G) is important because it means that a WO will not have to look too far to identify new candidate preservation domains, if needed. Small-world graphs occur in nature and are believed to be robust and resilient; the USW algorithm uses these small-world characteristics to spread preservation copies across as many hosts as needed and possible. USW graph creation, damage, repair, and preservation have been developed and tested in a simulation and in a reference implementation.
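
    The incremental small-world construction and the L(G)/C(G) measurements above lend themselves to a compact illustration. The following is a minimal, hypothetical sketch (not the authors' reference implementation) in which each new node performs a short random walk to "explore" the existing graph before attaching; the function name, walk length, and attachment rule are all assumptions, and the metrics are computed with networkx.

    ```python
    # Hypothetical sketch of the USW idea: a new node explores via a short
    # random walk, then links to nodes it discovered along the way.
    import random
    import networkx as nx

    def usw_like_growth(n_nodes: int, walk_len: int = 3,
                        links_per_node: int = 2, seed: int = 42) -> nx.Graph:
        rng = random.Random(seed)
        g = nx.complete_graph(links_per_node + 1)   # small connected seed graph
        for new in range(g.number_of_nodes(), n_nodes):
            walk = [rng.choice(list(g.nodes))]      # random entry point
            for _ in range(walk_len):               # explore a local neighbourhood
                walk.append(rng.choice(list(g.neighbors(walk[-1]))))
            candidates = sorted(set(walk))
            g.add_node(new)
            for target in rng.sample(candidates,
                                     min(links_per_node, len(candidates))):
                g.add_edge(new, target)
        return g

    g = usw_like_growth(500)
    print("C(G) =", nx.average_clustering(g))            # high in small worlds
    print("L(G) =", nx.average_shortest_path_length(g))  # short in small worlds
    ```

    Note that nx.average_shortest_path_length raises an error on a disconnected graph; a damage metric of the kind the abstract describes would instead aggregate distances over reachable pairs only.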

    Metadata enrichment for digital heritage: users as co-creators

    This paper espouses the concept of metadata enrichment through an expert- and user-focused approach to metadata creation and management. To this end, it is argued that the Web 2.0 paradigm enables users to be proactive metadata creators. As Shirky (2008, p. 47) argues, Web 2.0’s social tools enable “action by loosely structured groups, operating without managerial direction and outside the profit motive”. Lagoze (2010, p. 37) advises, “the participatory nature of Web 2.0 should not be dismissed as just a popular phenomenon [or fad]”. Carletti (2016) proposes a participatory digital cultural heritage approach in which Web 2.0 techniques such as crowdsourcing can be used to enrich digital cultural objects through “heritage crowdsourcing, community-centred projects or other forms of public participation”. On the other hand, the new collaborative approaches of Web 2.0 neither negate nor replace contemporary standards-based metadata approaches. Hence, this paper proposes a mixed metadata approach in which user-created metadata augments expert-created metadata and vice versa. The metadata creation process is no longer the sole prerogative of the metadata expert: the Web 2.0 collaborative environment allows users to participate in both adding and re-using metadata. Expert-created (standards-based, top-down) and user-generated (socially constructed, bottom-up) approaches to metadata are complementary rather than mutually exclusive; the two are often considered dichotomous, albeit incorrectly (Gruber, 2007; Wright, 2007). This paper espouses the importance of enriching digital information objects with descriptions pertaining to the about-ness of information objects. Such richness and diversity of description, it is argued, can chiefly be achieved by involving users in the metadata creation process.

    This paper presents the paradigm of metadata enriching and metadata filtering for the cultural heritage domain. Metadata enriching states that a priori metadata, instantiated and granularly structured by metadata experts, is continually enriched through socially constructed (post-hoc) metadata, whereby users are proactively engaged in co-creating metadata. The principle also states that enriched metadata is contextually and semantically linked and openly accessible. Metadata filtering, in turn, states that metadata resulting from the enriching principle should be displayed for users in line with their needs and convenience. In both enriching and filtering, users should be considered prosumers, resulting in what is called collective metadata intelligence.
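
    One way to make the enriching/filtering pairing concrete is sketched below. This is purely illustrative: the record structure, field names, and filtering rule are assumptions, not a design from the paper. Expert-created fields are kept authoritative, user contributions are appended rather than overwritten, and a filtered view selects only the fields a given user asks for.

    ```python
    # Illustrative sketch of a "mixed metadata" record; all names hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class MetadataRecord:
        expert: dict = field(default_factory=dict)  # standards-based, top-down
        user: dict = field(default_factory=dict)    # socially constructed, bottom-up

        def enrich(self, contributor: str, key: str, value: str) -> None:
            """Users as co-creators: append to user metadata, never overwrite."""
            self.user.setdefault(key, []).append({"by": contributor, "value": value})

        def filtered_view(self, fields: set) -> dict:
            """Metadata filtering: show only the fields a user asked for."""
            view = {k: v for k, v in self.expert.items() if k in fields}
            view.update({k: v for k, v in self.user.items() if k in fields})
            return view

    record = MetadataRecord(expert={"title": "Ms. Codex 1234", "creator": "Unknown"})
    record.enrich("volunteer42", "subject", "illuminated manuscript")
    print(record.filtered_view({"title", "subject"}))
    ```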

    HTTP Mailbox - Asynchronous RESTful Communication

    We describe HTTP Mailbox, a mechanism to enable RESTful HTTP communication in an asynchronous mode with a full range of HTTP methods otherwise unavailable to standard clients and servers. HTTP Mailbox also allows for broadcast and multicast semantics via HTTP. We evaluate a reference implementation using ApacheBench (a server stress-testing tool), demonstrating high throughput (on 1,000 concurrent requests) and a systemic error rate of 0.01%. Finally, we demonstrate our HTTP Mailbox implementation in a human-assisted web preservation application called "Preserve Me".
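
    The store-and-forward pattern described above can be sketched in a few lines. This is a loose illustration under assumed conventions, not the paper's actual API: the mailbox base URL, the URI scheme for addressing recipients, and the payload framing are all hypothetical, though message/http is a real media type for serialized HTTP messages.

    ```python
    # Hypothetical sketch: a sender deposits a serialized HTTP request with a
    # mailbox server; the recipient polls its mailbox whenever it comes online.
    import requests

    MAILBOX = "http://mailbox.example.org/hm"   # assumed endpoint

    def send(recipient_uri: str, method: str, body: str) -> None:
        # The "real" HTTP request is serialized and carried as the POST payload.
        payload = f"{method} {recipient_uri} HTTP/1.1\r\n\r\n{body}"
        requests.post(f"{MAILBOX}/{recipient_uri}", data=payload,
                      headers={"Content-Type": "message/http"})

    def receive(recipient_uri: str) -> str:
        # Asynchronous delivery: messages queue until the recipient asks.
        return requests.get(f"{MAILBOX}/{recipient_uri}").text

    send("http://example.com/wo/1", "PATCH", '{"op": "add-link"}')
    print(receive("http://example.com/wo/1"))
    ```

    Because the sender and recipient never need to be online at the same time, methods such as PATCH or DELETE can reach recipients (e.g., static web pages) that could never service them directly.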

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
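
    The core intuition of pairing a blog's RSS feed with its HTML pages can be illustrated briefly: the feed supplies a clean excerpt that acts as free supervision for locating the post body in the raw page. The sketch below is an assumption-laden toy version, not the BlogForever methodology; the library choices (feedparser, BeautifulSoup) and the similarity heuristic are illustrative.

    ```python
    # Toy sketch: use the feed's summary text to pick the HTML block most
    # likely to be the post body.
    import difflib
    import feedparser                  # pip install feedparser
    import requests
    from bs4 import BeautifulSoup      # pip install beautifulsoup4

    def extract_post_body(feed_url: str) -> str:
        entry = feedparser.parse(feed_url).entries[0]   # latest post
        html = requests.get(entry.link).text
        soup = BeautifulSoup(html, "html.parser")
        blocks = [t.get_text(" ", strip=True)
                  for t in soup.find_all(["article", "div"])]
        blocks = [b for b in blocks if b]
        if not blocks:
            return ""
        # Rank candidate blocks by similarity to the feed's summary.
        return max(blocks, key=lambda b: difflib.SequenceMatcher(
            None, b, entry.summary).ratio())

    print(extract_post_body("https://blog.example.org/feed/")[:200])
    ```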

    Fast 2D/3D object representation with growing neural gas

    This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
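
    The growth mechanism that such self-organising representations build on is typically a Fritzke-style growing neural gas (GNG). The following is a minimal, illustrative GNG loop, not the paper's real-time system: parameter values are assumptions, and pruning of isolated nodes is omitted for brevity.

    ```python
    # Minimal Fritzke-style growing neural gas over a point set 'data' (N x D).
    import numpy as np

    def gng(data, max_nodes=50, eps_b=0.05, eps_n=0.005, age_max=50,
            lam=100, alpha=0.5, d=0.995, steps=10000, seed=0):
        rng = np.random.default_rng(seed)
        w = [rng.random(data.shape[1]), rng.random(data.shape[1])]  # positions
        err = [0.0, 0.0]                      # accumulated error per node
        edges = {}                            # (i, j) with i < j -> age
        for t in range(1, steps + 1):
            x = data[rng.integers(len(data))]
            dist = [float(np.sum((x - wi) ** 2)) for wi in w]
            s1, s2 = (int(i) for i in np.argsort(dist)[:2])
            err[s1] += dist[s1]
            w[s1] = w[s1] + eps_b * (x - w[s1])       # move winner toward input
            for (i, j) in list(edges):
                if s1 in (i, j):
                    edges[(i, j)] += 1                # age the winner's edges
                    other = j if i == s1 else i
                    w[other] = w[other] + eps_n * (x - w[other])
            edges[(min(s1, s2), max(s1, s2))] = 0     # refresh winner-pair edge
            edges = {e: a for e, a in edges.items() if a <= age_max}
            if t % lam == 0 and len(w) < max_nodes:   # grow where error is largest
                q = int(np.argmax(err))
                nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
                if nbrs:
                    f = max(nbrs, key=lambda n: err[n])
                    w.append((w[q] + w[f]) / 2.0)     # insert node between q, f
                    err[q] *= alpha
                    err[f] *= alpha
                    err.append(err[q])
                    new = len(w) - 1
                    edges[(min(q, new), max(q, new))] = 0
                    edges[(min(f, new), max(f, new))] = 0
                    edges.pop((min(q, f), max(q, f)), None)
            err = [e * d for e in err]                # global error decay
        return np.array(w), edges
    ```

    For example, gng(np.random.default_rng(0).random((2000, 2))) returns node positions and edges approximating the unit square; the paper's contribution is choosing when to stop this growth (the "optimal number of nodes") via information-theoretic criteria rather than a fixed max_nodes.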

    Grids and the Virtual Observatory

    We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions).