2,212 research outputs found

    Report on the XBase Project

    Get PDF
    This project addressed the conceptual fundamentals of data storage, investigating techniques for provision of highly generic storage facilities that can be tailored to produce various individually customised storage infrastructures, compliant to the needs of particular applications. This requires the separation of mechanism and policy wherever possible. Aspirations include: actors, whether users or individual processes, should be able to bind to, update and manipulate data and programs transparently with respect to their respective locations; programs should be expressed independently of the storage and network technology involved in their execution; storage facilities should be structure-neutral so that actors can impose multiple interpretations over information, simultaneously and safely; information should not be discarded so that arbitrary historical views are supported; raw stored information should be open to all; where security restrictions on its use are required this should be achieved using cryptographic techniques. The key advances of the research were: 1) the identification of a candidate set of minimal storage system building blocks, which are sufficiently simple to avoid encapsulating policy where it cannot be customised by applications, and composable to build highly flexible storage architectures 2) insight into the nature of append-only storage components, and the issues arising from their application to common storage use-cases

    Efficient Management of Short-Lived Data

    Full text link
    Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.Comment: switched to TimeCenter latex styl

    metajelo: A Metadata Package for Journals to Support External Linked Objects

    Get PDF
    We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public data), by treating it symmetrically from the point of view of the journal, therefore increasing the transparency of what up until now has been very opaque

    Theory and Practice of Data Citation

    Full text link
    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

    Database evolution on an orthogonal persistent programming system: A semi-transparent approach

    Get PDF
    In this paper the problem of the evolution of an object-oriented database in the context of orthogonal persistent programming systems is addressed. We have observed two characteristics in that type of systems that offer particular conditions to implement the evolution in a semi-transparent fashion. That transparency can further be enhanced with the obliviousness provided by the Aspect-Oriented Programming techniques. Was conceived a meta-model and developed a prototype to test the feasibility of our approach. The system allows programs, written to a schema, access semi-transparently to data in other versions of the schema
    • …
    corecore