2,212 research outputs found
Report on the XBase Project
This project addressed the conceptual fundamentals of data storage,
investigating techniques for provision of highly generic storage facilities
that can be tailored to produce various individually customised storage
infrastructures, compliant to the needs of particular applications. This
requires the separation of mechanism and policy wherever possible. Aspirations
include: actors, whether users or individual processes, should be able to bind
to, update and manipulate data and programs transparently with respect to their
respective locations; programs should be expressed independently of the storage
and network technology involved in their execution; storage facilities should
be structure-neutral so that actors can impose multiple interpretations over
information, simultaneously and safely; information should not be discarded so
that arbitrary historical views are supported; raw stored information should be
open to all; where security restrictions on its use are required this should be
achieved using cryptographic techniques. The key advances of the research were:
1) the identification of a candidate set of minimal storage system building
blocks, which are sufficiently simple to avoid encapsulating policy where it
cannot be customised by applications, and composable to build highly flexible
storage architectures 2) insight into the nature of append-only storage
components, and the issues arising from their application to common storage
use-cases
Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as
mobile and sensor networks, which are characterised by intermittent
connectivity and volatile data, we study the tagging of data with so-called
expiration times. More specifically, when data are inserted into a database,
they may be tagged with time values indicating when they expire, i.e., when
they are regarded as stale or invalid and thus are no longer considered part of
the database. In a number of applications, expiration times are known and can
be assigned at insertion time. We present data structures and algorithms for
online management of data tagged with expiration times. The algorithms are
based on fully functional, persistent treaps, which are a combination of binary
search trees with respect to a primary attribute and heaps with respect to a
secondary attribute. The primary attribute implements primary keys, and the
secondary attribute stores expiration times in a minimum heap, thus keeping a
priority queue of tuples to expire. A detailed and comprehensive experimental
study demonstrates the well-behavedness and scalability of the approach as well
as its efficiency with respect to a number of competitors.Comment: switched to TimeCenter latex styl
metajelo: A Metadata Package for Journals to Support External Linked Objects
We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public data), by treating it symmetrically from the point of view of the journal, therefore increasing the transparency of what up until now has been very opaque
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Database evolution on an orthogonal persistent programming system: A semi-transparent approach
In this paper the problem of the evolution of an object-oriented database in the context of orthogonal persistent programming systems is addressed. We have observed two characteristics in that type of systems that offer particular conditions to implement the evolution in a semi-transparent fashion. That transparency can further be enhanced with the obliviousness provided by the Aspect-Oriented Programming techniques. Was conceived a meta-model and developed a prototype to test the feasibility of our approach. The system allows programs, written to a schema, access semi-transparently to data in other versions of the schema
- …