A General-Purpose Provenance Library
Most provenance capture takes place inside particular tools - a workflow engine, a database, an operating system, or an application. However, most users have an existing toolset - a collection of different tools that work well for their needs and with which they are comfortable. Currently, such users have limited ability to collect provenance without disrupting their work and changing environments, which most users are hesitant to do. Even users who are willing to adopt new tools may realize limited benefit from provenance in those tools if they do not integrate with their entire environment, which may include multiple languages and frameworks. We present the Core Provenance Library (CPL), a portable, multi-lingual library that application programmers can easily incorporate into a variety of tools to collect and integrate provenance. Although the manual instrumentation adds extra work for application programmers, we show that in most cases, the work is minimal, and the resulting system solves several problems that plague more constrained provenance collection systems.
Engineering and Applied Science
Language-integrated provenance in Haskell
Scientific progress increasingly depends on data management, particularly to
clean and curate data so that it can be systematically analyzed and reused. A
wealth of techniques for managing and curating data (and its provenance) have
been proposed, largely in the database community. In particular, a number of
influential papers have proposed collecting provenance information explaining
where a piece of data was copied from, or what other records were used to
derive it. Most of these techniques, however, exist only as research prototypes
and are not available in mainstream database systems. This means scientists
must either implement such techniques themselves or (all too often) go without.
This is essentially a code reuse problem: provenance techniques currently
cannot be implemented reusably, only as ad hoc, usually unmaintained extensions
to standard databases. An alternative, relatively unexplored approach is to
support such techniques at a higher abstraction level, using metaprogramming or
reflection techniques. Can advanced programming techniques make it easier to
transfer provenance research results into practice?
We build on a recent approach called language-integrated provenance, which
extends language-integrated query techniques with source-to-source query
translations that record provenance. In previous work, a proof of concept was
developed in a research programming language called Links, which supports
sophisticated Web and database programming. In this paper, we show how to adapt
this approach to work in Haskell building on top of the Database-Supported
Haskell (DSH) library.
Even though it seemed clear in principle that Haskell's rich programming
features ought to be sufficient, implementing language-integrated provenance in
Haskell required overcoming a number of technical challenges due to
interactions between these capabilities. Our implementation serves as a proof
of concept showing how this combination of metaprogramming features can, for
the first time, make data provenance facilities available to programmers as a
library in a widely-used, general-purpose language.
In our work we were successful in implementing forms of provenance known as
where-provenance and lineage. We have tested our implementation using a simple
database and query set and established that the resulting queries are executed
correctly on the database. Our implementation is publicly available on GitHub.
Our work makes provenance tracking available to users of DSH at little cost.
Although Haskell is not widely used for scientific database development, our
work suggests which language features are necessary to support provenance as
a library. We also highlight how combining Haskell's advanced type programming
features can lead to unexpected complications, which may motivate further
research into type system expressiveness.
An Architecture for Provenance Systems
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies.
Users' trust in information resources in the Web environment: a status report
This study has three aims: to provide an overview of the ways in which trust is either assessed or asserted in relation to the use and provision of resources in the Web environment for research and learning; to assess what solutions might be worth further investigation and whether establishing ways to assert trust in academic information resources could assist the development of information literacy; and to help increase understanding of how perceptions of trust influence the behaviour of information users.
Assessing Descriptive Substance in Free-Text Collection-Level Metadata
Collection-level metadata has the potential to provide important information about the features and purpose of individual collections. This paper reports on a content analysis of collection records in an aggregation of cultural heritage collections. The findings show that the free-text Description field often provides a more accurate and complete representation of subjects and object types than the specified fields. Properties such as importance, uniqueness, comprehensiveness, provenance, and creator are articulated, as well as other vital contextual information about the intentions of a collector and the value of a collection, as a whole, for scholarly users. The results demonstrate that the semantically rich free-text Description field is essential to understanding the context of collections in large aggregations and can serve as a source of data for enhancing and customizing controlled vocabularies.
IMLS NLG Research and Demonstration grant LG-06-07-0020-07. Published or submitted for publication; peer reviewed.
Shingle 2.0: generalising self-consistent and automated domain discretisation for multi-scale geophysical models
The approaches taken to describe and develop spatial discretisations of the
domains required for geophysical simulation models are commonly ad hoc, model
or application specific and under-documented. This is particularly acute for
simulation models that are flexible in their use of multi-scale, anisotropic,
fully unstructured meshes where a relatively large number of heterogeneous
parameters are required to constrain their full description. As a consequence,
it can be difficult to reproduce simulations, ensure a provenance in model data
handling and initialisation, and a challenge to conduct model intercomparisons
rigorously. This paper takes a novel approach to spatial discretisation,
considering it much like a numerical simulation model problem of its own. It
introduces a generalised, extensible, self-documenting approach to carefully
describe, and necessarily fully, the constraints over the heterogeneous
parameter space that determine how a domain is spatially discretised. This
additionally provides a method to accurately record these constraints, using
high-level natural language based abstractions, that enables full accounts of
provenance, sharing and distribution. Together with this description, a
generalised consistent approach to unstructured mesh generation for geophysical
models is developed, that is automated, robust and repeatable, quick-to-draft,
rigorously verified and consistent to the source data throughout. This
interprets the description above to execute a self-consistent spatial
discretisation process, which is automatically validated to expected discrete
characteristics and metrics.
Comment: 18 pages, 10 figures, 1 table. Submitted for publication and under
review
nanopub-java: A Java Library for Nanopublications
The concept of nanopublications was first proposed about six years ago, but
it lacked openly available implementations. The library presented here is the
first one that has become an official implementation of the nanopublication
community. Its core features are stable, but it also contains unofficial and
experimental extensions: for publishing to a decentralized server network, for
defining sets of nanopublications with indexes, for informal assertions, and
for digitally signing nanopublications. Most of the features of the library can
also be accessed via an online validator interface.
Comment: Proceedings of 5th Workshop on Linked Science 201
Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries
Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet, research into their long-term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, “after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited.” Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges – theoretical, methodological and technological – that need attention by researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries are to remain sustainable, authentic, accessible and understandable over time. Building on this work and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long-term viability at the centre of the global information society.