7,301 research outputs found
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested to capture
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance annotated databases
Enhancing reuse of data and biological material in medical research : from FAIR to FAIR-Health
The known challenge of underutilization of data and biological material from biorepositories as potential resources
formedical research has been the focus of discussion for over a decade. Recently developed guidelines for improved
data availability and reusability—entitled FAIR Principles (Findability, Accessibility, Interoperability, and
Reusability)—are likely to address only parts of the problem. In this article,we argue that biologicalmaterial and data
should be viewed as a unified resource. This approach would facilitate access to complete provenance information,
which is a prerequisite for reproducibility and meaningful integration of the data. A unified view also allows for
optimization of long-term storage strategies, as demonstrated in the case of biobanks.Wepropose an extension of the
FAIR Principles to include the following additional components: (1) quality aspects related to research reproducibility
and meaningful reuse of the data, (2) incentives to stimulate effective enrichment of data sets and biological
material collections and its reuse on all levels, and (3) privacy-respecting approaches for working with the human
material and data. These FAIR-Health principles should then be applied to both the biological material and data. We
also propose the development of common guidelines for cloud architectures, due to the unprecedented growth of
volume and breadth of medical data generation, as well as the associated need to process the data efficiently.peer-reviewe
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Identity in research infrastructure and scientific communication: Report from the 1st IRISC workshop, Helsinki Sep 12-13, 2011
Motivation for the IRISC workshop came from the observation that identity and digital identification are increasingly important factors in modern scientific research, especially with the now near-ubiquitous use of the Internet as a global medium for dissemination and debate of scientific knowledge and data, and as a platform for scientific collaborations and large-scale e-science activities.

The 1 1/2 day IRISC2011 workshop sought to explore a series of interrelated topics under two main themes: i) unambiguously identifying authors/creators & attributing their scholarly works, and ii) individual identification and access management in the context of identity federations. Specific aims of the workshop included:

• Raising overall awareness of key technical and non-technical challenges, opportunities and developments.
• Facilitating a dialogue, cross-pollination of ideas, collaboration and coordination between diverse – and largely unconnected – communities.
• Identifying & discussing existing/emerging technologies, best practices and requirements for researcher identification.

This report provides background information on key identification-related concepts & projects, describes workshop proceedings and summarizes key workshop findings
Architecture for Provenance Systems
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies
A Semantic Hierarchy for Erasure Policies
We consider the problem of logical data erasure, contrasting with physical
erasure in the same way that end-to-end information flow control contrasts with
access control. We present a semantic hierarchy for erasure policies, using a
possibilistic knowledge-based semantics to define policy satisfaction such that
there is an intuitively clear upper bound on what information an erasure policy
permits to be retained. Our hierarchy allows a rich class of erasure policies
to be expressed, taking account of the power of the attacker, how much
information may be retained, and under what conditions it may be retained.
While our main aim is to specify erasure policies, the semantic framework
allows quite general information-flow policies to be formulated for a variety
of semantic notions of secrecy.Comment: 18 pages, ICISS 201
- …