Provenance for Aggregate Queries

Author: Amsterdamer Yael
Deutch Daniel
Tannen Val
Publication date: 01/01/2011

In this paper we study provenance information for queries with aggregation. Provenance has previously been studied for various query languages that do not allow aggregation, and recent work has suggested capturing provenance by annotating database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges that render this approach inapplicable. Consequently, we propose a new approach in which we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe how the values are computed. We realize this approach in a concrete construction, first for "simple" queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance-annotated databases.
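The tuple-level semiring annotation that this abstract takes as its starting point can be illustrated with a minimal Python sketch. It is not the paper's construction for aggregate queries; it only shows the earlier idea of propagating annotations through union, join, and projection. All names here (`Semiring`, `union`, `join`, `project`, and the `emp`/`dept` relations) are hypothetical and chosen for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]
Relation = Dict[Row, Any]  # maps each tuple to its semiring annotation


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring: (plus, zero) and (times, one) with the usual laws."""
    zero: Any
    one: Any  # multiplicative identity; listed for completeness, unused below
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


def union(sr: Semiring, r: Relation, s: Relation) -> Relation:
    # Alternative derivations of the same tuple are combined with plus.
    out = dict(r)
    for row, ann in s.items():
        out[row] = sr.plus(out.get(row, sr.zero), ann)
    return out


def join(sr: Semiring, r: Relation, s: Relation, on_r: int, on_s: int) -> Relation:
    # Joint use of two tuples is combined with times.
    out: Relation = {}
    for r_row, r_ann in r.items():
        for s_row, s_ann in s.items():
            if r_row[on_r] == s_row[on_s]:
                row = r_row + s_row
                out[row] = sr.plus(out.get(row, sr.zero), sr.times(r_ann, s_ann))
    return out


def project(sr: Semiring, r: Relation, cols: List[int]) -> Relation:
    # Tuples that collapse to the same projection have their annotations summed.
    out: Relation = {}
    for row, ann in r.items():
        key = tuple(row[c] for c in cols)
        out[key] = sr.plus(out.get(key, sr.zero), ann)
    return out


# Counting semiring (natural numbers): annotations behave like bag multiplicities.
counting = Semiring(zero=0, one=1, plus=operator.add, times=operator.mul)
# Boolean semiring: annotations only record whether a tuple is present.
boolean = Semiring(zero=False, one=True, plus=operator.or_, times=operator.and_)

emp = {("alice", "d1"): 2, ("bob", "d1"): 1}
dept = {("d1", "sales"): 3}

result = project(counting, join(counting, emp, dept, 1, 0), [3])

emp_b = {row: True for row in emp}
dept_b = {row: True for row in dept}
result_b = project(boolean, join(boolean, emp_b, dept_b, 1, 0), [3])
```

Under these assumptions the same query code runs unchanged over either semiring; only the interpretation of the annotations differs. The abstract's point is that this tuple-level scheme does not carry over directly once aggregation is involved, which is what motivates annotating individual values as well.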

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn

Enhancing reuse of data and biological material in medical research : from FAIR to FAIR-Health

Author: Casati Sara
Dagher Georges
Gabriele Anton
Holub Petr
Kohlmayer Florian
Koumakis Lefteris
Kozera Łukasz
Lavitrano Marialuisa
Litton Jan-Eric
Martin Gillian M.
Mendy Maimuna
Ommen GertJan B. van
Prasser Fabian
Schlunder Irene
Sezerman Osman Ugur
Strapagiel Dominik
Th. Mayrhofer Michaela
Valik Dalibor
Wutte Andrea
Zanetti Gianluigi
Zatloukal Kurt
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2018

The known challenge of underutilization of data and biological material from biorepositories as potential resources for medical research has been the focus of discussion for over a decade. Recently developed guidelines for improved data availability and reusability—entitled FAIR Principles (Findability, Accessibility, Interoperability, and Reusability)—are likely to address only parts of the problem. In this article, we argue that biological material and data should be viewed as a unified resource. This approach would facilitate access to complete provenance information, which is a prerequisite for reproducibility and meaningful integration of the data. A unified view also allows for optimization of long-term storage strategies, as demonstrated in the case of biobanks. We propose an extension of the FAIR Principles to include the following additional components: (1) quality aspects related to research reproducibility and meaningful reuse of the data, (2) incentives to stimulate effective enrichment of data sets and biological material collections and its reuse on all levels, and (3) privacy-respecting approaches for working with the human material and data. These FAIR-Health principles should then be applied to both the biological material and data. We also propose the development of common guidelines for cloud architectures, due to the unprecedented growth of volume and breadth of medical data generation, as well as the associated need to process the data efficiently.

OAR@UM

Acibadem University Repository

Data mining and fusion

Author: Addis M. J.
Choi F.
Taylor S. J.
Upstill C.
Watkins E. R.
Publication venue: s.n.
Publication date: 01/04/2006

Southampton (e-Prints Soton)

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication date: 29/03/2015

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.

arXiv.org e-Print Archive

eScholarship - University of California

Identity in research infrastructure and scientific communication: Report from the 1st IRISC workshop, Helsinki Sep 12-13, 2011

Author: Anthony J. Brookes
Gudmundur A. Thorisson
Juha Muilu
Mikael Linden
Myles Byrne
Tommi Nyronen
Publication date: 16/11/2011

Motivation for the IRISC workshop came from the observation that identity and digital identification are increasingly important factors in modern scientific research, especially with the now near-ubiquitous use of the Internet as a global medium for dissemination and debate of scientific knowledge and data, and as a platform for scientific collaborations and large-scale e-science activities.

The 1 1/2 day IRISC2011 workshop sought to explore a series of interrelated topics under two main themes: i) unambiguously identifying authors/creators & attributing their scholarly works, and ii) individual identification and access management in the context of identity federations. Specific aims of the workshop included:

• Raising overall awareness of key technical and non-technical challenges, opportunities and developments.
• Facilitating a dialogue, cross-pollination of ideas, collaboration and coordination between diverse – and largely unconnected – communities.
• Identifying & discussing existing/emerging technologies, best practices and requirements for researcher identification.

This report provides background information on key identification-related concepts & projects, describes workshop proceedings and summarizes key workshop findings.

Nature Precedings

Architecture for Provenance Systems

Author: Groth Paul
Miles Simon
Moreau Luc
Tan Victor
Publication venue: s.n.
Publication date: 01/10/2005

This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies.

Southampton (e-Prints Soton)

A Semantic Hierarchy for Erasure Policies

Author: A. Askarov
A. Sabelfeld
F. Tedesco Del
I. Mastroeni
K.R. O’Neill
P. Cousot
R. Alur
R. Focardi
S. Hunt
Publication date: 01/01/2011

We consider the problem of logical data erasure, contrasting with physical erasure in the same way that end-to-end information flow control contrasts with access control. We present a semantic hierarchy for erasure policies, using a possibilistic knowledge-based semantics to define policy satisfaction such that there is an intuitively clear upper bound on what information an erasure policy permits to be retained. Our hierarchy allows a rich class of erasure policies to be expressed, taking account of the power of the attacker, how much information may be retained, and under what conditions it may be retained. While our main aim is to specify erasure policies, the semantic framework allows quite general information-flow policies to be formulated for a variety of semantic notions of secrecy. Comment: 18 pages, ICISS 201

arXiv.org e-Print Archive

City Research Online

Crossref

Chalmers Research

Chalmers Publication Library