On the Limitations of Provenance for Queries With Difference
Annotating the results of database transformations has proven effective for a
variety of applications. Until recently, most work in this context focused on
positive query languages. Provenance semirings are a particular approach that
has proven effective for these languages: when provenance is propagated through
semirings, the expected equivalence axioms of the corresponding query languages
are satisfied. There have been several attempts to extend the framework to
account for relational algebra queries with difference. We show here that
these suggestions fail to satisfy
some expected equivalence axioms (that in particular hold for queries on
"standard" set and bag databases). Interestingly, we show that this is not a
shortcoming of these particular attempts: rather, every such attempt is bound
to fail in satisfying these axioms for some semirings. Finally, we show
particular semirings for which an extension for supporting difference is
(im)possible.

Comment: TAPP 201
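To make the semiring framework concrete, here is a minimal sketch (not tied to any particular paper's formalism, and covering only two positive operations) of how annotations propagate: union combines annotations of shared tuples with the semiring's "plus", and cross product combines annotations of paired tuples with its "times". Instantiating with the counting semiring (natural numbers with ordinary + and ×) recovers bag semantics.

```python
def union(r, s, plus):
    # Union of annotated relations: annotations of tuples present in
    # both inputs are combined with the semiring's "plus".
    out = dict(r)
    for t, a in s.items():
        out[t] = plus(out[t], a) if t in out else a
    return out

def product(r, s, times):
    # Cross product: annotations of paired tuples are combined with
    # the semiring's "times".
    return {t1 + t2: times(a1, a2)
            for t1, a1 in r.items()
            for t2, a2 in s.items()}

# Counting semiring (N, +, *): annotations are tuple multiplicities,
# so the annotated semantics coincides with bag semantics.
r = {("a",): 2, ("b",): 1}   # "a" occurs twice, "b" once
s = {("a",): 3}
print(union(r, s, lambda x, y: x + y))    # {('a',): 5, ('b',): 1}
print(product(r, s, lambda x, y: x * y))  # {('a', 'a'): 6, ('b', 'a'): 3}
```

Difference has no such uniform treatment, which is exactly the gap the abstract addresses: a "minus" operation must interact correctly with "plus" and "times", and the paper shows this cannot be achieved for every semiring.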
A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD
In order to make results of computational scientific research findable,
accessible, interoperable and re-usable, it is necessary to decorate them with
standardised metadata. However, there are a number of technical and practical
challenges that make this process difficult to achieve in practice. Here we
present the implementation of a protocol to tag crystal structures with their
computed properties, without the need for human intervention to curate the data.
This protocol leverages the capabilities of AiiDA, an open-source platform to
manage and automate scientific computational workflows, and TCOD, an
open-access database storing computed materials properties using a well-defined
and exhaustive ontology. Based on these, the complete procedure to deposit
computed data in the TCOD database is automated. All relevant metadata are
extracted from the full provenance information that AiiDA tracks and stores
automatically while managing the calculations. Such a protocol also enables
reproducibility of scientific data in the field of computational materials
science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170
theoretical structures together with their computed properties and their full
provenance graphs, consisting of over 4600 AiiDA nodes.
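The metadata extraction described above amounts to walking a provenance graph from a result node back through the calculations and inputs that produced it. The following toy sketch illustrates that traversal; the node names and graph layout are hypothetical, and this is not AiiDA's actual API.

```python
# Toy provenance-graph traversal (hypothetical node names and layout;
# not AiiDA's actual API). The graph maps each node to the nodes it
# was derived from; we collect the full set of ancestors.
def collect_provenance(node, graph, seen=None):
    if seen is None:
        seen = set()
    for parent in graph.get(node, ()):
        if parent not in seen:
            seen.add(parent)
            collect_provenance(parent, graph, seen)
    return seen

# A structure produced by a calculation that consumed an input and a code.
graph = {"structure": ["calculation"],
         "calculation": ["input_structure", "code"]}
print(collect_provenance("structure", graph))
# -> {'calculation', 'input_structure', 'code'} (in some order)
```

In a real deposition workflow, each ancestor node would contribute metadata (code version, input parameters, intermediate results) to the record sent to the database.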
Provenance Threat Modeling
Provenance systems are used to capture history metadata; applications include
ownership attribution and determining the quality of a particular data set.
Provenance systems are also used for debugging, process improvement,
understanding data, proof of ownership, certification of validity, etc. The
provenance of data includes information about the processes and source data
that lead to its current representation. In this paper we study the security
risks to which provenance systems might be exposed and recommend security
solutions to better protect the provenance information.

Comment: 4 pages, 1 figure, conference
An Architecture for Provenance Systems
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles exposed in this document may be applied to different technologies.
Approximation with Error Bounds in Spark
We introduce a sampling framework to support approximate computing with
estimated error bounds in Spark. Our framework allows sampling to be performed
at the beginning of a sequence of multiple transformations ending in an
aggregation operation. The framework constructs a data provenance tree as the
computation proceeds, then combines the tree with multi-stage sampling and
population estimation theories to compute error bounds for the aggregation.
When information about output keys is available early, the framework can also
use adaptive stratified reservoir sampling to avoid (or reduce) key losses in
the final output and to achieve more consistent error bounds across popular and
rare keys. Finally, the framework includes an algorithm to dynamically choose
sampling rates to meet user-specified constraints on the CDF of error bounds in
the outputs. We have implemented a prototype of our framework called
ApproxSpark, and used it to implement five approximate applications from
different domains. Evaluation results show that ApproxSpark can (a)
significantly reduce execution time if users can tolerate small amounts of
uncertainty and, in many cases, the loss of rare keys, and (b) automatically
find sampling rates that meet user-specified constraints on error bounds. We
also extensively explore and discuss the trade-offs between sampling rate,
execution time, accuracy, and key loss.
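The reservoir-sampling building block mentioned above can be sketched as follows. This is the classic single-pass "Algorithm R", not ApproxSpark's adaptive stratified variant; it keeps a uniform random sample of k items from a stream of unknown length.

```python
import random

def reservoir_sample(stream, k, seed=0):
    # Classic reservoir sampling ("Algorithm R"): maintain a uniform
    # random sample of k items over a stream of unknown length,
    # using a single pass and O(k) memory.
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)     # uniform over all i+1 items seen
            if j < k:
                sample[j] = item         # replace with probability k/(i+1)
    return sample

print(reservoir_sample(range(100_000), 10))
```

A stratified variant keeps one such reservoir per output key (stratum), which is what lets a system avoid losing rare keys and equalize error bounds across popular and rare keys.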
Virtual Data in CMS Analysis
The use of virtual data for enhancing the collaboration between large groups
of scientists is explored in several ways:
- by defining "virtual" parameter spaces which can be searched and shared
in an organized way by a collaboration of scientists in the course of their
analysis;
- by providing a mechanism to log the provenance of results and the ability
to trace them back to the various stages in the analysis of real or simulated
data;
- by creating "checkpoints" in the course of an analysis to permit
collaborators to explore their own analysis branches by refining selections,
improving the signal-to-background ratio, varying the estimation of parameters,
etc.;
- by facilitating the audit of an analysis and the reproduction of its
results by a different group, or in a peer review context.
We describe a prototype for the analysis of data from the CMS experiment
based on the virtual data system Chimera and the object-oriented data analysis
framework ROOT. The Chimera system is used to chain together several steps in
the analysis process including the Monte Carlo generation of data, the
simulation of detector response, the reconstruction of physics objects and
their subsequent analysis, histogramming and visualization using the ROOT
framework.

Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, CA, USA, March 2003; 9 pages, LaTeX, 7 eps figures. PSN
TUAT010. V2: references added
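The chaining of analysis stages with logged provenance described in this abstract can be illustrated with a toy pipeline. The stage names below are hypothetical stand-ins; the actual prototype chains Monte Carlo generation, detector simulation, reconstruction, and ROOT-based analysis through the Chimera virtual data system.

```python
# Toy illustration of chaining analysis stages while recording
# provenance (hypothetical stage names; not the Chimera system itself).
def run_pipeline(data, stages):
    # Apply stages in order, logging each stage's name so a result
    # can later be traced back through the steps that produced it.
    log = []
    for name, fn in stages:
        data = fn(data)
        log.append(name)
    return data, log

stages = [("generate", lambda xs: [x * 2 for x in xs]),
          ("select",   lambda xs: [x for x in xs if x > 2])]
result, provenance = run_pipeline([1, 2, 3], stages)
print(result)      # [4, 6]
print(provenance)  # ['generate', 'select']
```

A virtual data system generalizes this idea: the log entries become declarative transformation recipes, so any intermediate result can be re-derived (or audited) from the recorded chain rather than stored.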