125 research outputs found

Database Queries that Explain their Work

Author: Acar Umut A.
Ahmed Amal
Cheney James
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to recompute the results on different inputs, possibly specialized to a part of the output. Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing techniques that support explanation. We state and prove correctness properties for these techniques and present a proof-of-concept implementation in Haskell. Comment: PPDP 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

The lifecycle of provenance metadata and its associated challenges and opportunities

Author: Missier Paolo
Publication date: 01/01/2016

This chapter outlines some of the challenges and opportunities associated with adopting provenance principles and standards in a variety of disciplines, including data publication and reuse, and information sciences.

arXiv.org e-Print Archive

University of Birmingham Research Portal

Distilling Structure in Scientific Workflows

Author: Chen Jiuqiang
Cohen-Boulakia Sarah
Froidevaux Christine
Goble Carole
Williams Alan
Publication venue: EMBnet.journal
Publication date: 01/01/2012

In this work, we have conducted a series of experiments to better understand the structure of scientific workflows. In particular, we have investigated techniques to understand why scientific workflows may or may not have a series-parallel structure.

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

The University of Manchester - Institutional Repository

HAL-Polytechnique

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Author: Bhardwaj Anant
Bhattacherjee Souvik
Chavan Amit
Deshpande Amol
Elmore Aaron J.
Madden Samuel
Parameswaran Aditya G.
Publication date: 02/09/2014

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. Comment: 7 page

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Four level provenance support to achieve portable reproducibility of scientific workflows

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Crossref

SZTAKI Publication Repository

Minimal sufficient information about the scientific workflows to create reproducible experiment

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

SZTAKI Publication Repository

Classification of Scientific Workflows Based on Reproducibility Analysis

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Crossref

SZTAKI Publication Repository

Reproducibility Analysis of Scientific Workflows

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Obuda University'
Publication date: 01/01/2017

SZTAKI Publication Repository