Search CORE

447 research outputs found

On the Limitations of Provenance for Queries With Difference

Author: Amsterdamer Yael
Deutch Daniel
Tannen Val
Publication venue
Publication date: 01/01/2011
Field of study

The annotation of the results of database transformations was shown to be very effective for various applications. Until recently, most works in this context focused on positive query languages. The provenance semirings is a particular approach that was proven effective for these languages, and it was shown that when propagating provenance with semirings, the expected equivalence axioms of the corresponding query languages are satisfied. There have been several attempts to extend the framework to account for relational algebra queries with difference. We show here that these suggestions fail to satisfy some expected equivalence axioms (that in particular hold for queries on "standard" set and bag databases). Interestingly, we show that this is not a pitfall of these particular attempts, but rather every such attempt is bound to fail in satisfying these axioms, for some semirings. Finally, we show particular semirings for which an extension for supporting difference is (im)possible.Comment: TAPP 201

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

A Brief Tour through Provenance in Scientific Workflows and Databases

Author: Bertram Ludäscher
Publication venue
Publication date: 03/03/2016
Field of study

Within computer science, the term provenance has multiple meanings, due to different motivations, perspectives, and assumptions prevalent in the respective communities. This chapter provides a high-level “sightseeing tour” of some of those different notions and uses of provenance in scientific workflows and databases.Ope

Illinois Digital Environment for Access to Learning and Scholarship Repository

Provenance for Aggregate Queries

Author: Amsterdamer Yael
Deutch Daniel
Tannen Val
Publication venue
Publication date: 01/01/2011
Field of study

We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values computation. We realize this approach in a concrete construction, first for "simple" queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance annotated databases

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn

Language-integrated provenance by trace analysis

Author: Fehrenbach Stefan
Cheney James
Publication venue
Publication date: 01/01/2006
Field of study

Language-integrated provenance builds on language-integrated query techniques to make provenance information explaining query results readily available to programmers. In previous work we have explored language-integrated approaches to provenance in Links and Haskell. However, implementing a new form of provenance in a language-integrated way is still a major challenge. We propose a self-tracing transformation and trace analysis features that, together with existing techniques for type-directed generic programming, make it possible to define different forms of provenance as user code. We present our design as an extension to a core language for Links called LinksT, give examples showing its capabilities, and outline its metatheory and key correctness properties.Comment: DBPL 201

arXiv.org e-Print Archive

Edinburgh Research Explorer

University of Richmond

A Time-Series Compression Technique and its Application to the Smart Grid

Author: Böhm Klemens
Efros Pavel
Eichinger Frank
Karnouskos Stamatis
Publication venue: Springer
Publication date: 17/03/2015
Field of study

Time-series data is increasingly collected in many domains. One example is the smart electricity infrastructure, which generates huge volumes of such data from sources such as smart electricity meters. Although today this data is used for visualization and billing in mostly 15-min resolution, its original temporal resolution frequently is more fine-grained, e.g., seconds. This is useful for various analytical applications such as short-term forecasting, disaggregation and visualization. However, transmitting and storing huge amounts of such fine-grained data is prohibitively expensive in terms of storage space in many cases. In this article, we present a compression technique based on piecewise regression and two methods which describe the performance of the compression. Although our technique is a general approach for time-series compression, smart grids serve as our running example and as our evaluation scenario. Depending on the data and the use-case scenario, the technique compresses data by ratios of up to factor 5,000 while maintaining its usefulness for analytics. The proposed technique has outperformed related work and has been applied to three real-world energy datasets in different scenarios. Finally, we show that the proposed compression technique can be implemented in a state-of-the-art database management system

KITopen

A unified framework for managing provenance information in translational research

Author: A Ayers
A Borgida
A Gangemi
AH Asiaee
Amit P Sheth
AP Chapman
B Smith
B Weatherly
C Aurrecoechea
CF Taylor
D Brickley
D Oberle
DL McGuinness
DL Wheeler
DLWD Martin
E Prud'ommeaux
E Sirin
G Klyne
HSU Parkinson
I Niles
J Pérez
J Widom
J Zhao
JR Hobbs
KKSM Muniswamy-Reddy
KLSE Eilbeck
L Chiticariu
M Ashburner
M Kanehisa
M Vardi
O Bodenreider
O Bodenreider
O Bodenreider
Olivier Bodenreider
P Buneman
P Hayes
P Hitzler
Priti Parikh
R Angles
RSK Mehra
Satya S Sahoo
SS Sahoo
SS Sahoo
SS Sahoo
SS Sahoo
SS Sahoo
T Lee
TJ Green
Todd Minning
V Cross
Vinh Nguyen
Y Cui
YL Simmhan
YR Wang
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background A critical aspect of the NIH <it>Translational Research </it>roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the <it>provenance </it>metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results We identify a common set of challenges in managing provenance information across the <it>pre-publication </it>and <it>post-publication </it>phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) Provenance collection - during data generation (b) Provenance representation - to support interoperability, reasoning, and incorporate domain semantics (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications (d) Provenance query - to support queries with increasing complexity over large data size and also support knowledge discovery applications We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for <it>Trypanosoma cruzi </it>(<it>T.cruzi </it>SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness. Conclusions The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CORE

Classification of annotation semirings over query containment

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Crossref

Recommended from our members

Enhancing Usability and Explainability of Data Systems

Author: Fariha Anna
Publication venue: ScholarWorks@UMass Amherst
Publication date: 20/10/2021
Field of study

The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems\u27 inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sort of users, including the expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing usability of data systems for nonexperts, (2) enable data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning, gaining data understanding, and data cleaning, and (3) explaining causes of unexpected outcomes involving data and data systems. For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks, tailored to provide explanations in debugging data system components, including the data itself. The explanation frameworks focus on explaining the root cause of a concurrent application\u27s intermittent failure and exposing issues in the data that cause a data-driven system to malfunction

ScholarWorks@UMass Amherst

Proceedings of the Third International Workshop on Management of Uncertain Data (MUD2009)

Author
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 28/08/2009
Field of study

University of Twente Research Information