18 research outputs found
Language-integrated provenance in Haskell
Scientific progress increasingly depends on data management, particularly to
clean and curate data so that it can be systematically analyzed and reused. A
wealth of techniques for managing and curating data (and its provenance) have
been proposed, largely in the database community. In particular, a number of
influential papers have proposed collecting provenance information explaining
where a piece of data was copied from, or what other records were used to
derive it. Most of these techniques, however, exist only as research prototypes
and are not available in mainstream database systems. This means scientists
must either implement such techniques themselves or (all too often) go without.
This is essentially a code reuse problem: provenance techniques currently
cannot be implemented reusably, only as ad hoc, usually unmaintained extensions
to standard databases. An alternative, relatively unexplored approach is to
support such techniques at a higher abstraction level, using metaprogramming or
reflection techniques. Can advanced programming techniques make it easier to
transfer provenance research results into practice?
We build on a recent approach called language-integrated provenance, which
extends language-integrated query techniques with source-to-source query
translations that record provenance. In previous work, a proof of concept was
developed in a research programming language called Links, which supports
sophisticated Web and database programming. In this paper, we show how to adapt
this approach to work in Haskell, building on top of the Database-Supported
Haskell (DSH) library.
Even though it seemed clear in principle that Haskell's rich programming
features ought to be sufficient, implementing language-integrated provenance in
Haskell required overcoming a number of technical challenges due to
interactions between these capabilities. Our implementation serves as a proof
of concept showing how this combination of metaprogramming features can, for
the first time, make data provenance facilities available to programmers as a
library in a widely-used, general-purpose language.
We successfully implemented two forms of provenance: where-provenance and
lineage. We tested our implementation using a simple database and query set,
and established that the resulting queries execute correctly on the database.
Our implementation is publicly available on GitHub.
Our work makes provenance tracking available to users of DSH at little cost.
Although Haskell is not widely used for scientific database development, our
work suggests which language features are necessary to support provenance as a
library. We also highlight how combining Haskell's advanced type-level
programming features can lead to unexpected complications, which may motivate
further research into type system expressiveness.
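The central typing idea above, giving provenance metadata a distinct type so ordinary code cannot forge or discard it, can be sketched as follows. This is a minimal illustration only, not the actual DSH API; the names (Prov, WhereProv, selectName) and the mock "agencies" table are invented for the example:

```haskell
-- Minimal sketch of where-provenance as a distinct type (NOT the DSH API).
-- All names here (Prov, WhereProv, selectName) are hypothetical.
module Main where

-- Where-provenance: the table, column, and row a value was copied from.
data Prov = Prov { pTable :: String, pColumn :: String, pRow :: Int }
  deriving (Eq, Show)

-- A value paired with its provenance. Keeping this type abstract in a
-- real library would prevent user code from altering or misattributing
-- provenance, which is the safety property the text describes.
data WhereProv a = WhereProv { wpValue :: a, wpProv :: Prov }
  deriving (Eq, Show)

-- Reading a field from a (mock) row attaches its origin automatically.
selectName :: Int -> String -> WhereProv String
selectName key name = WhereProv name (Prov "agencies" "name" key)

main :: IO ()
main = print (selectName 1 "EdinTours")
```

In a real implementation the query rewriter, not the programmer, would be responsible for constructing the annotation.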
Language-integrated provenance by trace analysis
Language-integrated provenance builds on language-integrated query techniques
to make provenance information explaining query results readily available to
programmers. In previous work we have explored language-integrated approaches
to provenance in Links and Haskell. However, implementing a new form of
provenance in a language-integrated way is still a major challenge. We propose
a self-tracing transformation and trace analysis features that, together with
existing techniques for type-directed generic programming, make it possible to
define different forms of provenance as user code. We present our design as an
extension to a core language for Links called LinksT, give examples showing its
capabilities, and outline its metatheory and key correctness properties.
Comment: DBPL 201
Language-integrated provenance
Provenance, or information about the origin or derivation of data, is
important for assessing the trustworthiness of data and identifying and
correcting mistakes. Most prior implementations of data provenance have
involved heavyweight modifications to database systems and little attention has
been paid to how the provenance data can be used outside such a system. We
present extensions to the Links programming language that build on its support
for language-integrated query to support provenance queries by rewriting and
normalizing monadic comprehensions and extending the type system to distinguish
provenance metadata from normal data. The main contribution of this article is
to show that the two most common forms of provenance can be implemented
efficiently and used safely as a programming language feature with no changes
to the database system.
Comment: Accepted to Science of Computer Programming special issue on PPDP 201
Language-integrated provenance
Provenance is metadata about the where, the why, and the how of data. It is
evidence which can answer questions such as: Where exactly did this piece of
data come from? Why is this row in my result? How was it produced? Answers
to these questions are useful for judging the trustworthiness of data, and for
finding and correcting mistakes.
Most programs that use a database at all already use one crude form of
provenance: they manually propagate row identifiers together with database
values, just in case they need to be updated later. More sophisticated forms
of provenance are exceedingly rare, because they are more difficult to implement
manually. Tools that calculate data provenance systematically exist only
as research prototypes. Even standard database systems are hard to set up, as
evidenced by the rise of hosted database services, so it is little surprise that
prototypes of provenance systems are not used much.
This dissertation shows how a programming language can provide support
for provenance. Based on language-integrated query technology, it can systematically
rewrite queries to produce various forms of provenance. We describe
such query transformations for where-provenance and lineage, and discuss
how to enable programmers to define their own forms of provenance. Thanks
to query normalization the resulting queries still execute efficiently on mainstream
database systems. A programming language can help further by giving
provenance metadata precise types to ensure that it is handled appropriately.
Language-integrated queries make it easy to write programs that deal with
data, no special query language needed. Language-integrated provenance
makes it as easy to deal with data provenance, no special database needed.
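The lineage rewriting described above can be illustrated on a toy in-memory join. This is a simplified sketch, not the dissertation's actual transformation: the tables, keys, and data are invented, and real rewritten queries execute on a SQL database rather than over Haskell lists:

```haskell
-- Sketch of lineage tracking on a toy in-memory join. Tables and data
-- are hypothetical; a real implementation rewrites queries to run on SQL.
module Main where

type RowId = (String, Int)                -- (table name, row number)

-- A query result annotated with the input rows it was derived from.
data Lin a = Lin { linOut :: a, linRows :: [RowId] } deriving (Eq, Show)

agencies :: [(Int, String)]
agencies = [(1, "EdinTours"), (2, "Burns")]

tours :: [(Int, (String, String))]        -- row -> (agency, destination)
tours = [(1, ("EdinTours", "Loch Ness")), (2, ("Burns", "Glasgow"))]

-- The rewritten join: each output tuple records which agency row and
-- which tour row contributed to it.
q :: [Lin (String, String)]
q = [ Lin (a, dest) [("agencies", ka), ("tours", kt)]
    | (ka, a)          <- agencies
    , (kt, (a', dest)) <- tours
    , a == a' ]

main :: IO ()
main = mapM_ print q
```

Each element of `q` answers the "why is this row in my result?" question by naming the exact input rows that produced it.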
Language-integrated provenance in Links
Today's programming languages provide no support for data provenance. In a world that increasingly relies on data, we need provenance to judge the reliability of data and therefore should aim to make it easily accessible to programmers. We report our work in progress on an extension to the Links programming language that builds on its support for language-integrated query to support where-provenance queries through query rewriting and a type system extension that distinguishes provenance metadata from other data. Our approach aims to work solely within the language implementation and thus require no changes to the database system. The type system, together with automatic propagation of provenance metadata, will prevent programmers from accidentally changing provenance, losing it, or misattributing it to other data.
Query Lifting: Language-integrated query for heterogeneous nested collections
Language-integrated query based on comprehension syntax is a powerful
technique for safe database programming, and provides a basis for advanced
techniques such as query shredding or query flattening that allow efficient
programming with complex nested collections. However, the foundations of these
techniques are lacking: although SQL, the most widely-used database query
language, supports heterogeneous queries that mix set and multiset semantics,
these important capabilities are not supported by known correctness results or
implementations that assume homogeneous collections. In this paper we study
language-integrated query for a heterogeneous query language
that combines set and multiset constructs. We show how
to normalize and translate queries to SQL, and develop a novel approach to
querying heterogeneous nested collections, based on the insight that "local"
query subexpressions that calculate nested subcollections can be "lifted" to
the top level, analogously to lambda-lifting for local function definitions.
Comment: Full version of ESOP 2021 conference paper
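The lifting idea can be sketched with a toy nested query over in-memory lists. This is only an illustration of the shape of the technique under invented tables and keys; the actual approach normalizes queries and emits flat SQL:

```haskell
-- Sketch of query lifting: a query returning nested collections is
-- evaluated as two FLAT queries, with the inner subquery "lifted" to the
-- top level and keyed by the outer row, then stitched back together.
-- (Tables, keys, and data are hypothetical; real implementations emit SQL.)
module Main where

depts :: [(Int, String)]
depts = [(1, "Sales"), (2, "R&D")]

emps :: [(Int, String)]                   -- (department key, employee)
emps = [(1, "ann"), (1, "bob"), (2, "cyn")]

-- Flat outer query: one row per department.
outerQ :: [(Int, String)]
outerQ = depts

-- Flat inner query, lifted to the top level: keyed by the outer row so
-- it can run independently as a single flat query.
innerQ :: [(Int, String)]
innerQ = [ (k, e) | (k, e) <- emps ]

-- Stitching: group the lifted inner results under their outer rows to
-- reconstruct the nested result the programmer originally wrote.
nested :: [(String, [String])]
nested = [ (d, [ e | (k', e) <- innerQ, k' == k ]) | (k, d) <- outerQ ]

main :: IO ()
main = print nested   -- [("Sales",["ann","bob"]),("R&D",["cyn"])]
```

The analogy to lambda-lifting is that the inner query, like a local function, gains an extra parameter (the outer key) in exchange for being moved to the top level.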
Cross-tier web programming for curated databases: a case study
Curated databases have become important sources of information across several scientific disciplines and, as the product of experts' manual work, often become important reference works. Features such as provenance tracking, archiving, and data citation are widely regarded as important for curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so.
A scientific database application is not just the relational database itself, but also an ecosystem of web applications to display the data, and applications which allow data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way.
Cross-tier programming languages have been proposed to simplify the creation of web applications, where developers can write an application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and at least in principle, it should be possible to provide curation features reusably via program transformations. As a first step towards this goal, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language.
In this paper, we describe such a case study: reimplementing the web front end of a real world scientific database, the IUPHAR/BPS Guide to Pharmacology (GtoPdb), in the Links cross-tier programming language. We show how programming language features such as language-integrated query simplify the development process, and rule out common errors. Through a comparative performance evaluation, we show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the Java version. Furthermore, while there is some overhead to using Links because of its comparative immaturity compared to Java, the Links version is usable as a proof-of-concept case study of cross-tier programming for curated databases.
[This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review. The most up-to-date version of the paper can be found on arXiv: https://arxiv.org/abs/2003.03845]
The Structure of the Literary Problem in the Formation of the Local Text Substrate
The article aims to study the structure of the literary problem in the formation of the local text substrate. The study uses a methodology for studying language as it changes over time and space. The article explains the basics of the methodological support of the translation complex and the structure of its application in private studies of foreign cultures and communicants. The results of the study showed the possibility of interaction between the subjects of linguistic exchange and the dynamics of the translation and literary component. The novelty of the study lies in the fact that the work defines methods that can be used not only by local researchers but also by foreign-speaking communicants. The research results can be used in practical work to bridge the gap between understanding the local text in translation studies and its structuring in the local versions of individual authors.
Mixing set and bag semantics
The conservativity theorem for nested relational calculus implies that query
expressions can freely use nesting and unnesting, yet as long as the query
result type is a flat relation, these capabilities do not lead to an increase
in expressiveness over flat relational queries. Moreover, Wong showed how such
queries can be translated to SQL via a constructive rewriting algorithm. While
this result holds for queries over either set or multiset semantics, to the
best of our knowledge, the questions of conservativity and normalization have
not been studied for queries that mix set and bag collections, or provide
duplicate-elimination operations such as SQL's DISTINCT. In this paper we formalize the problem,
and present partial progress: specifically, we introduce a calculus with both
set and multiset collection types, along with natural mappings from sets to
bags and vice versa, present a set of valid rewrite rules for normalizing such
queries, and give an inductive characterization of a set of queries whose
normal forms can be translated to SQL. We also consider examples that do not
appear straightforward to translate to SQL, illustrating that the relative
expressiveness of flat and nested queries with mixed set and multiset semantics
remains an open question.
Comment: DBPL 2019 -- short paper
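The two natural mappings between the collection types can be sketched concretely. In this minimal illustration (not the paper's calculus), bags and sets are both modelled as Haskell lists, with duplicate-freedom for sets an invariant rather than an enforced type; the names `delta` and `iota` follow the common convention for duplicate elimination and set inclusion:

```haskell
-- Sketch of the two natural mappings between bag and set collections:
-- duplicate elimination (cf. SQL's DISTINCT) and the inclusion of sets
-- into bags. Both collection kinds are modelled as lists here; for sets,
-- duplicate-freedom is an invariant, not an enforced type.
module Main where

import Data.List (nub)

delta :: Eq a => [a] -> [a]   -- bag -> set: eliminate duplicates
delta = nub

iota :: [a] -> [a]            -- set -> bag: every set is already a bag
iota = id

-- A mixed query: a bag of purchases (duplicates meaningful), from which
-- we compute the set of distinct customers.
purchases :: [(String, Int)]
purchases = [("ann", 3), ("bob", 1), ("ann", 2)]

distinctCustomers :: [String]
distinctCustomers = delta (map fst purchases)

main :: IO ()
main = print distinctCustomers   -- ["ann","bob"]
```

The subtlety the paper addresses is what happens when such conversions appear *inside* larger queries: normalizing and translating those mixed queries to SQL is exactly where the open questions lie.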