Search CORE

94 research outputs found

Solving equations in the relational algebra

Author: Biskup Joachim
Bussche Jan Van den
Paredaens Jan
Schwentick Thomas
Publication venue
Publication date: 10/12/2003
Field of study

Enumerating all solutions of a relational algebra equation is a natural and powerful operation which, when added as a query language primitive to the nested relational algebra, yields a query language for nested relational databases, equivalent to the well-known powerset algebra. We study \emph{sparse} equations, which are equations with at most polynomially many solutions. We look at their complexity, and compare their expressive power with that of similar notions in the powerset algebra.Comment: Minor revision, accepted for publication in SIAM Journal on Computin

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Incremental View Maintenance For Collection Programming

Author: Ceri Stefano
den Bussche Jan Van
Dimitrova Katica
Foster J. Nathan
Gupta Ashish
Johnson David S.
Kazem Lellahi S.
Liu Jixue
Suciu Dan
Zaharia Matei
Zeume Thomas
Publication venue
Publication date: 11/04/2016
Field of study

In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Query Lifting: Language-integrated query for heterogeneous nested collections

Author: Cheney James
Ricciotti Wilmer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/01/2021
Field of study

Language-integrated query based on comprehension syntax is a powerful technique for safe database programming, and provides a basis for advanced techniques such as query shredding or query flattening that allow efficient programming with complex nested collections. However, the foundations of these techniques are lacking: although SQL, the most widely-used database query language, supports heterogeneous queries that mix set and multiset semantics, these important capabilities are not supported by known correctness results or implementations that assume homogeneous collections. In this paper we study language-integrated query for a heterogeneous query language

NRC_\lambda(Set,Bag)

that combines set and multiset constructs. We show how to normalize and translate queries to SQL, and develop a novel approach to querying heterogeneous nested collections, based on the insight that ``local'' query subexpressions that calculate nested subcollections can be ``lifted'' to the top level analogously to lambda-lifting for local function definitions.Comment: Full version of ESOP 2021 conference pape

arXiv.org e-Print Archive

Edinburgh Research Explorer

Generating collection transformations from proofs

Author: Afrati Foto
Benedikt Michael
den Bussche Jan Van
Fitting Melvin
Gibbons Jeremy
Hodges Wilfrid
Jacobs Bart
Srivastava Saurabh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/11/2020
Field of study

Nested relations, built up from atomic types via product and set types, form a rich data model. Over the last decades the nested relational calculus, NRC, has emerged as a standard language for defining transformations on nested collections. NRC is a strongly-typed functional language which allows building up transformations using tupling and projections, a singleton-former, and a map operation that lifts transformations on tuples to transformations on sets.In this work we describe an alternative declarative method of describing transformations in logic. A formula with distinguished inputs and outputs gives an implicit definition if one can prove that for each input there is only one output that satisfies it. Our main result shows that one can synthesize transformations from proofs that a formula provides an implicit definition, where the proof is in an intuitionistic calculus that captures a natural style of reasoning about nested collections. Our polynomial time synthesis procedure is based on an analog of Craig’s interpolation lemma, starting with a provable containment between terms representing nested collections and generating an NRC expression that interpolates between them.We further show that NRC expressions that implement an implicit definition can be found when there is a classical proof of functionality, not just when there is an intuitionistic one. That is, whenever a formula implicitly defines a transformation, there is an NRC expression that implements it

arXiv.org e-Print Archive

Crossref

Cronfa at Swansea University

Oxford University Research Archive

Expressivity and Complexity of MongoDB Queries

Author: Botoeva Elena
Calvanese Diego
Cogrel Benjamin
Xiao Guohui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Conference on Database Theory (ICDT 2018)
Publication date: 01/01/2018
Field of study

In this paper, we consider MongoDB, a widely adopted but not formally understood database system managing JSON documents and equipped with a powerful query mechanism, called the aggregation framework. We provide a clean formal abstraction of this query language, which we call MQuery. We study the expressivity of MQuery, showing the equivalence of its well-typed fragment with nested relational algebra. We further investigate the computational complexity of significant fragments of it, obtaining several (tight) bounds in combined complexity, which range from LogSpace to alternating exponential-time with a polynomial number of alternations

Dagstuhl Research Online Publication Server

Kent Academic Repository

Query Flattening and the Nested Data Parallelism Paradigm

Author: Ulrich Alexander
Publication venue: Universität Tübingen
Publication date: 01/01/2018
Field of study

This work is based on the observation that languages for two seemingly distant domains are closely related. Orthogonal query languages based on comprehension syntax admit various forms of query nesting to construct nested query results and express complex predicates. Languages for nested data parallelism allow to nest parallel iterators and thereby admit the parallel evaluation of computations that are themselves parallel. Both kinds of languages center around the application of side-effect-free functions to each element of a collection. The motivation for this work is the seamless integration of relational database queries with programming languages. In frameworks for language-integrated database queries, a host language's native collection-programming API is used to express queries. To mediate between native collection programming and relational queries, we define an expressive, orthogonal query calculus that supports nesting and order. The challenge of query flattening is to translate this calculus to bundles of efficient relational queries restricted to flat, unordered multisets. Prior approaches to query flattening either support only query languages that lack in expressiveness or employ a complex, monolithic translation that is hard to comprehend and generates inefficient code that is hard to optimize. To improve on those approaches, we draw on the similarity to nested data parallelism. Blelloch's flattening transformation is a static program transformation that translates nested data parallelism to flat data parallel programs over flat arrays. Based on the flattening transformation, we describe a pipeline of small, comprehensible lowering steps that translates our nested query calculus to a bundle of relational queries. The pipeline is based on a number of well-defined intermediate languages. Our translation adopts the key concepts of the flattening transformation but is designed with specifics of relational query processing in mind. Based on this translation, we revisit all aspects of query flattening. Our translation is fully compositional and can translate any term of the input language. Like prior work, the translation by itself produces inefficient code due to compositionality that is not fit for execution without optimization. In contrast to prior work, we show that query optimization is orthogonal to flattening and can be performed before flattening. We employ well-known work on logical query optimization for nested query languages and demonstrate that this body of work integrates well with our approach. Furthermore, we describe an improved encoding of ordered and nested collections in terms of flat, unordered multisets. Our approach emits idiomatic relational queries in which the effort required to maintain the non-relational semantics of the source language (order and nesting) is minimized. A set of experiments provides evidence that our approach to query flattening can handle complex, list-based queries with nested results and nested intermediate data well. We apply our approach to a number of flat and nested benchmark queries and compare their runtime with hand-written SQL queries. In these experiments, our SQL code generated from a list-based nested query language usually performs as well as hand-written queries

Publikationsserver der Universität Tübingen

Algebraic Property Graphs

Author: Shinavier Joshua
Wisnesky Ryan
Publication venue
Publication date: 11/09/2019
Field of study

In this paper, we use algebraic data types to define a formal basis for the property graph data models supported by popular open source and commercial graph databases. Developed as a kind of inter-lingua for enterprise data integration, algebraic property graphs encode the binary edges and key-value pairs typical of property graphs, and also provide a well-defined notion of schema and support straightforward mappings to and from non-graph datasets, including relational, streaming, and microservice data commonly encountered in enterprise environments. We propose algebraic property graphs as a simple but mathematically rigorous bridge between graph and non-graph data models, broadening the scope of graph computing by removing obstacles to the construction of virtual graphs

arXiv.org e-Print Archive

Reducing End-User Burden in Everyday Data Organization.

Author: Li Qian
Manish Singh
Matthew Burgess
Michael Anderson
Rajesh Bejugam
Publication venue
Publication date: 01/01/2013
Field of study

As digital data permeates every aspect of our daily life, more and more end-users are organizing their everyday data electronically. In fact, end-users are already used to managing their personal data such as contact books and calendars in electronic devices. Meanwhile, the desire for organizing more information into the computer is expanding for a broader group of users. For example, a scientist may need to regularly manage a substantial amount of science data on his desktop. However, to organize such everyday data is challenging for these end-users, because they have limited knowledge about data schema, which is key to data management tasks such as database design, data transformation and data integration. While the user is struggling with these schema tasks, various cognitive and operational burdens emerge. First, when designing her data collection, the user has the burden to abstract her mental model of her real-life data into a reasonable schema design. Moreover, when incorporating external data sources, there is a burden to understand the source semantics and a burden to transform the data from those sources into the user's own data collection. Meanwhile, if the user wants to filter the data, she has the burden to understand and specify the selection condition. Finally, when existing sources are update, there is a burden to understand and fuse these updates. This dissertation introduces various approaches to help the end-user reduce these burdens. To ease the design pain, the dissertation proposes a system with a next-generation spreadsheet for the end-user to easily design and evolve her schema. To facilitate incorporation of external data sources, a sample-driven schema mapping approach is introduced so that the user can freely provide sample instances in her own collection and the system will automatically deduce the desired schema mapping from the sources to the collection. In a similar flavor, this dissertation proposes an approach to facilitate the user in specifying selection conditions via example data points she wants to select. Finally, to help the user incorporate source data updates into her data collection, the dissertation proposes a technique to incrementally update the integrated data using previous integration results.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99778/1/eql_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Query Shredding: Efficient Relational Evaluation of Queries over Nested Multisets

Author: Cooper E.
Grust T.
Grust T.
Ulrich A.
Wadler P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Nested relational query languages have been explored extensively, and underlie industrial language-integrated query systems such as Microsoft's LINQ. However, relational databases do not natively support nested collections in query results. This can lead to major performance problems: if programmers write queries that yield nested results, then such systems typically either fail or generate a large number of queries. We present a new approach to query shredding, which converts a query returning nested data to a fixed number of SQL queries. Our approach, in contrast to prior work, handles multiset semantics, and generates an idiomatic SQL:1999 query directly from a normal form for nested queries. We provide a detailed description of our translation and present experiments showing that it offers comparable or better performance than a recent alternative approach on a range of examples.Comment: Extended version of SIGMOD 2014 conference pape

arXiv.org e-Print Archive

CiteSeerX

Crossref

Heriot Watt Pure

Edinburgh Research Explorer