13,128 research outputs found
Query Flattening and the Nested Data Parallelism Paradigm
This work is based on the observation that languages for two seemingly distant
domains are closely related. Orthogonal query languages based on comprehension
syntax admit various forms of query nesting to construct nested query results
and express complex predicates. Languages for nested data parallelism allow to
nest parallel iterators and thereby admit the parallel evaluation of
computations that are themselves parallel. Both kinds of languages center around
the application of side-effect-free functions to each element of a collection.
The motivation for this work is the seamless integration of relational database
queries with programming languages. In frameworks for language-integrated
database queries, a host language's native collection-programming API is used
to express queries. To mediate between native collection programming and
relational queries, we define an expressive, orthogonal query calculus that
supports nesting and order. The challenge of query flattening is to translate
this calculus to bundles of efficient relational queries restricted to flat,
unordered multisets. Prior approaches to query flattening either support only
query languages that lack in expressiveness or employ a complex, monolithic
translation that is hard to comprehend and generates inefficient code that is
hard to optimize.
To improve on those approaches, we draw on the similarity to nested data
parallelism. Blelloch's flattening transformation is a static program
transformation that translates nested data parallelism to flat data parallel
programs over flat arrays. Based on the flattening transformation, we describe a
pipeline of small, comprehensible lowering steps that translates our nested
query calculus to a bundle of relational queries. The pipeline is based on a
number of well-defined intermediate languages. Our translation adopts the key
concepts of the flattening transformation but is designed with specifics of
relational query processing in mind.
Based on this translation, we revisit all aspects of query flattening. Our
translation is fully compositional and can translate any term of the input
language. Like prior work, the translation by itself produces inefficient code
due to compositionality that is not fit for execution without optimization. In
contrast to prior work, we show that query optimization is orthogonal to
flattening and can be performed before flattening. We employ well-known work on
logical query optimization for nested query languages and demonstrate that this
body of work integrates well with our approach.
Furthermore, we describe an improved encoding of ordered and nested collections
in terms of flat, unordered multisets. Our approach emits idiomatic relational
queries in which the effort required to maintain the non-relational semantics of
the source language (order and nesting) is minimized.
A set of experiments provides evidence that our approach to query flattening can
handle complex, list-based queries with nested results and nested intermediate
data well. We apply our approach to a number of flat and nested benchmark
queries and compare their runtime with hand-written SQL queries. In these
experiments, our SQL code generated from a list-based nested query language
usually performs as well as hand-written queries
Reify Your Collection Queries for Modularity and Speed!
Modularity and efficiency are often contradicting requirements, such that
programers have to trade one for the other. We analyze this dilemma in the
context of programs operating on collections. Performance-critical code using
collections need often to be hand-optimized, leading to non-modular, brittle,
and redundant code. In principle, this dilemma could be avoided by automatic
collection-specific optimizations, such as fusion of collection traversals,
usage of indexing, or reordering of filters. Unfortunately, it is not obvious
how to encode such optimizations in terms of ordinary collection APIs, because
the program operating on the collections is not reified and hence cannot be
analyzed.
We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala
collections API that allows such analyses and optimizations to be defined and
executed within Scala, without relying on external tools or compiler
extensions. SQuOpt provides the same "look and feel" (syntax and static typing
guarantees) as the standard collections API. We evaluate SQuOpt by
re-implementing several code analyses of the Findbugs tool using SQuOpt, show
average speedups of 12x with a maximum of 12800x and hence demonstrate that
SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page
A Data Transformation System for Biological Data Sources
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well a.s sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.Comment: 44 page
- …