A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 page
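As an illustration of what "partitioning into chunks" means in the abstract above, here is a minimal sketch of regular (fixed-size) chunking for an n-dimensional array. It is not taken from the survey: the function names `chunk_coords` and `regular_chunks` are hypothetical, and regular chunking is only one of several strategies such a survey would cover.

```python
from itertools import product


def chunk_coords(cell, chunk_shape):
    """Map a cell index to (chunk id, offset inside that chunk)."""
    chunk_id = tuple(i // c for i, c in zip(cell, chunk_shape))
    offset = tuple(i % c for i, c in zip(cell, chunk_shape))
    return chunk_id, offset


def regular_chunks(array_shape, chunk_shape):
    """Yield (chunk id, per-dimension [start, stop) bounds) for every chunk."""
    # Ceiling division: the last chunk in a dimension may be smaller.
    counts = [-(-n // c) for n, c in zip(array_shape, chunk_shape)]
    for chunk_id in product(*(range(k) for k in counts)):
        bounds = tuple(
            (i * c, min((i + 1) * c, n))
            for i, c, n in zip(chunk_id, chunk_shape, array_shape)
        )
        yield chunk_id, bounds


if __name__ == "__main__":
    # A 5x6 array split into 4x4 chunks gives a 2x2 grid of chunks.
    for chunk_id, bounds in regular_chunks((5, 6), (4, 4)):
        print(chunk_id, bounds)
    print(chunk_coords((5, 7), (4, 4)))
```

A range query then only needs to read the chunks whose bounds overlap the queried region, which is the usual motivation for storing arrays this way.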
Multiple Query Optimization on the D-Wave 2X Adiabatic Quantum Computer
The D-Wave adiabatic quantum annealer solves hard combinatorial optimization
problems leveraging quantum physics. The newest version features over 1000
qubits and was released in August 2015. We were given access to such a machine,
currently hosted at NASA Ames Research Center in California, to explore the
potential for hard optimization problems that arise in the context of
databases.
In this paper, we tackle the problem of multiple query optimization (MQO). We
show how an MQO problem instance can be transformed into a mathematical formula
that complies with the restrictive input format accepted by the quantum
annealer. This formula is translated into weights on and between qubits such
that the configuration minimizing the input formula can be found via a process
called adiabatic quantum annealing. We analyze how the number of required
qubits grows asymptotically with the MQO problem dimensions, since the number
of qubits is currently the main factor restricting applicability. We
experimentally compare the performance of the quantum annealer against other
MQO algorithms executed on a traditional computer. While the problem sizes that
can be treated are currently limited, we already find a class of problem
instances where the quantum annealer is three orders of magnitude faster than
other approaches.
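The transformation described in this abstract can be sketched as a quadratic unconstrained binary optimization (QUBO) problem, with one binary variable per candidate plan. The code below is a generic penalty-based encoding under stated assumptions, not the paper's exact formula: the function names, the example costs and savings, and the penalty weight are all hypothetical, and a brute-force search stands in for the quantum annealer.

```python
from itertools import product


def build_qubo(plans, savings, penalty):
    """plans: {query: {plan_name: cost}}, plan names globally unique.
    savings: {(plan_a, plan_b): cost saved when both plans are chosen}.
    Returns linear and quadratic weights of the energy function
    (the constant term is dropped)."""
    linear, quadratic = {}, {}
    for alternatives in plans.values():
        names = list(alternatives)
        for p in names:
            # Plan cost plus the linear part of penalty * (1 - sum x)^2.
            linear[p] = alternatives[p] - penalty
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # Quadratic part of the "exactly one plan per query" penalty.
                quadratic[(a, b)] = 2 * penalty
    for (a, b), s in savings.items():
        # Sharing work between two chosen plans lowers the energy.
        quadratic[(a, b)] = quadratic.get((a, b), 0) - s
    return linear, quadratic


def energy(x, linear, quadratic):
    e = sum(w * x[p] for p, w in linear.items())
    e += sum(w * x[a] * x[b] for (a, b), w in quadratic.items())
    return e


def solve_brute_force(linear, quadratic):
    """Enumerate all assignments (classical stand-in for the annealer)."""
    names = list(linear)
    candidates = (dict(zip(names, bits))
                  for bits in product((0, 1), repeat=len(names)))
    best = min(candidates, key=lambda x: energy(x, linear, quadratic))
    return {p for p, bit in best.items() if bit}


# Hypothetical instance: two queries, two alternative plans each.
plans = {"q1": {"q1_a": 5, "q1_b": 7}, "q2": {"q2_a": 6, "q2_b": 6}}
savings = {("q1_b", "q2_b"): 4}
# Assumed penalty: larger than any possible gain from violating a constraint.
penalty = sum(c for alt in plans.values() for c in alt.values()) \
    + sum(savings.values()) + 1
linear, quadratic = build_qubo(plans, savings, penalty)
chosen = solve_brute_force(linear, quadratic)
```

In this toy instance the cheapest individual plan for `q1` is `q1_a`, but the shared work between `q1_b` and `q2_b` makes that pair the overall minimum. The encoding uses one variable per plan, which is why the qubit count is the limiting factor the abstract mentions.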
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for new types of data sources, query
languages, and approaches to query processing and optimization.
Comment: SIGMOD'1
Deductive Optimization of Relational Data Storage
Optimizing the physical storage and the retrieval of data are two key
database management problems. In this paper, we propose a language that can
express a wide range of physical database layouts, going well beyond the row-
and column-based methods that are widely used in database management systems.
We use deductive synthesis to turn a high-level relational representation of a
database query into a highly optimized low-level implementation which operates
on a specialized layout of the dataset. We build a compiler for this language
and conduct experiments using a popular database benchmark, which show that
the performance of these specialized queries is competitive with a
state-of-the-art in-memory compiled database system.
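To make the row-based and column-based layouts mentioned in this abstract concrete, the following minimal sketch stores the same small relation both ways and runs the same aggregate over each. It does not reproduce the paper's layout language or its synthesis procedure; the relation, the column names, and the helper functions are hypothetical.

```python
# Row layout: one tuple per record, (id, grp, val).
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "a", 30.0)]
names = ("id", "grp", "val")


def to_columns(rows, names):
    """Column layout: one list per attribute."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


def sum_where_rows(rows, key):
    """SELECT SUM(val) WHERE grp = key, scanning whole records."""
    return sum(val for _, grp, val in rows if grp == key)


def sum_where_columns(cols, key):
    """Same query, touching only the two columns it needs."""
    return sum(v for g, v in zip(cols["grp"], cols["val"]) if g == key)


cols = to_columns(rows, names)
row_result = sum_where_rows(rows, "a")
col_result = sum_where_columns(cols, "a")
```

Both functions return the same answer; they differ only in which data they have to read, which is the kind of trade-off a layout-aware optimizer has to weigh.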