Worst-Case Optimal Algorithms for Parallel Query Processing
In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
$p$ servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query $q$ when all relations have
size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries.
Efficient Query Processing for Dynamically Changing Datasets
The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. Traditional approaches to this problem were developed around the notion of Incremental View Maintenance (IVM), and are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques are suboptimal: instead of materializing results and subresults, one may also maintain a data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In two previous articles, we have presented algorithms for dynamically evaluating queries that are easy to implement, efficient, and can be naturally extended to evaluate queries from a wide range of application domains. In this paper, we discuss our algorithm and its complexity, explaining the main components behind its efficiency. Finally, we show experiments that compare our algorithm to a state-of-the-art (Higher-order) IVM engine, as well as to a prominent complex event recognition engine. Our approach outperforms the competitor systems by up to two orders of magnitude in processing time, and by one order of magnitude in memory consumption.
Compressed Representations of Conjunctive Query Results
Relational queries, and in particular join queries, often generate large
output results when executed over a huge dataset. In such cases, it is often
infeasible to store the whole materialized output if we plan to reuse it
further down a data processing pipeline. Motivated by this problem, we study
the construction of space-efficient compressed representations of the output of
conjunctive queries, with the goal of supporting the efficient access of the
intermediate compressed result for a given access pattern. In particular, we
initiate the study of an important tradeoff: minimizing the space necessary to
store the compressed result, versus minimizing the answer time and delay for an
access request over the result. Our main contribution is a novel parameterized
data structure, which can be tuned to trade off space for answer time. The
tradeoff allows us to control the space requirement of the data structure
precisely, and depends both on the structure of the query and the access
pattern. We show how we can use the data structure in conjunction with query
decomposition techniques, in order to efficiently represent the outputs for
several classes of conjunctive queries.
Comment: To appear in PODS'18; 35 pages; comments welcome
Algebraic optimization of recursive queries
Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface.
In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.
The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
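The idea of pushing a selection into a transitive closure, mentioned in the abstract above, can be illustrated with a minimal Python sketch. This is not the PRISMA optimizer or its eXtended Relational Algebra; the function names `transitive_closure` and `closure_from` are hypothetical, and the relation is assumed to be a plain set of (source, target) pairs. The first function computes the whole closure and leaves the selection to the caller; the second starts from the selected source value, so only tuples reachable from it are ever produced.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Full transitive closure of a binary relation (semi-naive iteration)."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    closure = set(edges)
    delta = set(edges)
    while delta:
        # Extend only the tuples discovered in the previous round.
        new = {(a, c) for a, b in delta for c in succ[b]} - closure
        closure |= new
        delta = new
    return closure


def closure_from(edges, source):
    """Closure restricted to first column == source (selection pushed in)."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    reached = set()
    frontier = {source}
    while frontier:
        nxt = set()
        for node in frontier:
            nxt |= succ[node]
        # Keep only nodes not seen before, so the loop terminates on cycles.
        frontier = nxt - reached
        reached |= frontier
    return {(source, b) for b in reached}


edges = {(1, 2), (2, 3), (3, 1), (4, 5)}
# Selecting after the closure and selecting inside it give the same tuples.
selected_after = {(a, b) for a, b in transitive_closure(edges) if a == 1}
selected_inside = closure_from(edges, 1)
```

Both results contain the same tuples, but the second version never touches the part of the relation that is unreachable from the selected value, which is the intuition behind the rewrite.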