Main Memory Implementations for Binary Grouping
An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard are becoming increasingly important. Many XQuery queries require nested expressions, and unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance.
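Binary grouping is commonly defined (the abstract does not spell this out, so it is an assumption here) as follows: for each tuple of the left input, apply an aggregate function to the group of right-input tuples that satisfy a predicate with it. The sketch below covers only the equality-predicate case with a hash table. The function name `binary_group_eq` and its parameters are hypothetical, and this is not claimed to be one of the paper's algorithms. Left tuples with no matching group still appear, with the aggregate of an empty group. This matters for correct unnesting, for example so that a count yields 0 rather than dropping the tuple.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


def binary_group_eq(
    left: Iterable[Row],
    right: Iterable[Row],
    left_attr: str,
    right_attr: str,
    agg: Callable[[List[Row]], Any],
    out_attr: str,
) -> List[Row]:
    """Hash-based binary grouping for the predicate left_attr = right_attr.

    Every left tuple appears exactly once in the output, extended with
    agg(group), where group is the list of matching right tuples (possibly empty).
    """
    # Build phase: partition the right input into groups by its join attribute.
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for r in right:
        groups[r[right_attr]].append(r)
    # Compute the aggregate once per distinct key, plus the empty-group value.
    agg_by_key = {key: agg(rows) for key, rows in groups.items()}
    empty = agg([])
    # Probe phase: each left tuple looks up the aggregate of its group.
    out: List[Row] = []
    for l in left:
        out.append({**l, out_attr: agg_by_key.get(l[left_attr], empty)})
    return out


# Hypothetical example: count employees per department, keeping empty departments.
depts = [{"dno": 1}, {"dno": 2}, {"dno": 3}]
emps = [{"eno": 10, "dno": 1}, {"eno": 11, "dno": 1}, {"eno": 12, "dno": 2}]
result = binary_group_eq(depts, emps, "dno", "dno", len, "cnt")
```

Here `result` is `[{'dno': 1, 'cnt': 2}, {'dno': 2, 'cnt': 1}, {'dno': 3, 'cnt': 0}]`. Non-equality predicates cannot use a hash table this way and need different strategies.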
Sort-based grouping and aggregation
Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is by far the most efficient but requires sorted input; sort-based aggregation relies on external merge sort; and hash aggregation relies on an in-memory hash table plus hash partitioning to temporary storage. Cost-based query optimization chooses which algorithm to use based on several factors, including input and output sizes, the sort order of the input, and the need for sorted output. For example, hash-based aggregation is ideal for small output (e.g., TPC-H Query 1), whereas sorting the entire input and then aggregating is preferable when both aggregation input and output are large and the output needs to be sorted for a subsequent operation such as a merge join. Unfortunately, the size information required for a sound choice is often inaccurate or unavailable during query optimization, leading to sub-optimal algorithm choices. To address this challenge, this paper introduces a new algorithm for sort-based duplicate removal, grouping, and aggregation. The new algorithm always performs at least as well as both traditional hash-based and traditional sort-based algorithms. It can serve as a system's only aggregation algorithm for unsorted inputs, thus preventing erroneous algorithm choices. Furthermore, the new algorithm produces sorted output that can speed up subsequent operations. Google's F1 Query uses the new algorithm in production workloads that aggregate petabytes of data every day.
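The three traditional algorithms contrasted above can be illustrated in a few lines. This is a minimal in-memory sketch only: Python's `sorted()` stands in for external merge sort, spilling hash partitions to temporary storage is omitted, and the function names are hypothetical. It does not implement the paper's new algorithm.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]  # (grouping key, value to sum)


def in_stream_sum(sorted_rows: Iterable[Pair]) -> Iterator[Pair]:
    """In-stream aggregation: correct only if the input is sorted on the key.

    groupby() merges consecutive rows with equal keys, so only the current
    group's state is held, and a group is emitted as soon as the key changes.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def sort_based_sum(rows: Iterable[Pair]) -> List[Pair]:
    """Sort-based aggregation: sort first (in memory here, external merge sort
    in a real system), then aggregate in-stream. Output is sorted by key."""
    return list(in_stream_sum(sorted(rows, key=itemgetter(0))))


def hash_sum(rows: Iterable[Pair]) -> Dict[str, int]:
    """Hash aggregation: one hash-table entry per group; no output order is
    implied. Partitioning to temporary storage on memory overflow is omitted."""
    table: Dict[str, int] = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return table


rows = [("b", 2), ("a", 1), ("b", 3), ("c", 5), ("a", 4)]
```

With these rows, `sort_based_sum(rows)` returns `[("a", 5), ("b", 5), ("c", 5)]` and `hash_sum(rows)` returns the same groups as a dictionary. Calling `in_stream_sum(rows)` directly on the unsorted list produces five fragmented groups, which shows why in-stream aggregation requires sorted input.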
Efficient Runtime Systems for Data Warehouses
Current DBMSs are optimized for OLTP applications, yet the requirements that OLAP and OLTP applications place on the DBMS differ considerably. We have identified some of these differences and developed a runtime system that exploits them to improve performance for OLAP applications. The techniques developed include (1) the use of a virtual machine to evaluate expressions, (2) the efficient integration of compression, and (3) specialized algebraic operators. Our evaluation showed that using these techniques yields significant performance gains (a factor of 2 or more).
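Technique (1) evaluates expressions with a virtual machine instead of interpreting an expression tree for each tuple. The abstract does not describe the machine's design, so the following is only an illustrative sketch. It assumes a small stack-based instruction set, and the opcodes (`LOAD_COL`, `LOAD_CONST`, `MUL`, `GT`, ...) and the `run` function are hypothetical. A predicate is translated once into a flat instruction list and then executed against every tuple.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Instr = Tuple[str, Any]  # (opcode, operand): hypothetical instruction format

BINARY_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "ADD": operator.add,
    "MUL": operator.mul,
    "GT": operator.gt,
}


def run(program: List[Instr], row: Dict[str, Any]) -> Any:
    """Evaluate a compiled expression against one tuple using an operand stack."""
    stack: List[Any] = []
    for op, arg in program:
        if op == "LOAD_COL":  # push a column value of the current tuple
            stack.append(row[arg])
        elif op == "LOAD_CONST":  # push a literal
            stack.append(arg)
        elif op in BINARY_OPS:  # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


# price * qty > 100, translated once and reused for every tuple
PREDICATE: List[Instr] = [
    ("LOAD_COL", "price"),
    ("LOAD_COL", "qty"),
    ("MUL", None),
    ("LOAD_CONST", 100),
    ("GT", None),
]

rows = [{"price": 20, "qty": 6}, {"price": 5, "qty": 10}]
selected = [r for r in rows if run(PREDICATE, r)]
```

Only the first row qualifies (20 × 6 = 120 > 100). A production VM would typically use a register-based design, operate on batches of values, and be written in a compiled language; this Python version only shows the structure.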
Engineering Aggregation Operators for Relational In-Memory Database Systems
In this thesis, we study the design and implementation of aggregation operators in the context of relational in-memory database systems. In particular, we identify and address the following challenges: cache efficiency, CPU-friendliness, parallelism within and across processors, robust handling of skewed data, adaptive processing, processing with constrained memory, and integration with modern database architectures. Our resulting algorithm outperforms the state of the art by up to 3.7x.
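One widely used way to parallelize in-memory aggregation is a two-phase scheme. In the first phase, each worker pre-aggregates its chunk of the input into a private hash table, which avoids contention on shared state. In the second phase, the partial tables are merged. The sketch below illustrates only this general scheme. It is not the thesis's algorithm, and the names `local_aggregate` and `parallel_sum` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, int]  # (grouping key, value to sum)


def local_aggregate(chunk: Sequence[Pair]) -> Counter:
    """Phase 1: a worker sums its own chunk into a private hash table."""
    partial: Counter = Counter()
    for key, value in chunk:
        partial[key] += value
    return partial


def parallel_sum(rows: Sequence[Pair], workers: int = 4) -> Dict[str, int]:
    """Two-phase aggregation: thread-local pre-aggregation, then a merge."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    chunks: List[Sequence[Pair]] = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_aggregate, chunks))
    # Phase 2: merge the partial tables. Counter.update adds counts. A real
    # system would instead partition partials by key so that threads can
    # merge disjoint key ranges in parallel.
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
totals = parallel_sum(rows, workers=2)
```

This yields `{"a": 4, "b": 7, "c": 4}`. Under CPython's global interpreter lock, the threads do not speed up this pure-Python loop. The example shows the data flow, not the performance characteristics, and it does not address the skew, memory-constraint, or cache-efficiency issues listed above.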
Efficient Generation and Execution of DAG-Structured Query Graphs
Traditional database management systems use tree-structured query evaluation plans. While easy to implement, a tree-structured query evaluation plan is not expressive enough for some optimizations, such as factoring common algebraic subexpressions or magic sets. These require directed acyclic graphs (DAGs), i.e., shared subplans. This work covers the different aspects of DAG-structured query graphs. First, it introduces a novel framework to reason about the sharing of subplans and thus about DAG-structured query evaluation plans. Second, it describes the first plan generator capable of generating optimal DAG-structured query evaluation plans. Third, it presents an efficient framework, used by the plan generator, for reasoning about orderings and groupings. Fourth, it discusses a runtime system capable of executing DAG-structured query evaluation plans with minimal overhead. The experimental results show that, with no or only a modest increase in plan generation time, a major reduction in query execution time can be achieved for common queries. This shows that DAG-structured query evaluation plans are serviceable and should be preferred over tree-structured query plans.
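The following sketch shows what distinguishes a DAG-structured plan from a tree: one subplan (here, a filtered scan) feeds two parent operators but is evaluated only once. The abstract does not say how its runtime system shares results. This sketch simply assumes full materialization of the shared node's output, which is the simplest approach but not necessarily the one with minimal overhead. The class `PlanNode` and its members are hypothetical.

```python
from typing import Callable, List, Optional


class PlanNode:
    """A plan operator whose output may be consumed by several parents."""

    def __init__(
        self,
        name: str,
        compute: Callable[[List[list]], list],
        children: Optional[List["PlanNode"]] = None,
    ) -> None:
        self.name = name
        self.compute = compute
        self.children = children or []
        self._result: Optional[list] = None
        self.evaluations = 0  # how many times compute() actually ran

    def evaluate(self) -> list:
        # Materialize on first request; later parents reuse the buffered result.
        if self._result is None:
            inputs = [child.evaluate() for child in self.children]
            self._result = self.compute(inputs)
            self.evaluations += 1
        return self._result


scan = PlanNode("scan", lambda _: [3, 8, 1, 9, 4])
shared = PlanNode("filter>2", lambda ins: [x for x in ins[0] if x > 2], [scan])
count = PlanNode("count", lambda ins: [len(ins[0])], [shared])
total = PlanNode("sum", lambda ins: [sum(ins[0])], [shared])
root = PlanNode("combine", lambda ins: ins[0] + ins[1], [count, total])
answer = root.evaluate()
```

`answer` is `[4, 24]`, and `shared.evaluations` is 1 even though two operators consume the filtered scan. A tree-structured plan would have to duplicate the scan-and-filter subplan under both `count` and `sum`.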