10 research outputs found

    TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

    The TPC-D benchmark was developed almost 20 years ago, and even though its current existence as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them 


    Functional Collection Programming with Semi-Ring Dictionaries

    This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads by combining optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. Overall, we observe that SDQL achieves competitive performance relative to Typer and Tectorwise, which are state-of-the-art in-memory DB systems for (flat, not nested) relational data, and achieves an average 2x speedup over SciPy for LA workloads. For hybrid workloads involving LA processing, SDQL achieves up to one order of magnitude speedup over Trance, a state-of-the-art nested relational engine for nested biomedical data, and gives an average 40% speedup over LMFAO, a state-of-the-art in-DB machine learning engine, on two (flat) real-world relational retail datasets.
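
    To make the idea concrete, here is a minimal sketch in plain Python (not SDQL, whose syntax is not reproduced here) of how dictionaries with values drawn from a semi-ring can stand in for multisets, grouped aggregates, and sparse matrices at the same time; all names and data below are illustrative.

        # Illustrative sketch: dictionaries whose values live in a semi-ring can
        # represent multisets (element -> multiplicity), grouped aggregates
        # (key -> running sum), and sparse matrices (row -> {col -> value}),
        # and aggregation is just addition in the value semi-ring.

        def dict_add(a, b):
            """Point-wise sum of two dictionaries: the '+' of the dictionary semi-ring."""
            out = dict(a)
            for k, v in b.items():
                out[k] = out.get(k, 0) + v
            return out

        # A multiset is a dictionary into the counting semi-ring.
        bag1 = {"apple": 2, "pear": 1}
        bag2 = {"apple": 1, "plum": 3}
        print(dict_add(bag1, bag2))    # {'apple': 3, 'pear': 1, 'plum': 3}

        # A SUM ... GROUP BY aggregation is the same point-wise sum.
        rows = [("fr", 10.0), ("de", 5.0), ("fr", 2.5)]
        totals = {}
        for key, amount in rows:
            totals = dict_add(totals, {key: amount})
        print(totals)                  # {'fr': 12.5, 'de': 5.0}

        # A sparse matrix-vector product uses nested dictionaries and the same idea.
        matrix = {0: {0: 1.0, 2: 3.0}, 1: {1: 2.0}}
        vector = {0: 4.0, 1: 5.0, 2: 6.0}
        result = {i: sum(v * vector.get(j, 0.0) for j, v in row.items())
                  for i, row in matrix.items()}
        print(result)                  # {0: 22.0, 1: 10.0}

    The point of the sketch is that a single addition on dictionary values covers bag union, GROUP-BY aggregation, and linear-algebra accumulation, which mirrors the unification of DB and LA operations the abstract describes.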

    Accelerating Queries with Group-By and Join by Groupjoin

    Most aggregation queries contain both group-by and join operators and spend a significant amount of time evaluating these two expensive operators. Merging them into a single operator, the groupjoin, significantly speeds up query execution. We introduce two main equivalences that allow for this merging and prove their correctness. Furthermore, we show experimentally that these equivalences can significantly speed up TPC-H.
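
    As a rough illustration of the operator itself (not the paper's equivalences or their proofs), the following Python sketch evaluates a join and a SUM aggregation in one pass over a single hash table; the tables, columns, and data are invented.

        # Hedged sketch of a hash-based groupjoin: one hash table serves both as
        # the join index and as the aggregation state, so the join and the
        # group-by are evaluated in a single pass.
        customers = [(1, "Alice"), (2, "Bob")]          # (cust_id, name)
        orders = [(1, 10.0), (1, 5.0), (2, 7.5)]        # (cust_id, amount)

        def groupjoin_sum(build_side, probe_side):
            """Roughly: SELECT c.cust_id, c.name, SUM(o.amount) ... GROUP BY c.cust_id, c.name."""
            # Build phase: one entry per group key, holding payload and aggregate.
            table = {key: {"name": name, "total": 0.0} for key, name in build_side}
            # Probe phase: fold each probe tuple directly into its group's aggregate,
            # without materializing the join result.
            for key, amount in probe_side:
                entry = table.get(key)
                if entry is not None:
                    entry["total"] += amount
            return [(key, e["name"], e["total"]) for key, e in table.items()]

        print(groupjoin_sum(customers, orders))
        # [(1, 'Alice', 15.0), (2, 'Bob', 7.5)]

    Compared with a plan that first materializes the join result and then re-hashes it for the group-by, the fused operator builds one hash table and touches each probe tuple once, which is where the speed-up comes from.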

    Extending dynamic-programming-based plan generators: beyond pure enumeration

    The query optimizer plays an important role in a database management system supporting a declarative query language such as SQL. One of its central components is the plan generator, which is responsible for determining the optimal join order of a query. Plan generators based on dynamic programming have been known for several decades. However, significant progress in this field has only been made recently, including the emergence of highly efficient enumeration algorithms and the ability to optimize a wide range of queries by supporting complex join predicates. This thesis builds upon these recent advancements by providing a framework for extending the aforementioned algorithms. To this end, a modular design is proposed that allows for the exchange of individual parts of the plan generator, thus enabling the implementor to add new features at will. This is demonstrated on two previously unsolved problems: the correct and complete reordering of different types of join operators, and the efficient reordering of join operators and grouping operators.
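
    For readers unfamiliar with dynamic-programming plan generation, the following deliberately simplified Python sketch enumerates join orders bottom-up over subsets of relations (in the spirit of DPsub). It uses a toy cost model and ignores complex join predicates and the operator-reordering problems the thesis actually addresses; every relation, edge, and constant in it is made up.

        # Deliberately simplified DPsub-style join-order enumeration over subsets
        # of relations with a toy cost model (sum of intermediate result sizes).
        # Relations, join graph, cardinalities, and selectivity are all invented.
        from itertools import combinations

        relations = {"R": 1000, "S": 100, "T": 10}              # base cardinalities
        edges = {frozenset({"R", "S"}), frozenset({"S", "T"})}  # join graph: R - S - T
        SEL = 0.01                                              # uniform join selectivity

        def connected(a, b):
            """Two subplans can be joined if some join edge crosses between them."""
            return any(frozenset({x, y}) in edges for x in a for y in b)

        # best maps a set of relations to (cost, output cardinality, plan string).
        best = {frozenset({r}): (0.0, c, r) for r, c in relations.items()}

        names = list(relations)
        for size in range(2, len(names) + 1):
            for combo in combinations(names, size):
                s = frozenset(combo)
                # Try every split of s into two already-solved, connected subplans.
                for left_size in range(1, size):
                    for left in combinations(combo, left_size):
                        l, r = frozenset(left), s - frozenset(left)
                        if l in best and r in best and connected(l, r):
                            lc, lcard, lplan = best[l]
                            rc, rcard, rplan = best[r]
                            out_card = lcard * rcard * SEL
                            cost = lc + rc + out_card
                            if s not in best or cost < best[s][0]:
                                best[s] = (cost, out_card, f"({lplan} JOIN {rplan})")

        print(best[frozenset(names)][2])   # (R JOIN (S JOIN T)) under this toy model

    Real plan generators use proper cardinality estimates and restrict enumeration to connected subproblems of the query graph; the thesis builds on such enumerators and extends them, for example to reorder different join operators and grouping operators correctly.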

    Forschungsbericht Universität Mannheim 2010/2011 (Research Report of the University of Mannheim)

    This research report provides an overview of the research priorities of the faculties, departments, and research institutions of the Universität Mannheim. It also contains information on individual projects in the respective disciplines as well as on mostly interdisciplinary collaborative projects such as collaborative research centres, research groups, science campuses, graduate schools, and doctoral programmes. The publications that emerged from these research activities, listed in this report, make important contributions to scientific progress within their disciplines. The transfer activities that are also listed represent contributions of basic research towards solving societal and economic challenges. Finally, the report provides information on scientific prizes and awards, on events and conferences, and on academic qualifications in the form of completed doctorates and habilitations. These details reflect the reputation of the university's researchers and complement the other research-related achievements at the Universität Mannheim.

    Compilation and Code Optimization for Data Analytics

    The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow the same functionality to be implemented with significantly less code than in low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of temporary memory allocation and deallocation to support objects and encapsulation. As a result, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is compilation: the goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code.
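
    As a hedged illustration of what "compiling away" abstraction can look like (a sketch, not the systems developed in the thesis), the Python snippet below turns a high-level description of a filter-and-sum query into specialized C source, so the generated inner loop contains no dictionaries, interpretation, or virtual dispatch; the query, schema, and function names are invented.

        # Hedged illustration of query compilation: a high-level query description
        # is turned into specialized low-level C source, so the hot loop pays no
        # cost for the abstractions used to describe it. Query and schema invented.
        query = {
            "agg_column": "price",       # column to sum
            "filter_column": "qty",      # column to filter on
            "threshold": 10,             # keep rows with qty > 10
        }

        def compile_filter_sum(q):
            """Emit C source for: SELECT SUM(price) FROM t WHERE qty > threshold."""
            return f"""
        #include <stddef.h>

        double filter_sum(const double *{q['agg_column']},
                          const int *{q['filter_column']},
                          size_t n) {{
            double acc = 0.0;
            for (size_t i = 0; i < n; i++) {{
                /* Predicate and aggregate are baked in: no interpreter, no objects. */
                if ({q['filter_column']}[i] > {q['threshold']})
                    acc += {q['agg_column']}[i];
            }}
            return acc;
        }}
        """

        print(compile_filter_sum(query))

    A system following this approach would hand the generated source to a C compiler or a JIT back end and link the result into the running engine; the sketch stops at code generation.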
