ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics

Abelló Gamazo, Alberto; Lehner, Wolfgang; Munir, Rana Faisal; Romero Moral, Óscar; Thiele, Maik

ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics

Authors: Alberto Abelló Gamazo
Wolfgang Lehner
Rana Faisal Munir
Óscar Romero Moral
Maik Thiele
Publication date: 1 January 2018
Publisher: 'Springer Science and Business Media LLC'
Doi

Abstract

Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored into a hybrid layout. Hybrid layouts divide data into horizontal partitions and inside each partition, data are stored vertically. They keep statistics for each horizontal partition and also support encoding (i.e., dictionary) and compression to reduce the size of the data. Their built-in support for many ad-hoc operations (i.e., selection, projection, aggregation, etc.) makes hybrid layouts the best choice for most operations. Horizontal partition and dictionary sizes of hybrid layouts are configurable and can directly impact the performance of analytical queries. Hence, their default configuration cannot be expected to be optimal for all scenarios. In this paper, we present ATUN-HL (Auto TUNing Hybrid Layouts), which based on a cost model and given the workload and the characteristics of data, finds the best values for these parameters. We prototyped ATUN-HL for Apache Parquet, which is an open source implementation of hybrid layouts in Hadoop Distributed File System, to show its effectiveness. Our experimental evaluation shows that ATUN-HL provides on average 85% of all the potential performance improvement, and 1.2x average speedup against default configuration.Peer ReviewedPostprint (published version

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

UPCommons

oai:upcommons.upc.edu:2117/132...

Last time updated on 17/04/2020

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/132...

Last time updated on 15/05/2019