A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Chen, Yuxing; Herodotou, Herodotos; Lu, Jiaheng


Authors: Yuxing Chen, Herodotos Herodotou, Jiaheng Lu
Publication date: 1 April 2020

Abstract

Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.

Peer reviewed.
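As a rough illustration of the experiment-driven category named in the abstract, the sketch below runs a random search over a small configuration space. It is not taken from the survey: the parameter names (`executor_memory_gb`, `shuffle_partitions`, `compress_shuffle`), their candidate values, and the `synthetic_runtime` cost function are all hypothetical stand-ins for executing a real job on a real cluster.

```python
import random

# Hypothetical search space; names and values are illustrative assumptions,
# not actual Hadoop/Spark/Storm configuration keys.
SEARCH_SPACE = {
    "executor_memory_gb": [2, 4, 8, 16],
    "shuffle_partitions": [50, 100, 200, 400],
    "compress_shuffle": [True, False],
}


def synthetic_runtime(config):
    """Made-up runtime in seconds; a real tuner would run the job instead."""
    runtime = 100.0
    runtime += 64.0 / config["executor_memory_gb"]             # less memory costs more
    runtime += 0.1 * abs(config["shuffle_partitions"] - 200)   # assumed sweet spot at 200
    runtime += 0.0 if config["compress_shuffle"] else 15.0     # assumed compression benefit
    return runtime


def random_search(space, evaluate, trials, seed=0):
    """Sample `trials` random configurations and keep the cheapest one."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        cost = evaluate(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


if __name__ == "__main__":
    config, cost = random_search(SEARCH_SPACE, synthetic_runtime, trials=20)
    print(config, cost)
```

Each trial here is cheap because the cost function is synthetic. In practice every evaluation is a full job run, so the number of experiments a tuner can afford is an important practical constraint.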
