Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Abstract
This experimental study presents several overlooked issues that challenge the tuning and deployment of data analytics configurations. These issues are: 1) the assumption of a static workload and environment, which ignores the dynamic characteristics of the analytics environment (e.g., the frequent need for workload retuning); 2) the speed at which the tuning cost is amortized and how this influences the tuning decision; and 3) the need for comprehensive incremental tuning across a diverse set of workloads.
To substantiate these observations, we present Tuneful, an efficient configuration tuning framework for data analytics. We show how it is designed to overcome the above issues and illustrate its applicability through experiments on two cloud service providers.