Optimizing Data-Intensive Computing with Efficient Configuration Tuning
As distributed analytics systems grow more complex, they expose ever more configuration parameters for tuning. While these numerous parameters give users more control over how their workloads are executed, this flexibility comes at a cost: finding the right configurations for such systems in a cost-effective way becomes challenging. In practice, several factors contribute to the complexity of tuning these systems: the large configuration space, the diversity of the served workloads (each workload possibly requiring a different resource-allocation strategy to run optimally), and the dynamic
characteristics of these systems' environment (e.g., growth in input data size, changes in the allocation of resources). Paradoxically, existing solutions for workload tuning either assume a static tuning environment or assume workloads that are inexpensive to run, since they require hundreds of execution samples. Recently, Bayesian Optimization (BO) strategies have been applied to enable efficient autotuning. They incrementally build a probabilistic model that predicts the impact of the parameters on performance from a small number of execution samples. The incrementally constructed BO model guides the tuning process and accelerates convergence to a near-optimal configuration. Unfortunately, for distributed analytics systems the configuration space is too large to construct a good model using traditional BO, which fails to converge quickly in high-dimensional configuration spaces.
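The BO loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the thesis's implementation: `run_workload` is a hypothetical stand-in for executing a workload under a two-parameter configuration, and a small Gaussian-process surrogate with a lower-confidence-bound acquisition stands in for whatever model and acquisition function a real autotuner would use.

```python
import numpy as np

# Hypothetical objective: execution cost of a workload as a function of two
# normalized configuration parameters (a stand-in for a real system run).
def run_workload(x):
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

def rbf_kernel(A, B, length=0.2):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * length ** 2))

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression: posterior mean and std at candidates Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf_kernel(Xs, Xs) - Ks.T @ Kinv @ Ks).clip(min=1e-12)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
X = rng.random((3, 2))                      # a few initial execution samples
y = np.array([run_workload(x) for x in X])

for _ in range(15):                         # small sample budget
    cand = rng.random((256, 2))             # random candidate configurations
    mu, sd = gp_posterior(X, y, cand)
    # Lower-confidence-bound acquisition: prefer low predicted cost,
    # but also explore where the model is still uncertain.
    nxt = cand[np.argmin(mu - 1.5 * sd)]
    X = np.vstack([X, nxt])
    y = np.append(y, run_workload(nxt))

best = X[np.argmin(y)]                      # near-optimal configuration found
```

With only 18 workload executions in total, the model-guided search homes in on a low-cost configuration; the same loop over dozens or hundreds of parameters is where traditional BO, as noted above, struggles to converge.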
I argue that cost-effective tuning strategies can only be developed by taking into account the frequent changes in the analytics workload and environment, the amortization of tuning costs and how it influences tuning profitability, the high dimensionality of the configuration space, and the need to cater for diverse workloads. To tackle these challenges, I propose Tuneful, an efficient configuration tuning framework for such expensive-to-tune systems. It works efficiently both initially, when little data is available, and later, as more tuning knowledge is acquired. It starts by incrementally learning the workload-specific influential parameters and tuning only those; then, as more tuning knowledge becomes available, it detects similarity across workloads and uses multitask BO to share tuning knowledge across similar workloads. I show how augmenting the BO approach with parameter significance and workload similarity enables efficient configuration tuning in high-dimensional configuration spaces. Over diverse analytics workloads, this significantly accelerates both configuration tuning and cost amortization, reducing search time by 2.7-3.7x at the median compared to state-of-the-art approaches.
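To make the idea of "learning influential parameters and tuning only those" concrete, here is a minimal sketch of sensitivity-based parameter screening. Everything in it is hypothetical: `run_workload` is a toy cost function in which only 2 of 10 exposed parameters matter, and the one-at-a-time perturbation below re-executes the toy function directly, whereas an autotuner for expensive workloads would instead estimate importance from its incrementally built surrogate model.

```python
import numpy as np

# Hypothetical stand-in for a workload's execution cost: only parameters
# 0 and 3 (out of 10 exposed parameters) have a meaningful effect.
def run_workload(cfg):
    return 5.0 * (cfg[0] - 0.5) ** 2 + 2.0 * abs(cfg[3] - 0.2) + 0.01 * cfg[7]

rng = np.random.default_rng(1)
X = rng.random((40, 10))                  # a small batch of sampled configs
y = np.array([run_workload(x) for x in X])

def sensitivity(X, y, i, delta=0.2):
    # Crude one-at-a-time estimate: perturb parameter i alone and measure
    # the average change in cost across the sampled configurations.
    Xp = X.copy()
    Xp[:, i] = np.clip(Xp[:, i] + delta, 0.0, 1.0)
    yp = np.array([run_workload(x) for x in Xp])
    return np.abs(yp - y).mean()

scores = np.array([sensitivity(X, y, i) for i in range(10)])
influential = np.argsort(scores)[::-1][:2]   # keep only the top-k parameters
# Subsequent tuning (e.g., the BO loop) then searches only these dimensions,
# shrinking a 10-dimensional search to a 2-dimensional one.
```

Screening of this kind is what lets a tuner stay sample-efficient in a high-dimensional configuration space: the surrogate model only has to be accurate over the few dimensions that actually move performance.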