
Adaptive Modelling and Control in Distributed Systems

Abstract

Companies have growing amounts of data to store and process. In response to these processing challenges, Google developed MapReduce, a parallel programming paradigm that is becoming the major tool for Big Data processing. Although MapReduce is used by most IT companies, ensuring its performance while minimizing costs is a real challenge requiring a high level of expertise. Modelling and control of MapReduce have been developed in recent years; however, many problems remain, caused by the software's high variability. To tackle this issue, this paper proposes an on-line model estimation algorithm for MapReduce systems. An adaptive control strategy is developed and implemented to guarantee response time performance under a concurrent workload while minimizing resource use. Results have been validated using a 40-node MapReduce cluster under a data-intensive Business Intelligence workload running on Grid5000, a French national cloud. The experiments show that the adaptive control algorithm manages to guarantee performance and low costs even in a highly variable environment.
