A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster

Divband, Arman; Goudarzi, Maziar; Nasehi, Saeed; Nasiri, Hamid

Authors: Arman Divband, Maziar Goudarzi, Saeed Nasehi, Hamid Nasiri
Publication date: 28 January 2020
Abstract

In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as directed acyclic graphs. This model lets a DSPF exploit the parallelism of distributed clusters. However, the number of vertices chosen for each operator and the mapping between these vertices and processing resources have a decisive effect on overall throughput and resource utilization, and the simplicity of the schedulers in current DSPFs leads these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps them to the most suitable cluster nodes. We scale up the application graph over a given cluster gradually, by increasing the topology input rate and creating new instances of bottlenecked vertices. Our experimental results on the Storm Micro-Benchmark show that (1) the prediction model estimates CPU utilization with 92% accuracy; (2) compared to Storm's default scheduler, our scheduler improves throughput by 7% to 44%; and (3) the proposed method finds a solution within 4% (worst case) of the optimal scheduler, which obtains the best scheduling scenario through an exhaustive search of the problem design space.
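
The gradual scale-up idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not specify the prediction model or the placement rule, so the code below assumes a simple linear cost model (CPU demand equals input rate times a per-tuple cost, split evenly across instances) and a greedy "most spare capacity" placement. All names (`Node`, `Operator`, `node_loads`, `scale_up`, `max_instances`) and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # assumed: CPU units per second this node can supply


@dataclass
class Operator:
    name: str
    cost: float  # assumed: CPU units consumed per input tuple
    placement: list = field(default_factory=list)  # one node name per instance


def node_loads(operators, rate):
    """CPU demand per node, assuming each operator splits `rate` evenly."""
    loads = {}
    for op in operators:
        share = rate * op.cost / len(op.placement)
        for node_name in op.placement:
            loads[node_name] = loads.get(node_name, 0.0) + share
    return loads


def scale_up(operators, nodes, rate_step, max_rate, max_instances=8):
    """Raise the input rate stepwise; on overload, replicate a bottleneck."""
    # Start with one instance of every operator on the largest node.
    biggest = max(nodes, key=lambda n: n.capacity)
    for op in operators:
        op.placement = [biggest.name]

    best_rate = 0
    best_placement = {op.name: list(op.placement) for op in operators}
    rate = rate_step
    while rate <= max_rate:
        loads = node_loads(operators, rate)
        hot = {n.name for n in nodes if loads.get(n.name, 0.0) > n.capacity}
        if not hot:
            # Current placement sustains this rate: record it, then push harder.
            best_rate = rate
            best_placement = {op.name: list(op.placement) for op in operators}
            rate += rate_step
            continue
        # Bottleneck: the operator with the heaviest instance on an overloaded node.
        candidates = [
            op for op in operators
            if hot & set(op.placement) and len(op.placement) < max_instances
        ]
        if not candidates:
            break  # nothing left to replicate; keep the last feasible rate
        bottleneck = max(candidates, key=lambda op: rate * op.cost / len(op.placement))
        # Heterogeneity-aware step: new instance goes to the node with most spare CPU.
        target = max(nodes, key=lambda n: n.capacity - loads.get(n.name, 0.0))
        bottleneck.placement.append(target.name)
    return best_rate, best_placement


# Hypothetical two-node heterogeneous cluster and two-operator topology.
nodes = [Node("fast", 100.0), Node("slow", 50.0)]
ops = [Operator("spout", 1.0), Operator("bolt", 3.0)]
best_rate, placement = scale_up(ops, nodes, rate_step=5, max_rate=1000)
print(best_rate, placement)
```

The loop terminates because every iteration either raises the rate (bounded by `max_rate`), adds an instance (bounded by `max_instances` per operator), or exits. A real scheduler would replace the linear cost model with measured or predicted CPU utilization, as the abstract's prediction model suggests.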

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2001.10308

Last time updated on 12/10/2020