Distributed dataflow systems such as Apache Spark or Apache Flink enable
parallel, in-memory data processing on large clusters of commodity hardware.
Consequently, the appropriate amount of memory to allocate to the cluster is a
crucial consideration.
In this paper, we analyze the challenge of efficient resource allocation for
distributed data processing, focusing on memory. We emphasize that the
in-memory processing model of these frameworks can undermine resource
efficiency. Based on the findings of our trace data analysis, we compile
requirements towards an automated solution for efficient cluster resource
allocation.

Comment: 4 pages, 3 figures; ACM SSDBM 202