An energy efficient and runtime-aware framework for distributed stream computing systems

Abstract

Task scheduling in distributed stream computing systems is an NP-complete problem. Current scheduling schemes often pause or go through a slow-start phase when the input data stream fluctuates, which undermines performance stability, in particular the goals of high throughput and low latency. In addition, idle compute nodes at runtime can incur substantial idle-load energy consumption. To address these problems, we propose an energy-efficient and runtime-aware framework (Er-Stream). This paper discusses the framework from the following aspects: (1) The communication between real-time data streaming tasks is investigated; the stream application, resources and energy consumption are modeled to formalize the scheduling problem. (2) After an initial topology is submitted to the cluster, task pairs with high communication cost are placed on the same compute node through a lightweight task partitioning strategy, which minimizes the communication cost between nodes and avoids frequent triggering of runtime scheduling. (3) At runtime, reliable task migration is performed based on node communication and resource usage, which in turn supports dynamic adjustment of node energy consumption. (4) Metrics including latency, throughput, resource load and energy consumption are evaluated in a real distributed stream computing environment. In a comprehensive evaluation under variable-rate input scenarios, the proposed Er-Stream system shows promising improvements in throughput, latency and energy consumption compared to Storm's existing scheduling strategies.
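The co-location idea in aspect (2) can be illustrated with a small greedy sketch. This is not the Er-Stream partitioning algorithm, whose details the abstract does not give; it is a minimal illustration under stated assumptions. The function names (`colocate_tasks`, `inter_node_cost`), the per-pair traffic map and the fixed number of slots per node are all hypothetical.

```python
from typing import Dict, List, Tuple


def colocate_tasks(
    traffic: Dict[Tuple[str, str], float],
    tasks: List[str],
    num_nodes: int,
    slots_per_node: int,
) -> Dict[str, int]:
    """Greedy placement sketch (hypothetical, not the paper's algorithm).

    Visits task pairs from heaviest to lightest traffic and tries to put
    both tasks of a pair on the same node while slots remain.
    """
    if len(tasks) > num_nodes * slots_per_node:
        raise ValueError("not enough slots for all tasks")

    placement: Dict[str, int] = {}
    free = [slots_per_node] * num_nodes  # remaining slots on each node

    def place(task: str, node: int) -> None:
        placement[task] = node
        free[node] -= 1

    def roomiest() -> int:
        # Node with the most free slots; ties go to the lowest index.
        return max(range(num_nodes), key=lambda n: free[n])

    pairs = sorted(traffic.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _rate in pairs:
        a_done, b_done = a in placement, b in placement
        if a_done and b_done:
            continue
        if not a_done and not b_done:
            node = roomiest()
            if free[node] >= 2:  # co-locate only if both tasks fit
                place(a, node)
                place(b, node)
        else:
            placed, other = (a, b) if a_done else (b, a)
            node = placement[placed]
            if free[node] > 0:  # join the partner's node if it has room
                place(other, node)

    # Tasks that could not be co-located go wherever room is left.
    for task in tasks:
        if task not in placement:
            place(task, roomiest())
    return placement


def inter_node_cost(
    traffic: Dict[Tuple[str, str], float], placement: Dict[str, int]
) -> float:
    """Total traffic between tasks that ended up on different nodes."""
    return sum(
        rate for (a, b), rate in traffic.items() if placement[a] != placement[b]
    )
```

For a four-task chain with traffic rates 100, 80 and 10 on two nodes of two slots each, this sketch keeps the heaviest pair together and leaves only the 80-unit edge crossing nodes. A real scheduler would also have to account for CPU and memory load and for energy consumption, which this sketch ignores.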
