Proactive Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems

Author: Anceaume Emmanuelle
Busnel Yann
Querzoni Leonardo
Rivetti Nicoló
Sericola Bruno
Publication venue: HAL CCSD
Publication date: 19/12/2015

Shuffle grouping is a technique used by stream processing frameworks to share input load among parallel instances of stateless operators. With shuffle grouping, each tuple of a stream can be assigned to any available operator instance, independently of any previous assignment. A common approach to implementing shuffle grouping is to adopt a round-robin policy, a simple solution that fares well as long as the tuple execution time is constant. However, this assumption rarely holds in real cases, where execution time strongly depends on tuple content. As a consequence, parallel stateless operators within stream processing applications may experience unpredictable imbalance that ultimately causes an undesirable increase in tuple completion times. In this paper we propose Proactive Online Shuffle Grouping (POSG), a novel approach to shuffle grouping aimed at reducing the overall tuple completion time. POSG estimates the execution time of each tuple, enabling proactive and online scheduling of input load to the target operator instances. Sketches are used to efficiently store the otherwise large amount of information required to schedule incoming load. We provide a probabilistic analysis and illustrate, through both simulations and a running prototype, its impact on stream processing applications.

INRIA a CCSD electronic archive server
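The POSG abstract contrasts round-robin shuffle grouping with scheduling driven by per-tuple execution-time estimates kept in sketches. The sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. The abstract does not name a sketch type, so a Count-Min sketch is assumed here, keyed by a hypothetical tuple key, with the mean execution time estimated heuristically as (sketched time sum) / (sketched count). The names `CountMinSketch`, `GreedyShuffleScheduler`, and `round_robin_loads`, and the cost values, are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch (assumed sketch type): per-key sums in depth x width counters."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, tuple_key, row):
        # One hash per row, derived by salting the key with the row id.
        digest = hashlib.blake2b(f"{row}:{tuple_key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, tuple_key, value):
        for row in range(self.depth):
            self.table[row][self._index(tuple_key, row)] += value

    def query(self, tuple_key):
        # With non-negative updates every row over-estimates; the minimum is the tightest.
        return min(self.table[row][self._index(tuple_key, row)] for row in range(self.depth))


class GreedyShuffleScheduler:
    """Hypothetical scheduler: route each tuple to the instance with the least estimated work."""

    def __init__(self, n_instances, default_cost=1.0):
        self.load = [0.0] * n_instances   # estimated cumulative work assigned per instance
        self.times = CountMinSketch()     # sum of observed execution times per key
        self.counts = CountMinSketch()    # number of observed executions per key
        self.default_cost = default_cost  # guess used for keys never observed

    def estimate(self, tuple_key):
        n = self.counts.query(tuple_key)
        return self.times.query(tuple_key) / n if n > 0 else self.default_cost

    def assign(self, tuple_key):
        target = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[target] += self.estimate(tuple_key)
        return target

    def observe(self, tuple_key, exec_time):
        # Feedback from an operator instance after it executed a tuple with this key.
        self.times.add(tuple_key, exec_time)
        self.counts.add(tuple_key, 1)


def round_robin_loads(stream, cost, n):
    """Baseline: tuple i goes to instance i % n regardless of its cost."""
    loads = [0.0] * n
    for i, tuple_key in enumerate(stream):
        loads[i % n] += cost[tuple_key]
    return loads


cost = {"heavy": 10.0, "light": 1.0}  # hypothetical per-key execution times
stream = ["heavy", "light"] * 50

print(round_robin_loads(stream, cost, 2))  # [500.0, 50.0]

sched = GreedyShuffleScheduler(2)
for k, t in cost.items():  # warm-up: pretend one execution of each key was observed
    sched.observe(k, t)
actual = [0.0, 0.0]
for k in stream:
    actual[sched.assign(k)] += cost[k]
print(actual)
```

With this alternating stream, round robin sends every heavy tuple to instance 0, while the estimate-driven greedy policy keeps both instances close to an even share of the 550 units of work. The sketch keeps memory fixed at `depth * width` counters per statistic regardless of how many distinct keys appear, which is the reason the abstract gives for using sketches.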

Model-driven Scheduling for Distributed Stream Processing Systems

Author: Shukla Anshu
Simmhan Yogesh
Publication venue: Elsevier BV
Publication date: 06/02/2017

Distributed Stream Processing frameworks are increasingly used as the Internet of Things (IoT) evolves. These frameworks are designed to adapt to dynamic input message rates by scaling in and out. Apache Storm, originally developed by Twitter, is a widely used stream processing engine; others include Flink and Spark Streaming. Running streaming applications successfully requires knowing their optimal resource requirements, since over-estimating resources adds extra cost. A strategy is therefore needed to determine the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the gap between estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.
Comment: 54 pages

arXiv.org e-Print Archive
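The model-driven scheduling abstract describes using application performance models to estimate the resources needed for a target input rate. As a toy illustration of that general idea only (not the authors' models or allocation algorithm), the sketch below assumes each task in a linear dataflow has been profiled into a hypothetical table of thread count to peak sustainable input rate, plus a selectivity (output messages per input message). It picks the smallest profiled thread count per task that sustains the rate reaching it, then packs threads into slots. `TaskModel`, `plan_chain`, `threads_per_slot`, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TaskModel:
    """Hypothetical profiled model of one dataflow task."""
    name: str
    selectivity: float  # output messages per input message
    peak_rate: dict = field(default_factory=dict)  # threads -> max input rate (msg/s)

    def threads_for(self, rate):
        """Smallest profiled thread count whose peak rate sustains `rate`."""
        for threads in sorted(self.peak_rate):
            if self.peak_rate[threads] >= rate:
                return threads
        raise ValueError(f"{self.name}: no profiled configuration sustains {rate} msg/s")


def plan_chain(tasks, input_rate, threads_per_slot=4):
    """Size a linear chain of tasks for a target input rate; returns (threads per task, slots)."""
    allocation = {}
    rate = input_rate
    for task in tasks:
        allocation[task.name] = task.threads_for(rate)
        rate *= task.selectivity  # rate seen by the next task downstream
    slots = math.ceil(sum(allocation.values()) / threads_per_slot)
    return allocation, slots


tasks = [
    TaskModel("parse", 1.0, {1: 300.0, 2: 550.0, 4: 900.0}),
    TaskModel("filter", 0.5, {1: 800.0, 2: 1500.0}),
    TaskModel("sink", 1.0, {1: 200.0, 2: 380.0, 4: 700.0}),
]
allocation, slots = plan_chain(tasks, input_rate=500.0)
print(allocation, slots)  # {'parse': 2, 'filter': 1, 'sink': 2} 2
```

Because the filter halves the stream, the sink only has to sustain 250 msg/s, so a model-based plan can allocate fewer resources downstream than a uniform over-provisioned one would.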

Adaptive Optimizations for Stream-based Workflows

Author: Filgueira Rosa
Liang Liang
Yan Yan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/11/2020