Search CORE

26 research outputs found

Scheduling Storms and Streams in the Cloud

Author: Ghaderi Javad
Shakkottai Sanjay
Srikant R
Publication venue
Publication date: 20/02/2015
Field of study

Motivated by emerging big streaming data processing paradigms (e.g., Twitter Storm, Streaming MapReduce), we investigate the problem of scheduling graphs over a large cluster of servers. Each graph is a job, where nodes represent compute tasks and edges indicate data-flows between these compute tasks. Jobs (graphs) arrive randomly over time, and upon completion, leave the system. When a job arrives, the scheduler needs to partition the graph and distribute it over the servers to satisfy load balancing and cost considerations. Specifically, neighboring compute tasks in the graph that are mapped to different servers incur load on the network; thus a mapping of the jobs among the servers incurs a cost that is proportional to the number of "broken edges". We propose a low complexity randomized scheduling algorithm that, without service preemptions, stabilizes the system with graph arrivals/departures; more importantly, it allows a smooth trade-off between minimizing average partitioning cost and average queue lengths. Interestingly, to avoid service preemptions, our approach does not rely on a Gibbs sampler; instead, we show that the corresponding limiting invariant measure has an interpretation stemming from a loss system.Comment: 14 page

arXiv.org e-Print Archive

Crossref

Patterns for distributed real-time stream processing

Author: Arias Fisteus Jesús
Basanta Val Pablo
Fernández García Norberto
Sánchez Fernández Luis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

In recent years, big data systems have become an active area of research and development. Stream processing is one of the potential application scenarios of big data systems where the goal is to process a continuous, high velocity flow of information items. High frequency trading (HFT) in stock markets or trending topic detection in Twitter are some examples of stream processing applications. In some cases (like, for instance, in HFT), these applications have end-to-end quality-of-service requirements and may benefit from the usage of real-time techniques. Taking this into account, the present article analyzes, from the point of view of real-time systems, a set of patterns that can be used when implementing a stream processing application. For each pattern, we discuss its advantages and disadvantages, as well as its impact in application performance, measured as response time, maximum input frequency and changes in utilization demands due to the pattern.This work been partially supported by Distributed Java Infrastructure for Real-Time Big Data (CAS14/00118). It has been also partially funded by eMadrid (S2013/ICE-2715), HERMES-MARTDRIVER (TIN2013-46801-C4-2-R) and AUDACity (TIN2016-77158-C4-1-R); and also by European Union's 7th Framework Program under Grant Agreement FP7-IC6-318763. We are also in debt with our anonymous reviewers that improved the quality of the article

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

Author: He Bingsheng
He Jiong
Zhang Shuhao
Zhou Amelie Chi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/04/2019
Field of study

We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.Comment: To appear in SIGMOD'1

arXiv.org e-Print Archive

Crossref

ScholarBank@NUS

Model-driven Scheduling for Distributed Stream Processing Systems

Author: Shukla Anshu
Simmhan Yogesh
Publication venue: 'Elsevier BV'
Publication date: 06/02/2017
Field of study

Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by Twitter is a widely used stream processing engine while others includes Flink, Spark streaming. For running the streaming applications successfully there is need to know the optimal resource requirement, as over-estimation of resources adds extra cost.So we need some strategy to come up with the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives a predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.Comment: 54 page

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

TCEP:Adapting to Dynamic User Environments by Enabling Transitions between Operator Placement Mechanisms

Author: Arif Raheel
Koldehofe Boris
Luthra Manisha
Salvaneschi Guido
Weisenburger Pascal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Operator placement has a profound impact on the performance of a distributed complex event processing system (DCEP). Since the behavior of a placement mechanism strongly depends on its environment; a single placement mechanism is often not enough to fulfill stringent performance requirements under environmental changes. In this paper, we show how DCEP can benefit from the adaptive use of multiple placement mechanisms. We propose Tcep, a DCEP system to integrate multiple placement mechanisms. By enabling transitions, Tcep can seamlessly exchange distinct operator mechanisms at runtime. We make two main contributions that are highly important for a cost-efficient transition: i) a transition strategy for efficiently scheduling state migrations and ii) a lightweight learning algorithm to adaptively select an appropriate placement mechanism as a consequence of a transition. Our evaluations for important decentralized placement mechanisms in the context of an IoT scenario show that transitions can better fulfill QoS demands in a dynamic environment. Thereby efficient scheduling of state migrations can help to faster complete transitions by up to 94 %

Crossref

Proceedings - University of Groningen

University of Groningen

Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications

Author: Bender Michael A.
Carbone Paris
Muthukrishnan S.
Pham T. N.
Sharaf M. A.
Urhan Tolga
Wolf Joel
Xing Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

In modern Stream Processing Engines (SPEs), numerous diverse applications, which can differ in aspects such as cost, criticality or latency sensitivity, can co-exist in the same computing node. When these differences need to be considered to control the performance of each application, custom scheduling of operators to threads is of key importance (e.g., when a smart vehicle needs to ensure that safety-critical applications always have access to computational power, while other applications are given lower, variable priorities).Many solutions have been proposed regarding schedulers that allocate threads to operators to optimize specific metrics (e.g., latency) but there is still lack of a tool that allows arbitrarily complex scheduling strategies to be seamlessly plugged on top of an SPE. We propose Haren to fill this gap. More specifically, we (1) formalize the thread scheduling problem in stream processing in a general way, allowing to define ad-hoc scheduling policies, (2) identify the bottlenecks and the opportunities of scheduling in stream processing, (3) distill a compact interface to connect Haren with SPEs, enabling rapid testing of various scheduling policies, (4) illustrate the usability of the framework by integrating it into an actual SPE and (5) provide a thorough evaluation. As we show, Haren makes it is possible to adapt the use of computational resources over time to meet the goals of a variety of scheduling policies

Crossref

Chalmers Research