Search CORE

6,668 research outputs found

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

Author: He Bingsheng
He Jiong
Zhang Shuhao
Zhou Amelie Chi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/04/2019
Field of study

We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.Comment: To appear in SIGMOD'1

arXiv.org e-Print Archive

Crossref

ScholarBank@NUS

Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

Author: Su Li
Zhou Yongluan
Publication venue
Publication date: 04/02/2016
Field of study

Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach

arXiv.org e-Print Archive

University of Southern Denmark Research Output

When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing

Author: Kourtellis Nicolas
Morales Gianmarco De Francisci
Nasir Muhammad Anis Uddin
Serafini Marco
Publication venue
Publication date: 27/01/2016
Field of study

Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated to keys which are significantly more frequent than others. Skew is remarkably more problematic in large deployments: more workers implies fewer keys per worker, so it becomes harder to "average out" the cost of hot keys with cold keys. We propose a novel load balancing technique that uses a heaving hitter algorithm to efficiently identify the hottest keys in the stream. These hot keys are assigned to

d \geq 2

choices to ensure a balanced load, where

d

is tuned automatically to minimize the memory and computation cost of operator replication. The technique works online and does not require the use of routing tables. Our extensive evaluation shows that our technique can balance real-world workloads on large deployments, and improve throughput and latency by

\mathbf{150\%}

and

\mathbf{60\%}

respectively over the previous state-of-the-art when deployed on Apache Storm.Comment: 12 pages, 14 Figures, this paper is accepted and will be published at ICDE 201

arXiv.org e-Print Archive

Crossref

Modeling and Evaluation of Multisource Streaming Strategies in P2P VoD Systems

Author: Leonardi Emilio
Munoz Gea J.P.
Traverso Stefano
Publication venue: IEEE / Institute of Electrical and Electronics Engineers Incorporated:445 Hoes Lane:Piscataway, NJ 08854:(800)701-4333, (732)981-0060, EMAIL: [email protected], INTERNET: http://www.ieee.org, Fax: (732)981-9667
Publication date: 01/01/2012
Field of study

In recent years, multimedia content distribution has largely been moved to the Internet, inducing broadcasters, operators and service providers to upgrade with large expenses their infrastructures. In this context, streaming solutions that rely on user devices such as set-top boxes (STBs) to offload dedicated streaming servers are particularly appropriate. In these systems, contents are usually replicated and scattered over the network established by STBs placed at users' home, and the video-on-demand (VoD) service is provisioned through streaming sessions established among neighboring STBs following a Peer-to-Peer fashion. Up to now the majority of research works have focused on the design and optimization of content replicas mechanisms to minimize server costs. The optimization of replicas mechanisms has been typically performed either considering very crude system performance indicators or analyzing asymptotic behavior. In this work, instead, we propose an analytical model that complements previous works providing fairly accurate predictions of system performance (i.e., blocking probability). Our model turns out to be a highly scalable, flexible, and extensible tool that may be helpful both for designers and developers to efficiently predict the effect of system design choices in large scale STB-VoD system

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

Author: Ghosh Rajrup
Komma Siva Prakash Reddy
Simmhan Yogesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/01/2018
Field of study

The growing deployment of sensors as part of Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the deployment of capable edge devices on the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem to minimize the total makespan for all event analytics, while meeting energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100 - 1000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high quality solution in all cases, and reduce the number of query migrations. Furthermore, rebalance strategies when applied in these heuristics have significantly reduced the makespan by around 20 - 25%.Comment: 11 pages, 7 figure

arXiv.org e-Print Archive

Crossref

A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing

Author: Chen Jiaqing
Chen Xing
Lin Bing
Mauri Jaime Lloret
Xiong Neal N.
Zhang Jianshan
Zhu Fangning
Publication venue
Publication date: 24/01/2019
Field of study

Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters from the cloud computing environment, resulting in serious data transmission delays. Edge computing reduces the data transmission delays and supports the fixed storing manner for scientific workflow private datasets, but there is a bottleneck in its storage capacity. It is a challenge to combine the advantages of both edge computing and cloud computing to rationalize the data placement of scientific workflow, and optimize the data transmission time across different datacenters. Traditional data placement strategies maintain load balancing with a given number of datacenters, which results in a large data transmission time. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize the data transmission time when placing data for a scientific workflow. This approach considered the characteristics of data placement combining edge computing and cloud computing. In addition, it considered the impact factors impacting transmission delay, such as the band-width between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover operator and mutation operator of the genetic algorithm were adopted to avoid the premature convergence of the traditional particle swarm optimization algorithm, which enhanced the diversity of population evolution and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce the data transmission time during workflow execution combining edge computing and cloud computing

arXiv.org e-Print Archive

RiuNet

S+Net: extending functional coordination with extra-functional semantics

Author: Grelck Clemens
Kirner Raimund
Penczek Frank
Poss Raphael
Shafarenko Alex
Verstraaten Merijn
Publication venue
Publication date: 01/01/2013
Field of study

This technical report introduces S+Net, a compositional coordination language for streaming networks with extra-functional semantics. Compositionality simplifies the specification of complex parallel and distributed applications; extra-functional semantics allow the application designer to reason about and control resource usage, performance and fault handling. The key feature of S+Net is that functional and extra-functional semantics are defined orthogonally from each other. S+Net can be seen as a simultaneous simplification and extension of the existing coordination language S-Net, that gives control of extra-functional behavior to the S-Net programmer. S+Net can also be seen as a transitional research step between S-Net and AstraKahn, another coordination language currently being designed at the University of Hertfordshire. In contrast with AstraKahn which constitutes a re-design from the ground up, S+Net preserves the basic operational semantics of S-Net and thus provides an incremental introduction of extra-functional control in an existing language.Comment: 34 pages, 11 figures, 3 table

arXiv.org e-Print Archive

CiteSeerX

UvA-DARE

International Migration, Integration and Social Cohesion online publications