Search CORE

9,528 research outputs found

Curracurrong: a stream processing system for distributed environments

Author: Kersten R.W.J.
Person S.
Rungta N.
Tkachuk O.
Publication venue: School of Information Technologies
Publication date: 01/01/2015
Field of study

Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amount of data as an unbounded sequence of events and process them to generate a new sequences of events. Such applications need an adequate programming model that can process large amount of data with minimal latency; for this purpose, stream programming, among other paradigms, is ideal. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for modern domain specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that achieves easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., end-to-end delay

NASA Technical Reports Server

Sydney eScholarship

Curracurrong: a stream processing system for distributed environments

Author: Kakkad Vasvi
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015
Field of study

Sydney eScholarship

SQPR: Stream Query Planning with Reuse

Author: Kalyvianaki E
Kuhn D
Pietzuch P
Vu QH
Wiesemann W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads

Crossref

City Research Online

Spiral - Imperial College Digital Repository

Network-Aware Stream Query Processing in Mobile Ad-Hoc Networks

Author: O'Keeffe D
Pietzuch PR
Salonidis T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/07/2015
Field of study

Crossref

Spiral - Imperial College Digital Repository

Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

Author: Cao Jianneng
Madsen Kasper Grud Skat
Zhou Yongluan
Publication venue
Publication date: 11/02/2016
Field of study

Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which support general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches

arXiv.org e-Print Archive

Crossref

University of Southern Denmark Research Output

Optimal Embedding of Functions for In-Network Computation: Complexity Analysis and Algorithms

Author: Limaye Nutan
Manjunath D.
Vyavahare Pooja
Publication venue
Publication date: 14/07/2015
Field of study

We consider optimal distributed computation of a given function of distributed data. The input (data) nodes and the sink node that receives the function form a connected network that is described by an undirected weighted network graph. The algorithm to compute the given function is described by a weighted directed acyclic graph and is called the computation graph. An embedding defines the computation communication sequence that obtains the function at the sink. Two kinds of optimal embeddings are sought, the embedding that---(1)~minimizes delay in obtaining function at sink, and (2)~minimizes cost of one instance of computation of function. This abstraction is motivated by three applications---in-network computation over sensor networks, operator placement in distributed databases, and module placement in distributed computing. We first show that obtaining minimum-delay and minimum-cost embeddings are both NP-complete problems and that cost minimization is actually MAX SNP-hard. Next, we consider specific forms of the computation graph for which polynomial time solutions are possible. When the computation graph is a tree, a polynomial time algorithm to obtain the minimum delay embedding is described. Next, for the case when the function is described by a layered graph we describe an algorithm that obtains the minimum cost embedding in polynomial time. This algorithm can also be used to obtain an approximation for delay minimization. We then consider bounded treewidth computation graphs and give an algorithm to obtain the minimum cost embedding in polynomial time

arXiv.org e-Print Archive

Dspace at IIT Bombay

Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

Author: Ghosh Rajrup
Komma Siva Prakash Reddy
Simmhan Yogesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/01/2018
Field of study

The growing deployment of sensors as part of Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the deployment of capable edge devices on the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem to minimize the total makespan for all event analytics, while meeting energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100 - 1000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high quality solution in all cases, and reduce the number of query migrations. Furthermore, rebalance strategies when applied in these heuristics have significantly reduced the makespan by around 20 - 25%.Comment: 11 pages, 7 figure

arXiv.org e-Print Archive

Crossref