8,406 research outputs found
Collaborative Reuse of Streaming Dataflows in IoT Applications
Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark
Streaming enable composition of continuous dataflows that execute persistently
over data streams. They are used by Internet of Things (IoT) applications to
analyze sensor data from Smart City cyber-infrastructure, and make active
utility management decisions. As the ecosystem of such IoT applications that
leverage shared urban sensor streams continue to grow, applications will
perform duplicate pre-processing and analytics tasks. This offers the
opportunity to collaboratively reuse the outputs of overlapping dataflows,
thereby improving the resource efficiency. In this paper, we propose
\emph{dataflow reuse algorithms} that given a submitted dataflow, identifies
the intersection of reusable tasks and streams from a collection of running
dataflows to form a \emph{merged dataflow}. Similar algorithms to unmerge
dataflows when they are removed are also proposed. We implement these
algorithms for the popular Apache Storm DSPS, and validate their performance
and resource savings for 35 synthetic dataflows based on public OPMW workflows
with diverse arrival and departure distributions, and on 21 real IoT dataflows
from RIoTBench.Comment: To appear in IEEE eScience Conference 201
A dataflow platform for applications based on Linked Data
Modern software applications increasingly benefit from accessing the multifarious and heterogeneous Web of Data, thanks to the use of web APIs and Linked Data principles. In previous work, the authors proposed a platform to develop applications consuming Linked Data in a declarative and modular way. This paper describes in detail the functional language the platform gives access to, which is based on SPARQL (the standard query language for Linked Data) and on the dataflow paradigm. The language features interactive and meta-programming capabilities so that complex modules/applications can be developed. By adopting a declarative style, it favours the development of modules that can be reused in various specific execution context
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
Dynamic resource allocation in a hierarchical multiprocessor system: A preliminary study
An integrated system approach to dynamic resource allocation is proposed. Some of the problems in dynamic resource allocation and the relationship of these problems to system structures are examined. A general dynamic resource allocation scheme is presented. A hierarchial system architecture which dynamically maps between processor structure and programs at multiple levels of instantiations is described. Simulation experiments were conducted to study dynamic resource allocation on the proposed system. Preliminary evaluation based on simple dynamic resource allocation algorithms indicates that with the proposed system approach, the complexity of dynamic resource management could be significantly reduced while achieving reasonable effective dynamic resource allocation
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
- …