Collaborative Reuse of Streaming Dataflows in IoT Applications
Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark
Streaming enable composition of continuous dataflows that execute persistently
over data streams. They are used by Internet of Things (IoT) applications to
analyze sensor data from Smart City cyber-infrastructure, and make active
utility management decisions. As the ecosystem of such IoT applications that
leverage shared urban sensor streams continues to grow, applications will
perform duplicate pre-processing and analytics tasks. This offers the
opportunity to collaboratively reuse the outputs of overlapping dataflows,
thereby improving resource efficiency. In this paper, we propose
dataflow reuse algorithms that, given a submitted dataflow, identify
the intersection of reusable tasks and streams from a collection of running
dataflows to form a merged dataflow. We also propose similar algorithms to
unmerge dataflows when they are removed. We implement these
algorithms for the popular Apache Storm DSPS, and validate their performance
and resource savings for 35 synthetic dataflows based on public OPMW workflows
with diverse arrival and departure distributions, and on 21 real IoT dataflows
from RIoTBench.
Comment: To appear in IEEE eScience Conference 201
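The abstract does not say how reusable tasks are matched. The following is a minimal sketch of the general merge/unmerge idea, not the authors' algorithm. It assumes a task can be reused when it applies the same operator, with the same parameters, to the same upstream streams as an already-running task. Under that assumption, each task gets a structural signature computed in topological order, and a reference count tracks how many submitted dataflows share it. Removing a dataflow decrements the counts and stops only the tasks no other dataflow still uses. The names `Task`, `MergedDataflow`, `merge`, and `unmerge` are hypothetical and are not part of the Apache Storm API.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Task:
    op: str        # operator type, e.g. "source", "filter", "average" (illustrative)
    params: tuple  # hashable operator configuration


class MergedDataflow:
    """Hypothetical registry of running tasks keyed by a structural signature.

    Assumed reuse rule: same operator + same params + same upstream signatures.
    """

    def __init__(self):
        self.refcount = {}  # signature -> number of submitted dataflows using it
        self.owned = {}     # dataflow id -> set of signatures that dataflow uses

    def merge(self, df_id, tasks, edges):
        """tasks: name -> Task; edges: name -> list of upstream task names.

        Returns (reused, added): sorted names of tasks matched to running
        tasks, and of tasks that would have to be newly launched.
        """
        if df_id in self.owned:
            raise ValueError(f"dataflow {df_id!r} is already merged")
        graph = {name: edges.get(name, []) for name in tasks}
        sig, mine, reused, added = {}, set(), [], []
        # Upstream tasks are visited first, so each signature can embed its inputs'.
        for name in TopologicalSorter(graph).static_order():
            t = tasks[name]
            s = (t.op, t.params, tuple(sig[u] for u in graph[name]))
            sig[name] = s
            mine.add(s)
            if s in self.refcount:
                reused.append(name)
            else:
                added.append(name)
        for s in mine:  # count each shared task once per dataflow
            self.refcount[s] = self.refcount.get(s, 0) + 1
        self.owned[df_id] = mine
        return sorted(reused), sorted(added)

    def unmerge(self, df_id):
        """Remove a dataflow; return how many tasks no longer have any user."""
        stopped = 0
        for s in self.owned.pop(df_id):
            self.refcount[s] -= 1
            if self.refcount[s] == 0:
                del self.refcount[s]
                stopped += 1
        return stopped


if __name__ == "__main__":
    m = MergedDataflow()
    a = {"s": Task("source", ("taxi",)), "f": Task("filter", ("fare>0",)),
         "avg": Task("average", ("fare", 60))}
    b = {"src": Task("source", ("taxi",)), "clean": Task("filter", ("fare>0",)),
         "cnt": Task("count", (60,))}
    print(m.merge("A", a, {"f": ["s"], "avg": ["f"]}))         # ([], ['avg', 'f', 's'])
    print(m.merge("B", b, {"clean": ["src"], "cnt": ["clean"]}))  # (['clean', 'src'], ['cnt'])
    print(m.unmerge("A"))  # 1: only the "average" task is stopped
    print(m.unmerge("B"))  # 3
```

In this example, dataflow B reuses A's source and filter even though its task names differ, because only the operator, parameters, and inputs enter the signature. Keeping upstream inputs in their listed order, rather than sorting them, keeps non-commutative operators such as joins distinct.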