3,032 research outputs found
In-Network Outlier Detection in Wireless Sensor Networks
To address the problem of unsupervised outlier detection in wireless sensor
networks, we develop an approach that (1) is flexible with respect to the
outlier definition, (2) computes the result in-network to reduce both bandwidth
and energy usage,(3) only uses single hop communication thus permitting very
simple node failure detection and message reliability assurance mechanisms
(e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data.
We examine performance using simulation with real sensor data streams. Our
results demonstrate that our approach is accurate and imposes a reasonable
communication load and level of power consumption.Comment: Extended version of a paper appearing in the Int'l Conference on
Distributed Computing Systems 200
Boosting the Basic Counting on Distributed Streams
We revisit the classic basic counting problem in the distributed streaming
model that was studied by Gibbons and Tirthapura (GT). In the solution for
maintaining an -estimate, as what GT's method does, we make
the following new contributions: (1) For a bit stream of size , where each
bit has a probability at least to be 1, we exponentially reduced the
average total processing time from GT's to
, thus providing the first
sublinear-time streaming algorithm for this problem. (2) In addition to an
overall much faster processing speed, our method provides a new tradeoff that a
lower accuracy demand (a larger value for ) promises a faster
processing speed, whereas GT's processing speed is
in any case and for any . (3) The worst-case total time cost of our
method matches GT's , which is necessary but rarely
occurs in our method. (4) The space usage overhead in our method is a lower
order term compared with GT's space usage and occurs only times
during the stream processing and is too negligible to be detected by the
operating system in practice. We further validate these solid theoretical
results with experiments on both real-world and synthetic data, showing that
our method is faster than GT's by a factor of several to several thousands
depending on the stream size and accuracy demands, without any detectable space
usage overhead. Our method is based on a faster sampling technique that we
design for boosting GT's method and we believe this technique can be of other
interest.Comment: 32 page
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity, that include information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. These allow
accessible analytics over data streams having properties from different
disciplines, and help span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems,
October 27, 201
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Distributed Stream Processing Systems (DSPSs) are among the currently most
emerging topics in data management, with applications ranging from real-time
event monitoring to processing complex dataflow programs and big data
analytics. The major market players in this domain are clearly represented by
Apache Spark and Flink, which provide a variety of frontend APIs for SQL,
statistical inference, machine learning, stream processing, and many others.
Yet rather few details are reported on the integration of these engines into
the underlying High-Performance Computing (HPC) infrastructure and the
communication protocols they use. Spark and Flink, for example, are implemented
in Java and still rely on a dedicated master node for managing their control
flow among the worker nodes in a compute cluster.
In this paper, we describe the architecture of our AIR engine, which is
designed from scratch in C++ using the Message Passing Interface (MPI),
pthreads for multithreading, and is directly deployed on top of a common HPC
workload manager such as SLURM. AIR implements a light-weight, dynamic sharding
protocol (referred to as "Asynchronous Iterative Routing"), which facilitates a
direct and asynchronous communication among all client nodes and thereby
completely avoids the overhead induced by the control flow with a master node
that may otherwise form a performance bottleneck. Our experiments over a
variety of benchmark settings confirm that AIR outperforms Spark and Flink in
terms of latency and throughput by a factor of up to 15; moreover, we
demonstrate that AIR scales out much better than existing DSPSs to clusters
consisting of up to 8 nodes and 224 cores.Comment: 16 pages, 6 figures, 15 plot
- …