4,560 research outputs found
Continuous Monitoring of Top-K Queries over Sliding Windows
Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous long-running queries. This paper studies continuous monitoring of top-k queries over a fixed-size window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or time units. We propose a general methodology for top-k monitoring that restricts processing to the sub-domains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an on-line fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains book-keeping information. We present two processing techniques: the first one computes the new answer of a query whenever some of the current top-k points expire; the second one partially pre-computes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model. Copyright 2006 ACM
Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window
The past decade has witnessed many interesting algorithms for maintaining
statistics over a data stream. This paper initiates a theoretical study of
algorithms for monitoring distributed data streams over a time-based sliding
window (which contains a variable number of items and possibly out-of-order
items). The concern is how to minimize the communication between individual
streams and the root, while allowing the root, at any time, to be able to
report the global statistics of all streams within a given error bound. This
paper presents communication-efficient algorithms for three classical
statistics, namely, basic counting, frequent items and quantiles. The
worst-case communication cost over a window is bits for basic counting and words for the remainings, where is the number of distributed
data streams, is the total number of items in the streams that arrive or
expire in the window, and is the desired error bound. Matching
and nearly matching lower bounds are also obtained.Comment: 12 pages, to appear in the 27th International Symposium on
Theoretical Aspects of Computer Science (STACS), 201
SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects
With the proliferation of mobile devices and location-based services,
continuous generation of massive volume of streaming spatial objects (i.e.,
geo-tagged data) opens up new opportunities to address real-world problems by
analyzing them. In this paper, we present a novel continuous bursty region
detection problem that aims to continuously detect a bursty region of a given
size in a specified geographical area from a stream of spatial objects.
Specifically, a bursty region shows maximum spike in the number of spatial
objects in a given time window. The problem is useful in addressing several
real-world challenges such as surge pricing problem in online transportation
and disease outbreak detection. To solve the problem, we propose an exact
solution and two approximate solutions, and the approximation ratio is
in terms of the burst score, where is a parameter
to control the burst score. We further extend these solutions to support
detection of top- bursty regions. Extensive experiments with real-world data
are conducted to demonstrate the efficiency and effectiveness of our solutions
Analysing Temporal Relations – Beyond Windows, Frames and Predicates
This article proposes an approach to rely on the standard
operators of relational algebra (including grouping and ag-
gregation) for processing complex event without requiring
window specifications. In this way the approach can pro-
cess complex event queries of the kind encountered in appli-
cations such as emergency management in metro networks.
This article presents Temporal Stream Algebra (TSA) which
combines the operators of relational algebra with an analy-
sis of temporal relations at compile time. This analysis de-
termines which relational algebra queries can be evaluated
against data streams, i. e. the analysis is able to distinguish
valid from invalid stream queries. Furthermore the analysis
derives functions similar to the pass, propagation and keep
invariants in Tucker's et al. \Exploiting Punctuation Seman-
tics in Continuous Data Streams". These functions enable
the incremental evaluation of TSA queries, the propagation
of punctuations, and garbage collection. The evaluation of
TSA queries combines bulk-wise and out-of-order processing
which makes it tolerant to workload bursts as they typically
occur in emergency management. The approach has been
conceived for efficiently processing complex event queries on
top of a relational database system. It has been deployed
and tested on MonetDB
GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain much smaller number of tuples than the tuples contained in the sliding windows. Therefore, a stream buffer management policy is needed in that case. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ) an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted to other recently proposed techniques
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity, that include information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. These allow
accessible analytics over data streams having properties from different
disciplines, and help span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems,
October 27, 201
- …