Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window
The past decade has witnessed many interesting algorithms for maintaining
statistics over a data stream. This paper initiates a theoretical study of
algorithms for monitoring distributed data streams over a time-based sliding
window (which contains a variable number of items and possibly out-of-order
items). The concern is how to minimize the communication between individual
streams and the root, while allowing the root, at any time, to be able to
report the global statistics of all streams within a given error bound. This
paper presents communication-efficient algorithms for three classical
statistics, namely, basic counting, frequent items and quantiles. The
worst-case communication cost over a window is measured in bits for basic
counting and in words for the other two statistics; it is bounded in terms of
the number of distributed data streams, the total number of items in the
streams that arrive or expire in the window, and the desired error bound.
Matching and nearly matching lower bounds are also obtained.
Comment: 12 pages, to appear in the 27th International Symposium on
Theoretical Aspects of Computer Science (STACS), 201
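To make the monitoring setting in this abstract concrete, here is a minimal Python sketch of basic counting over a time-based sliding window with a root node. It is not the paper's algorithm: it uses a naive rule in which a site reports only when its local count has drifted by more than a fraction eps of its current value, and it assumes in-order arrivals and that every site is polled at every time step. The names StreamSite, Root, run, tick and schedule are hypothetical, chosen for illustration only.

```python
from collections import deque


class StreamSite:
    """One stream; keeps timestamps of the items currently inside the window."""

    def __init__(self, window, eps):
        self.window = window      # window length in time units
        self.eps = eps            # relative error tolerated at this site
        self.times = deque()      # arrival times, oldest first (in-order assumption)
        self.last_sent = 0        # count most recently reported to the root

    def tick(self, now, arrivals=0):
        """Advance to time `now`; return a new count to send, or None."""
        for _ in range(arrivals):
            self.times.append(now)
        # An item with timestamp t is valid while t > now - window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        count = len(self.times)
        # Naive rule: report only when the drift exceeds eps * current count.
        if abs(count - self.last_sent) > self.eps * count:
            self.last_sent = count
            return count
        return None


class Root:
    """Keeps the latest reported count per site and sums them on demand."""

    def __init__(self):
        self.reported = {}

    def receive(self, site_id, count):
        self.reported[site_id] = count

    def estimate(self):
        return sum(self.reported.values())


def run(schedule, k, window, eps):
    """schedule[t][i] is the number of arrivals at site i at time t."""
    sites = [StreamSite(window, eps) for _ in range(k)]
    root = Root()
    messages = 0
    history = []  # (true total, root estimate) after each time step
    for now, arrivals in enumerate(schedule):
        for i, site in enumerate(sites):
            update = site.tick(now, arrivals[i])
            if update is not None:
                root.receive(i, update)
                messages += 1
        true_total = sum(len(s.times) for s in sites)
        history.append((true_total, root.estimate()))
    return messages, history
```

Because every site keeps its unreported drift within eps times its own count, the root's sum stays within eps times the true total under these assumptions. The sketch says nothing about the communication bounds or the out-of-order handling that the abstract claims.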
Continuous Nearest Neighbor Queries over Sliding Windows
Abstract: This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream into the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead.
Index Terms: Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.
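The skyline idea mentioned in this abstract can be illustrated for the simplest case: a single static query point, a time-based window, and only the 1-NN. A point can never become the nearest neighbor if some later arrival is at least as close, because that later point is both nearer and expires later. The Python sketch below keeps only the non-dominated points; it is a simplified illustration under those assumptions, not the algorithm from the paper, and the class name SlidingWindowNN and its methods are hypothetical. Arrivals are assumed to be inserted in time order.

```python
from collections import deque
from math import dist


class SlidingWindowNN:
    """1-NN for one fixed query over a time-based window (illustrative only)."""

    def __init__(self, query, window):
        self.query = query
        self.window = window
        # Entries are (arrival_time, distance, point). Arrival times increase
        # and distances strictly increase from front to back, so the front
        # entry is always the current nearest neighbor.
        self.skyline = deque()

    def _expire(self, now):
        # A point that arrived at time t is valid while t > now - window.
        while self.skyline and self.skyline[0][0] <= now - self.window:
            self.skyline.popleft()

    def insert(self, now, point):
        d = dist(self.query, point)
        # Older points that are no closer than the new one are dominated:
        # they are farther (or tied) and will expire earlier.
        while self.skyline and self.skyline[-1][1] >= d:
            self.skyline.pop()
        self.skyline.append((now, d, point))
        self._expire(now)

    def nearest(self, now):
        """Return the nearest valid point at time `now`, or None."""
        self._expire(now)
        return self.skyline[0][2] if self.skyline else None
```

The entries behind the front of the deque are exactly the points that will take over as nearest neighbor when earlier ones expire, which is one way to read "precomputes the future changes in the NN set" for this restricted case.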