519 research outputs found

    Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window

    Get PDF
    The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to be able to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is O(kϵlogϵNk)O(\frac{k} {\epsilon} \log \frac{\epsilon N}{k}) bits for basic counting and O(kϵlogNk)O(\frac{k}{\epsilon} \log \frac{N}{k}) words for the remainings, where kk is the number of distributed data streams, NN is the total number of items in the streams that arrive or expire in the window, and ϵ<1\epsilon < 1 is the desired error bound. Matching and nearly matching lower bounds are also obtained.Comment: 12 pages, to appear in the 27th International Symposium on Theoretical Aspects of Computer Science (STACS), 201

    Time-decaying Sketches for Robust Aggregation of Sensor Data

    Get PDF
    We present a new sketch for summarizing network data. The sketch has the following properties which make it useful in communication-efficient aggregation in distributed streaming scenarios, such as sensor networks: the sketch is duplicate insensitive, i.e., reinsertions of the same data will not affect the sketch and hence the estimates of aggregates. Unlike previous duplicate-insensitive sketches for sensor data aggregation [S. Nath et al., Synposis diffusion for robust aggregation in sensor networks, in Proceedings of the 2nd International Conference on Embedded Network Sensor Systems, (2004), pp. 250–262], [J. Considine et al., Approximate aggregation techniques for sensor databases, in Proceedings of the 20th International Conference on Data Engineering (ICDE), 2004, pp. 449–460], it is also time decaying, so that the weight of a data item in the sketch can decrease with time according to a user-specified decay function. The sketch can give provably approximate guarantees for various aggregates of data, including the sum, median, quantiles, and frequent elements. The size of the sketch and the time taken to update it are both polylogarithmic in the size of the relevant data. Further, multiple sketches computed over distributed data can be combined without loss of accuracy. To our knowledge, this is the first sketch that combines all the above properties

    An evaluation of streaming algorithms for distinct counting over a sliding window

    Get PDF
    Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream of elements, it is commonly needed to compute an aggregate over only the most recent elements, leading to the problem of distinct counting over a “sliding window” of the stream. We present a detailed experimental study of the performance of different algorithms for distinct counting over a sliding window. We observe that the performance of an algorithm depends on the basic method used, as well as aspects such as the hash function, the mix of query and updates, and the method used to boost accuracy. We compare the performance of prominent algorithms and evaluate the influence of these factors, leading to practical recommendations for implementation. To the best of our knowledge, this is the first detailed experimental study of distinct counting over a sliding window

    Near Optimal Linear Algebra in the Online and Sliding Window Models

    Full text link
    We initiate the study of numerical linear algebra in the sliding window model, where only the most recent WW updates in a stream form the underlying data set. We first introduce a unified row-sampling based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and 1\ell_1-subspace embeddings in the sliding window model, which often use nearly optimal space and achieve nearly input sparsity runtime. Our algorithms are based on "reverse online" versions of offline sampling distributions such as (ridge) leverage scores, 1\ell_1 sensitivities, and Lewis weights to quantify both the importance and the recency of a row. Our row-sampling framework rather surprisingly implies connections to the well-studied online model; our structural results also give the first sample optimal (up to lower order terms) online algorithm for low-rank approximation/projection-cost preservation. Using this powerful primitive, we give online algorithms for column/row subset selection and principal component analysis that resolves the main open question of Bhaskara et. al.,(FOCS 2019). We also give the first online algorithm for 1\ell_1-subspace embeddings. We further formalize the connection between the online model and the sliding window model by introducing an additional unified framework for deterministic algorithms using a merge and reduce paradigm and the concept of online coresets. Our sampling based algorithms in the row-arrival online model yield online coresets, giving deterministic algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and 1\ell_1-subspace embeddings in the sliding window model that use nearly optimal space

    Approximate order-k Voronoi cells over positional streams

    Full text link
    Handling streams of positional updates from numerous moving objects has become a challenging task for many monitoring applications. Several algorithms have been recently proposed for providing exact answers particularly to continuous range and k-nearest neighbor queries against current object positions. In this work, we introduce a processing technique for efficiently maintaining an approximate order-k Voronoi cell around a certain point of interest when all objects continuously change their locations. This heuristic can easily provide a fairly reliable estimate of the k-nearest neighbors for any query point found inside the constructed cell. We further extend our method to handle positional updates that are not received concurrently for all objects, but instead remain valid for a specific time interval according to a sliding window model. Extensive experimental analysis over synthetic datasets confirms the robustness and scalability of this approach offering near real-time cell maintenance with acceptable error margins
    corecore