Improved Algorithms for Time Decay Streams

Author: Braverman Vladimir
Lang Harry
Ullah Enayat
Zhou Samson
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In the time-decay model for data streams, elements of an underlying data set arrive sequentially, with recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline coreset and gives a time-decay coreset for polynomial time-decay functions. We also consider the exponential time-decay model for k-median clustering, where we provide a constant-factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores $O(k \log(h\Delta) + h)$ points, where $h$ is the half-life of the decay function and $\Delta$ is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well.
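The abstract says the k-median algorithm builds on online facility location. The sketch below illustrates only that building block. It implements Meyerson-style online facility location with optional per-point weights, plus an exponential decay weight with half-life $h$. The function names, the weighted opening rule `min(1, w * d / f)`, and the idea of feeding it decayed weights are assumptions made for illustration. This is not the paper's time-decay k-median algorithm, and it does not reproduce the $O(k \log(h\Delta) + h)$ space bound.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def decay_weight(age: float, half_life: float) -> float:
    """Exponential time decay: the weight halves every `half_life` time steps."""
    return 2.0 ** (-age / half_life)


def online_facility_location(
    points: Sequence[Point],
    facility_cost: float,
    weights: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Point], List[int]]:
    """Meyerson-style online facility location (illustrative sketch).

    Each arriving point opens a facility at its own location with probability
    min(1, w * d / f). Here d is the distance to the nearest open facility,
    w is the point's weight, and f is the facility cost. Otherwise the point
    is assigned to that nearest facility.
    """
    rng = rng if rng is not None else random.Random(0)
    if weights is None:
        weights = [1.0] * len(points)
    facilities: List[Point] = []
    assignment: List[int] = []
    for p, w in zip(points, weights):
        if not facilities:
            # The first point always opens a facility.
            facilities.append(p)
            assignment.append(0)
            continue
        # Find the nearest currently open facility.
        nearest, d = min(
            ((i, math.dist(p, q)) for i, q in enumerate(facilities)),
            key=lambda pair: pair[1],
        )
        if rng.random() < min(1.0, w * d / facility_cost):
            facilities.append(p)
            assignment.append(len(facilities) - 1)
        else:
            assignment.append(nearest)
    return facilities, assignment
```

For example, the weights for items that arrived at times `t_i` could be computed as `decay_weight(now - t_i, h)`. Older points then carry less weight and are less likely to open new facilities.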

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Efficient Summing over Sliding Windows

Author: Basat Ran Ben
Einziger Gil
Friedman Roy
Kassner Yaron
Publication date: 01/01/2016

This paper considers the problem of maintaining statistical aggregates over the last $W$ elements of a data stream. First, the problem of counting the number of 1's in the last $W$ bits of a binary stream is considered. A lower bound of $\Omega(1/\epsilon + \log W)$ memory bits for $W\epsilon$-additive approximations is derived. This is followed by an algorithm whose memory consumption is $O(1/\epsilon + \log W)$ bits, indicating that the algorithm is optimal and that the bound is tight. Next, the more general problem of maintaining a sum of the last $W$ integers, each in the range $\{0, 1, \ldots, R\}$, is addressed. The paper shows that approximating the sum within an additive error of $RW\epsilon$ can also be done using $\Theta(1/\epsilon + \log W)$ bits for $\epsilon = \Omega(1/W)$. For $\epsilon = o(1/W)$, we present a succinct algorithm which uses $B(1 + o(1))$ bits, where $B = \Theta(W \log(1/(W\epsilon)))$ is the derived lower bound. We show that all lower bounds generalize to randomized algorithms as well. All algorithms process new elements and answer queries in $O(1)$ worst-case time.

Comment: A shorter version appears in SWAT 201
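The following block-based approximate counter makes the $W\epsilon$-additive guarantee concrete. It estimates the number of 1's among the last $W$ bits, assuming $1/W \le \epsilon \le 1$. It is not the paper's optimal algorithm. It keeps about $1/\epsilon$ block counters of $O(\log(\epsilon W))$ bits each, rather than $O(1/\epsilon + \log W)$ bits in total. Its additive error is still at most one block length $B = \lfloor \epsilon W \rfloor \le \epsilon W$. The reason is that only the oldest block, which is partly inside the window, is estimated, using linear interpolation. The class name and the interpolation rule are illustrative assumptions.

```python
import math
from collections import deque


class BlockWindowCounter:
    """Approximate count of 1's among the last `window` bits (illustrative sketch).

    The stream is cut into consecutive blocks of B items. Completed blocks are
    stored only as counts. The oldest block that partly overlaps the window
    is estimated by linear interpolation, so the additive error is at most B.
    """

    def __init__(self, window: int, eps: float):
        self.window = window
        self.B = max(1, min(window, int(eps * window)))  # block length in items
        self.max_blocks = math.ceil(window / self.B)     # enough blocks to cover the window
        self.blocks = deque()  # counts of completed blocks, oldest first
        self.cur = 0           # number of 1's in the current, incomplete block
        self.cur_len = 0       # number of items in the current block

    def add(self, bit: int) -> None:
        self.cur += bit
        self.cur_len += 1
        if self.cur_len == self.B:
            self.blocks.append(self.cur)
            self.cur, self.cur_len = 0, 0
            # Drop blocks that can no longer overlap the window.
            while len(self.blocks) > self.max_blocks:
                self.blocks.popleft()

    def query(self) -> float:
        total = float(self.cur)                # newest items: counted exactly
        remaining = self.window - self.cur_len
        for count in reversed(self.blocks):    # newest completed block first
            if remaining >= self.B:
                total += count                 # block lies fully inside the window
                remaining -= self.B
            elif remaining > 0:
                total += count * remaining / self.B  # partial block: interpolate
                remaining = 0
            else:
                break
        return total
```

Choosing a larger `eps` gives longer blocks. That means fewer stored counters but a looser worst-case error.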

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Time-decaying Sketches for Robust Aggregation of Sensor Data

Author: Cormode Graham
Tirthapura Srikanta
Xu Bojian
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009

We present a new sketch for summarizing network data. The sketch has the following properties, which make it useful for communication-efficient aggregation in distributed streaming scenarios such as sensor networks. The sketch is duplicate insensitive, i.e., reinsertions of the same data will not affect the sketch, and hence will not affect the estimates of aggregates. Unlike previous duplicate-insensitive sketches for sensor data aggregation [S. Nath et al., Synopsis diffusion for robust aggregation in sensor networks, in Proceedings of the 2nd International Conference on Embedded Network Sensor Systems, (2004), pp. 250–262], [J. Considine et al., Approximate aggregation techniques for sensor databases, in Proceedings of the 20th International Conference on Data Engineering (ICDE), 2004, pp. 449–460], it is also time decaying, so that the weight of a data item in the sketch can decrease with time according to a user-specified decay function. The sketch gives provable approximation guarantees for various aggregates of data, including the sum, median, quantiles, and frequent elements. The size of the sketch and the time taken to update it are both polylogarithmic in the size of the relevant data. Further, multiple sketches computed over distributed data can be combined without loss of accuracy. To our knowledge, this is the first sketch that combines all the above properties.
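Two properties set this sketch apart: duplicate insensitivity, and time decay that survives merging. Both can be shown without any compression. The sketch below stores every distinct (item id, timestamp) observation exactly. Re-inserting the same observation or merging overlapping summaries therefore never double-counts. A user-supplied decay function is applied at query time. This summary is deliberately not polylogarithmic in size and is not the sketch from the paper. The key scheme is hypothetical, as are all names. It assumes an observation is identified by its id and timestamp and always carries the same value.

```python
from typing import Callable, Dict, Tuple

# Hypothetical identity scheme: an observation is identified by (item id, timestamp).
Key = Tuple[str, int]


class DecayedUnion:
    """Exact duplicate-insensitive, mergeable, time-decayed sum (not a compact sketch)."""

    def __init__(self, decay: Callable[[float], float]):
        self.decay = decay              # user-specified decay function of age
        self.items: Dict[Key, float] = {}

    def insert(self, item_id: str, timestamp: int, value: float) -> None:
        # Re-inserting the same observation overwrites it with the same value,
        # so duplicates do not change the summary.
        self.items[(item_id, timestamp)] = value

    def merge(self, other: "DecayedUnion") -> "DecayedUnion":
        # Union of observations: merging overlapping summaries never double-counts.
        merged = DecayedUnion(self.decay)
        merged.items = {**self.items, **other.items}
        return merged

    def decayed_sum(self, now: int) -> float:
        # Weight each observation by the decay function of its age.
        return sum(v * self.decay(now - t) for (_, t), v in self.items.items())
```

Because merging is a set union, the result does not depend on merge order. A compact sketch has to preserve exactly that property while replacing the dictionary with a small summary.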

Digital Repository @ Iowa State University (ISU)

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window

Author: Chan Ho-Leung
Lam Tak-Wah
Lee Lap-Kei
Ting Hing-Fung
Publication date: 01/01/2010

The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely basic counting, frequent items, and quantiles. The worst-case communication cost over a window is $O(\frac{k}{\epsilon} \log \frac{\epsilon N}{k})$ bits for basic counting and $O(\frac{k}{\epsilon} \log \frac{N}{k})$ words for the remaining two statistics. Here $k$ is the number of distributed data streams, $N$ is the total number of items in the streams that arrive or expire in the window, and $\epsilon < 1$ is the desired error bound. Matching and nearly matching lower bounds are also obtained.

Comment: 12 pages, to appear in the 27th International Symposium on Theoretical Aspects of Computer Science (STACS), 201
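A simple threshold protocol illustrates the communication/accuracy trade-off for basic counting. The protocol is an assumption of this sketch, not the paper's algorithm. Each site tracks its own window count exactly. It sends the count to the root only when the count has drifted from the last reported value by more than an $\epsilon$ fraction of the current count. The root's estimate, the sum of the last reported counts, is then within $\epsilon$ times the true global count. Unlike the paper's setting, the sketch assumes in-order arrivals, at most one item per site per time step, and that every site is checked at every time step. It does not achieve the stated communication bounds.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


class Site:
    """One distributed stream; reports its window count to the root lazily."""

    def __init__(self, window: int, eps: float):
        self.window = window
        self.eps = eps
        self.arrivals = deque()   # timestamps of items still in the window
        self.last_reported = 0    # count the root currently believes

    def local_count(self, now: int) -> int:
        # Expire items outside the time-based window (now - window, now].
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()
        return len(self.arrivals)

    def observe(self, now: int, arrived: bool) -> Optional[int]:
        if arrived:
            self.arrivals.append(now)
        c = self.local_count(now)
        # Send a message only if the root's view has drifted by more than eps * c.
        if abs(c - self.last_reported) > self.eps * c:
            self.last_reported = c
            return c
        return None


def simulate(
    streams: Sequence[Sequence[bool]], window: int, eps: float
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (root estimate, true count) per time step and the number of messages."""
    sites = [Site(window, eps) for _ in streams]
    history: List[Tuple[int, int]] = []
    messages = 0
    for now in range(len(streams[0])):
        for site, stream in zip(sites, streams):
            if site.observe(now, stream[now]) is not None:
                messages += 1
        estimate = sum(s.last_reported for s in sites)
        true_count = sum(s.local_count(now) for s in sites)
        history.append((estimate, true_count))
    return history, messages
```

Lowering `eps` tightens the root's error, but sites then report more often.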

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

HKU Scholars Hub

A recent-biased dimension reduction technique for time series data

Author: Zhang C
Zhang S
Zhao Y
Publication date: 01/01/2005

There are many techniques developed for tackling time series and most of them consider every part of a sequence equally. In many applications, however, recent data can often be much more interesting and significant than old data. This paper defines new recent-biased measures for distance and energy, and proposes a recent-biased technique based on DWT for time series in which more recent data are considered more significant. With such a recent-biased technique, the dimension of time series can be reduced while effectively preserving the recent-biased energy. Our experiments have demonstrated the effectiveness of the proposed approach for handling time series. © Springer-Verlag Berlin Heidelberg 2005
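One way to picture a recent-biased, DWT-based reduction takes three steps. Scale each sample by a weight that grows toward the present. Apply an orthonormal Haar transform. Keep only the largest coefficients. The orthonormal Haar transform preserves energy, so the retained coefficients account for a known share of the weighted (recent-biased) energy. The geometric weights `lam ** age` and the top-`m` selection are assumptions of this sketch. The paper's own recent-biased distance and energy measures may be defined differently.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar(x: Sequence[float]) -> List[float]:
    """Orthonormal Haar DWT; len(x) must be a power of two.

    Output order: [overall average, coarsest detail, ..., finest details].
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) / SQRT2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
        details = level + details  # coarser levels go in front
    return approx + details


def inverse_haar(c: Sequence[float]) -> List[float]:
    """Invert `haar`, rebuilding one resolution level at a time."""
    approx, pos = list(c[:1]), 1
    while pos < len(c):
        det = c[pos:pos + len(approx)]
        nxt: List[float] = []
        for s, d in zip(approx, det):
            nxt += [(s + d) / SQRT2, (s - d) / SQRT2]
        approx, pos = nxt, pos + len(det)
    return approx


def recent_biased_reduce(x: Sequence[float], lam: float, m: int) -> Tuple[List[float], float]:
    """Keep the m largest Haar coefficients of the recency-weighted series.

    Sample i (0 = oldest) is scaled by lam ** (n - 1 - i), so the newest sample
    has weight 1. Returns the sparse coefficients and the retained fraction
    of the weighted energy.
    """
    n = len(x)
    weighted = [v * lam ** (n - 1 - i) for i, v in enumerate(x)]
    coeffs = haar(weighted)
    keep = set(sorted(range(n), key=lambda i: abs(coeffs[i]), reverse=True)[:m])
    sparse = [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
    total = sum(c * c for c in coeffs)
    kept = sum(c * c for c in sparse)
    return sparse, (kept / total if total > 0 else 1.0)
```

With `lam = 1` every sample is weighted equally and this reduces to ordinary top-`m` Haar compression. Smaller `lam` shifts the retained energy toward recent samples.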

CiteSeerX

Crossref

OPUS - University of Technology Sydney