
Almost-Smooth Histograms and Sliding-Window Graph Algorithms

Author: Krauthgamer Robert
Reitblat David
Publication date: 20/07/2020

We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, a class that includes all subadditive functions. Specifically, we show that if a subadditive function can be $(1+\epsilon)$-approximated in the insertion-only streaming model, then it can also be $(2+\epsilon)$-approximated in the sliding-window model, with space complexity larger by a factor of $O(\epsilon^{-1}\log w)$, where $w$ is the window size. We demonstrate how our framework yields new approximation algorithms with relatively little effort for a variety of problems that do not admit the smooth-histogram technique. For example, in the frequency-vector model, a symmetric norm is subadditive, and thus we obtain a sliding-window $(2+\epsilon)$-approximation algorithm for it. Another example is streaming matrices, where we derive a new sliding-window $(\sqrt{2}+\epsilon)$-approximation algorithm for the Schatten $4$-norm. We then consider graph streams and show that many graph problems are subadditive, including maximum submodular matching, minimum vertex cover, and maximum $k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for them almost for free (using known insertion-only algorithms). Finally, we design for every $d\in(1,2]$ an artificial function, based on the maximum-matching size, whose almost-smoothness parameter is exactly $d$.

arXiv.org e-Print Archive
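To make the histogram idea concrete, here is a minimal Python sketch under stated assumptions. It follows the smooth-histogram pattern of Braverman and Ostrovsky as we understand it (keep suffix instances started at positions $x_1 < x_2 < \dots$, drop a middle instance when its right neighbor already holds a $(1-\beta)$ fraction of its left neighbor's value, and expire $x_1$ once $x_2$ covers the window); it is not the paper's algorithm. The names `SuffixHistogram`, `DistinctCounter`, and the pruning parameter `beta` are hypothetical, and the stand-in insertion-only algorithm is an exact distinct-element counter, which is monotone and subadditive. For such an $f$: suppose that when two instances became neighbors at time $t$, we had $f(B)\ge(1-\beta)f(AB)$, where $A$ is the segment between their start positions and $B$ runs from the later start up to $t$. Let $C$ be the items arriving after $t$. Then $f(ABC)\le f(AB)+f(BC)\le\frac{2-\beta}{1-\beta}f(BC)$. This is why the reported value stays within roughly a factor $2$ of the true window value, mirroring the factor $2$ in the $(2+\epsilon)$ bound above, here with an exact oracle rather than a $(1+\epsilon)$-approximate one.

```python
from dataclasses import dataclass, field
from typing import Hashable, List


@dataclass
class DistinctCounter:
    """Hypothetical stand-in for an insertion-only algorithm: exact distinct count.

    |A u B| <= |A| + |B|, so the function is monotone and subadditive. It stores
    every item, so it saves no space; a real deployment would plug in a
    (1+eps)-approximate sketch instead.
    """
    start: int                                    # stream position of this instance's first item
    seen: set = field(default_factory=set)

    def update(self, item: Hashable) -> None:
        self.seen.add(item)

    def value(self) -> int:
        return len(self.seen)


class SuffixHistogram:
    """Sliding-window wrapper keeping instances started at positions x_1 < x_2 < ..."""

    def __init__(self, window: int, beta: float) -> None:
        self.window = window
        self.beta = beta
        self.now = 0                              # 1-based position of the latest item
        self.instances: List[DistinctCounter] = []

    def _window_start(self) -> int:
        return max(1, self.now - self.window + 1)

    def insert(self, item: Hashable) -> None:
        self.now += 1
        for inst in self.instances:               # every suffix instance sees the new item
            inst.update(item)
        fresh = DistinctCounter(start=self.now)   # and a new suffix starts here
        fresh.update(item)
        self.instances.append(fresh)
        self._prune()
        self._expire()

    def _prune(self) -> None:
        # Drop x_{i+1} whenever x_{i+2} already retains a (1 - beta) fraction of x_i.
        i = 0
        while i + 2 < len(self.instances):
            if self.instances[i + 2].value() >= (1 - self.beta) * self.instances[i].value():
                del self.instances[i + 1]
            else:
                i += 1

    def _expire(self) -> None:
        # Keep x_1 <= window start < x_2: once x_2 covers the whole window, x_1 is not needed.
        while len(self.instances) >= 2 and self.instances[1].start <= self._window_start():
            self.instances.pop(0)

    def estimate(self) -> int:
        first = self.instances[0]
        if first.start == self._window_start():
            return first.value()                  # x_1 starts exactly at the window: exact
        return self.instances[1].value()          # x_2 lies inside the window: a lower bound
```

With a real sketch, the point of `_prune` is to bound the number of retained instances; the per-instance state here is deliberately naive.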

New Algorithms for Distributed Sliding Windows

Author: Gayen Sutanu
Vinodchandran N. V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018)
Publication date: 01/01/2018

Computing functions over a distributed stream of data is a significant problem with practical applications. The distributed streaming model is a natural computational model for such scenarios. The goal in this model is to maintain an approximate value of a function of interest over a data stream distributed across several computational nodes. These nodes have a two-way communication channel with a coordinator node that maintains an approximation of the function over the entire data stream seen so far. The resources to be minimized are communication (primarily), space, and update time. A practical variant of this model is the distributed sliding window (dsw) model, where the computation is limited to the last W items, W being the window size. Important problems such as sampling and counting have been investigated in this model. However, certain problems that are well studied in other streaming models, including computing frequency moments and metric clustering, have not been considered in the distributed sliding window model. We give the first algorithms for computing the frequency moments and metric clustering problems in the distributed sliding window model. Our algorithms for these problems follow from a general transfer theorem, which we establish, that transforms any algorithm in the distributed infinite window model into an algorithm in the distributed sliding window model, for a large class of functions. In particular, we show an efficient adaptation of the smooth histogram technique of Braverman and Ostrovsky to the distributed streaming model. Our construction allows trade-offs between communication and space. If we optimize for communication, we get algorithms that are as communication-efficient as their infinite window counterparts (up to polylogarithmic factors).

Dagstuhl Research Online Publication Server
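The transfer theorem above takes as input an algorithm for the distributed infinite window model. As a hedged illustration of that input model only (not the paper's construction), here is a simple, well-known threshold protocol for approximately counting items spread over several sites. Each site sends its local count to the coordinator only when that count exceeds $(1+\epsilon)$ times the last value it sent. The coordinator's sum is then never above the true total and never below it by more than a $(1+\epsilon)$ factor, and each site sends $O(\epsilon^{-1}\log n)$ messages. The class and method names are hypothetical.

```python
from typing import Dict


class Coordinator:
    """Keeps the last count each site reported; their sum is the global estimate."""

    def __init__(self) -> None:
        self.reported: Dict[int, int] = {}
        self.messages = 0                     # communication cost, counted in messages

    def receive(self, site_id: int, count: int) -> None:
        self.reported[site_id] = count
        self.messages += 1

    def estimate(self) -> int:
        return sum(self.reported.values())


class Site:
    """Reports its local count only after it grows by more than a (1 + eps) factor."""

    def __init__(self, site_id: int, eps: float) -> None:
        self.site_id = site_id
        self.eps = eps
        self.count = 0
        self.last_sent = 0

    def observe(self, coordinator: Coordinator) -> None:
        self.count += 1
        # Invariant after this call: count <= (1 + eps) * last_sent.
        if self.count > (1 + self.eps) * self.last_sent:
            coordinator.receive(self.site_id, self.count)
            self.last_sent = self.count
```

Each message multiplies a site's last reported value by more than $1+\epsilon$, which is where the logarithmic message bound comes from.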

Improved Algorithms for Time Decay Streams

Author: Braverman Vladimir
Lang Harry
Ullah Enayat
Zhou Samson
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In the time-decay model for data streams, elements of an underlying data set arrive sequentially, with recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline coreset and gives a time-decay coreset for polynomial time-decay functions. We also consider the exponential time-decay model for $k$-median clustering, where we provide a constant-factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores $O(k\log(h\Delta)+h)$ points, where $h$ is the half-life of the decay function and $\Delta$ is the aspect ratio of the dataset. Our techniques extend to $k$-means clustering and M-estimators as well.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
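To pin down what a half-life $h$ means in the exponential model, here is a minimal sketch that evaluates the decayed $k$-median cost of a fixed set of centers. It assumes weights of the form $2^{-\text{age}/h}$, one common parametrization that the paper may normalize differently. It shows only the objective, not the coreset or online facility location algorithm, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def exp_decay_weight(age: float, half_life: float) -> float:
    """Assumed weight of an element that arrived `age` steps ago; halves every `half_life` steps."""
    return 2.0 ** (-age / half_life)


def decayed_k_median_cost(points: Sequence[Point], arrival: Sequence[int],
                          centers: Sequence[Point], now: int, half_life: float) -> float:
    """Sum over points of (decay weight) * (Euclidean distance to the nearest center)."""
    total = 0.0
    for p, t in zip(points, arrival):
        nearest = min(math.dist(p, c) for c in centers)
        total += exp_decay_weight(now - t, half_life) * nearest
    return total
```

For example, a point at distance 5 from its nearest center that arrived one half-life ago contributes 2.5 to the cost.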

Tight results for clustering and summarizing data streams

Author: Guha Sudipto
Publication venue: ScholarlyCommons
Publication date: 01/01/2009

In this paper, we investigate algorithms and lower bounds for summarization problems over a single-pass data stream. In particular, we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms for these problems in either the space bound, the approximation factor, or the running time. The framework uses a notion of "streamstrapping", where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that, under the stricter requirement that an algorithm accurately approximate the error of every bucket or every cluster it produces, these upper bounds are almost the best possible. This property of accurate estimation holds for all known upper bounds on these problems.

Crossref

ScholarlyCommons@Penn
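The abstract does not spell out streamstrapping, so the sketch below shows only a simpler, commonly used one-pass approach to K-center for context. It runs a threshold rule for geometric radius guesses $r = r_{\min}(1+\epsilon)^j$ in parallel. For each guess, a point opens a new center if it is farther than $2r$ from every existing center, and the guess fails once more than $k$ centers open. Those $k+1$ centers are pairwise more than $2r$ apart, so the optimal radius exceeds $r$. Assuming $r_{\min} \le \mathrm{OPT} \le r_{\max}$ (a hypothetical input, as are all names here), the smallest surviving guess covers every point within $2(1+\epsilon)\,\mathrm{OPT}$.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class ThresholdKCenter:
    """One pass of k-center for a single radius guess r (hypothetical helper)."""

    def __init__(self, k: int, r: float) -> None:
        self.k, self.r = k, r
        self.centers: List[Point] = []
        self.failed = False

    def update(self, p: Point) -> None:
        if self.failed:
            return
        # Open a center only if p is farther than 2r from every existing center.
        if all(math.dist(p, c) > 2 * self.r for c in self.centers):
            self.centers.append(p)
            if len(self.centers) > self.k:
                # k+1 points pairwise > 2r apart, so the optimal radius exceeds r.
                self.failed = True


def streaming_k_center(stream: Sequence[Point], k: int, r_min: float, r_max: float,
                       eps: float) -> Optional[Tuple[List[Point], float]]:
    """Assumes r_min <= OPT <= r_max; returns (centers, covering radius) or None."""
    guesses: List[ThresholdKCenter] = []
    r = r_min
    while r <= r_max * (1 + eps):
        guesses.append(ThresholdKCenter(k, r))
        r *= 1 + eps
    for p in stream:                  # the single pass over the data
        for g in guesses:
            g.update(p)
    for g in guesses:                 # guesses are in increasing order of r
        if not g.failed:
            return g.centers, 2 * g.r
    return None
```

Running every guess in parallel multiplies space by the number of guesses; reusing summaries of earlier prefixes, as the abstract describes, is one way to avoid paying for guesses independently.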

Private Decayed Sum Estimation under Continual Observation

Author: Bolot Jean
Fawaz Nadia
Muthukrishnan S.
Nikolov Aleksandar
Taft Nina
Publication venue: Association for Computing Machinery (ACM)
Publication date: 02/03/2012

In monitoring applications, recent data is more important than distant data. How does this affect the privacy of data analysis? We study a general class of data analyses, computing predicate sums, with privacy. Formally, we study the problem of estimating predicate sums privately, for sliding windows (and other well-known decay models of data, i.e., exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sums that are $\epsilon$-differentially private and accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\epsilon$ and polylog terms in the range of the computed function; for polynomial decay sums, which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.

arXiv.org e-Print Archive

Crossref
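The abstract does not describe the mechanism itself. As a rough illustration of continual observation for window sums, here is the well-known binary-tree counting mechanism from the continual-observation literature, applied to prefix sums. A window sum is then the difference of two noisy prefixes. This is not necessarily the authors' algorithm, and we make no claim that its error matches the bounds above. The sketch assumes predicate values in $\{0,1\}$ and a known horizon $T$. The names `BinaryMechanismWindowSum`, `laplace`, and `horizon` are hypothetical. Each item lies in one dyadic block per level, so adding Laplace noise of scale $(\#\text{levels})/\epsilon$ to every released block gives $\epsilon$-differential privacy, and window sums are post-processing.

```python
import random
from typing import Dict, Tuple


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BinaryMechanismWindowSum:
    """Private sliding-window sums of 0/1 predicate values via noisy dyadic blocks."""

    def __init__(self, horizon: int, window: int, eps: float, seed: int = 0) -> None:
        self.horizon = horizon
        self.levels = horizon.bit_length()          # block sizes 1, 2, 4, ..., 2^(levels-1)
        self.scale = self.levels / eps              # each item touches one block per level
        self.window = window
        self.rng = random.Random(seed)
        self.partial = [0] * self.levels            # exact sums of the unfinished blocks
        self.noisy: Dict[Tuple[int, int], float] = {}  # (level, end >> level) -> noisy sum
        self.t = 0

    def update(self, bit: int) -> None:
        if self.t >= self.horizon:
            raise ValueError("stream longer than the assumed horizon")
        self.t += 1
        for j in range(self.levels):
            self.partial[j] += bit
            if self.t % (1 << j) == 0:              # a block of size 2^j just ended at t
                self.noisy[(j, self.t >> j)] = self.partial[j] + laplace(self.rng, self.scale)
                self.partial[j] = 0

    def prefix(self, t: int) -> float:
        """Noisy sum of items 1..t, built from the blocks in t's binary expansion."""
        total, end = 0.0, 0
        for j in reversed(range(self.levels)):
            if (t >> j) & 1:
                end += 1 << j
                total += self.noisy[(j, end >> j)]
        return total

    def window_sum(self) -> float:
        return self.prefix(self.t) - self.prefix(max(0, self.t - self.window))
```

Each prefix uses at most one block per level, so a window estimate combines at most twice that many noise terms.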