A Detailed Analysis of the SpaceSaving Family of Algorithms with Bounded Deletions
In this paper, we present an advanced analysis of near-optimal deterministic
algorithms that use a small space budget to solve the frequency estimation,
heavy hitters, frequent items, and top-k approximation problems in the bounded
deletion model.
We define the family of SpaceSaving algorithms and explain why the
original SpaceSaving algorithm only works when insertions and deletions
are not interleaved. Next, we introduce the new DoubleSpaceSaving and
IntegratedSpaceSaving algorithms and prove their correctness. They show
similar characteristics, and both extend the popular space-efficient
SpaceSaving algorithm. However, the two algorithms represent different
trade-offs: DoubleSpaceSaving distributes the operations to two independent
summaries, while IntegratedSpaceSaving fully synchronizes deletions with
insertions. Since data streams are often skewed, we present an improved
analysis of these two algorithms and show that their errors depend only on the
cold and warm items, not on the hot items. We also demonstrate how to achieve
the relative error guarantee under mild assumptions. Moreover, we establish
that both algorithms are mergeable, a property that is desirable in
distributed settings.
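As a rough illustration of the ideas in this abstract, the sketch below shows a textbook insertion-only SpaceSaving summary and a two-summary wrapper in the spirit of DoubleSpaceSaving. It assumes the wrapper answers a query with the difference between the insertion-side and deletion-side estimates; the paper's exact query rule, summary sizes, and error analysis may differ. The class names `SpaceSaving` and `DoubleSummary` and the parameters `k`, `k_insert`, and `k_delete` are hypothetical.

```python
class SpaceSaving:
    """Textbook insertion-only SpaceSaving summary with at most k counters."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # monitored item -> (over)estimated count

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict a minimum-count item; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def estimate(self, item):
        # Unmonitored items are reported as 0 in this simplified version.
        return self.counts.get(item, 0)


class DoubleSummary:
    """Two independent summaries: one for insertions, one for deletions.

    Assumption: the estimate is the difference of the two sides.
    """

    def __init__(self, k_insert, k_delete):
        self.inserts = SpaceSaving(k_insert)
        self.deletes = SpaceSaving(k_delete)

    def insert(self, item):
        self.inserts.update(item)

    def delete(self, item):
        self.deletes.update(item)

    def estimate(self, item):
        return self.inserts.estimate(item) - self.deletes.estimate(item)
```

Keeping the two sides independent means each side sees an insertion-only stream, which is the setting the original SpaceSaving algorithm was designed for.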
A Framework for Adversarially Robust Streaming Algorithms
We investigate the adversarial robustness of streaming algorithms. In this
context, an algorithm is considered robust if its performance guarantees hold
even if the stream is chosen adaptively by an adversary that observes the
outputs of the algorithm along the stream and can react in an online manner.
While deterministic streaming algorithms are inherently robust, many central
problems in the streaming literature do not admit sublinear-space deterministic
algorithms; on the other hand, classical space-efficient randomized algorithms
for these problems are generally not adversarially robust. This raises the
natural question of whether there exist efficient adversarially robust
(randomized) streaming algorithms for these problems.
In this work, we show that the answer is positive for various important
streaming problems in the insertion-only model, including distinct elements and
more generally $F_p$-estimation, $\ell_2$-heavy hitters, entropy estimation, and
others. For all of these problems, we develop adversarially robust
$(1+\varepsilon)$-approximation algorithms whose required space matches that of
the best known non-robust algorithms up to a $\text{poly}(\log n, 1/\varepsilon)$
multiplicative factor (and in some cases even up to a constant
factor). Towards this end, we develop several generic tools allowing one to
efficiently transform a non-robust streaming algorithm into a robust one in
various scenarios.
Comment: Conference version in PODS 2020. Version 3 addressing journal
referees' comments; improved exposition of sketch switchin
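One generic way to turn a non-robust estimator into a robust one is usually called sketch switching. The idea is to keep several independent copies of the estimator, publish a value that changes only when the active copy's estimate drifts by more than a $(1 \pm \varepsilon/2)$ factor, and retire the active copy each time the published value changes. The code below is a simplified illustration under stated assumptions, not the paper's construction. `ExactDistinct` is a hypothetical stand-in for a randomized sketch, and `SketchSwitcher`, `num_copies`, and `eps` are illustrative names and values.

```python
class ExactDistinct:
    """Hypothetical stand-in estimator: exact distinct count (linear space).

    A real use would plug in a small-space randomized sketch instead.
    """

    def __init__(self):
        self.seen = set()

    def update(self, item):
        self.seen.add(item)

    def estimate(self):
        return len(self.seen)


class SketchSwitcher:
    """Publishes a slowly changing output; each change retires one copy."""

    def __init__(self, make_estimator, num_copies, eps):
        self.copies = [make_estimator() for _ in range(num_copies)]
        self.eps = eps
        self.active = 0      # index of the copy currently being queried
        self.published = 0   # last value revealed to the outside world

    def update(self, item):
        # Every copy sees every stream update.
        for copy in self.copies:
            copy.update(item)
        if self.active == len(self.copies):
            raise RuntimeError("all copies have already been exposed")
        y = self.copies[self.active].estimate()
        low, high = (1 - self.eps / 2) * y, (1 + self.eps / 2) * y
        if not (low <= self.published <= high):
            # Reveal the new value, then move on to a fresh copy.
            self.published = y
            self.active += 1
        return self.published


switcher = SketchSwitcher(ExactDistinct, num_copies=100, eps=0.2)
for t in range(1, 1001):
    out = switcher.update(t)
```

Because the published value changes only when the estimate moves by a constant factor, the number of copies needed grows with how often a monotone quantity can change by such a factor, rather than with the stream length.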
The White-Box Adversarial Data Stream Model
We study streaming algorithms in the white-box adversarial model, where the
stream is chosen adaptively by an adversary who observes the entire internal
state of the algorithm at each time step. We show that nontrivial algorithms
are still possible. We first give a randomized algorithm for the $L_1$-heavy
hitters problem that outperforms the optimal deterministic Misra-Gries
algorithm on long streams. If the white-box adversary is computationally
bounded, we use cryptographic techniques to reduce the memory of our
$L_1$-heavy hitters algorithm even further and to design a number of additional
algorithms for graph, string, and linear algebra problems. The existence of
such algorithms is surprising, as the streaming algorithm does not even have a
secret key in this model, i.e., its state is entirely known to the adversary.
One algorithm we design is for estimating the number of distinct elements in a
stream with insertions and deletions achieving a multiplicative approximation
and sublinear space; such an algorithm is impossible for deterministic
algorithms.
We also give a general technique that translates any two-player deterministic
communication lower bound to a lower bound for randomized algorithms
robust to a white-box adversary. In particular, our results show that for all
$p \ge 0$, there exists a constant $C_p > 1$ such that any $C_p$-approximation
algorithm for $F_p$ moment estimation in insertion-only streams with a
white-box adversary requires $\Omega(n)$ space for a universe of size $n$.
Similarly, there is a constant $C > 1$ such that any $C$-approximation algorithm
in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a
white-box adversary. Our algorithmic results based on cryptography thus show a
separation between computationally bounded and unbounded adversaries.
(Abstract shortened to meet arXiv limits.)
Comment: PODS 202
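For context, the deterministic Misra-Gries baseline named in the abstract above can be sketched in its textbook form as follows. This is not the paper's randomized algorithm. The function name `misra_gries` and the parameter `k` are illustrative; with `k - 1` counters on a stream of length `m`, each reported count underestimates the true frequency by at most `m / k`.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Example: "a" occurs 6 times in a stream of length 9.
summary = misra_gries(["a"] * 6 + ["b", "c", "d"], k=2)
```

The summary is a deterministic function of the stream, so an adversary who can see the state gains nothing beyond what the stream itself already reveals.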