Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Max-stable random sketches can be computed efficiently on fast streaming
positive data sets by using only sequential access to the data. They can be
used to answer point and Lp-norm queries for the signal. There is an intriguing
connection between the so-called p-stable (or sum-stable) and the max-stable
sketches. Rigorous performance guarantees through error-probability estimates
are derived and the algorithmic implementation is discussed.
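
As a rough illustration of the idea in this abstract, the sketch below estimates an Lp-norm of a non-negative signal from k running maxima. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `frechet`, `max_stable_sketch` and `estimate_norm`, the seeding scheme, and the median-based estimator are all illustrative choices. It also assumes each index appears once in the stream with its final non-negative value. The estimator relies on the fact that the maximum of f(i) * Z(i) over i, with independent standard alpha-Frechet Z(i), is alpha-Frechet with scale equal to the L_alpha norm of f.

```python
import math
import random
import statistics


def frechet(alpha, seed):
    """Standard alpha-Frechet variate, reproducible from an integer seed."""
    u = max(random.Random(seed).random(), 1e-300)  # keep u strictly positive
    return (-math.log(u)) ** (-1.0 / alpha)


def max_stable_sketch(stream, alpha, k):
    """One pass over (index, value) pairs; keeps only k running maxima."""
    sketch = [0.0] * k
    for i, value in stream:
        for j in range(k):
            # Z_j(i) is regenerated from (j, i) instead of being stored.
            # Seeds are distinct as long as i < 1_000_003 (an assumption).
            z = frechet(alpha, seed=j * 1_000_003 + i)
            sketch[j] = max(sketch[j], value * z)
    return sketch


def estimate_norm(sketch, alpha):
    """Median-based estimate of the L_alpha norm of the signal."""
    # The median of an alpha-Frechet law with scale s is s * ln(2)^(-1/alpha).
    return statistics.median(sketch) * math.log(2) ** (1.0 / alpha)


signal = [(i, float(1 + i % 7)) for i in range(30)]
alpha = 2.0
sketch = max_stable_sketch(signal, alpha, k=200)
true_norm = sum(v ** alpha for _, v in signal) ** (1.0 / alpha)
estimate = estimate_norm(sketch, alpha)
```

Comparing `estimate` with `true_norm` gives a feel for the accuracy at a given k; a larger k tightens the estimate at the cost of more work per stream element.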
On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting
Estimating the p-th frequency moment of a data stream is a heavily studied
problem. The problem is actually trivial when p = 1, assuming the strict
Turnstile model. The sample complexity of our proposed algorithm is essentially
O(1) near p=1. This is a very large improvement over the previously believed
O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of
entropy estimation an easy task, as verified by the experiments included in the
appendix.
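
To make the link between frequency moments near p = 1 and entropy concrete, the following sketch computes everything exactly on a tiny stream. It is not the Compressed Counting estimator from the paper. It only illustrates two facts: in the strict Turnstile model the first moment is a running sum of the updates, and the Tsallis entropy, which depends on the stream only through the p-th and first moments, tends to the Shannon entropy as p approaches 1. All function names and the example stream are hypothetical.

```python
import math
from collections import Counter


def frequencies(updates):
    """Apply turnstile updates (item, delta); strict model keeps counts >= 0."""
    counts = Counter()
    for item, delta in updates:
        counts[item] += delta
    return [c for c in counts.values() if c > 0]


def first_moment(updates):
    """F_1 in the strict Turnstile model: a single running sum of the deltas."""
    return sum(delta for _, delta in updates)


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of the empirical distribution."""
    total = sum(freqs)
    return -sum((f / total) * math.log(f / total) for f in freqs)


def tsallis_entropy(freqs, p):
    """(1 - F_p / F_1^p) / (p - 1), which tends to Shannon entropy as p -> 1."""
    total = sum(freqs)
    f_p = sum(f ** p for f in freqs)
    return (1.0 - f_p / total ** p) / (p - 1.0)


updates = [("a", 5), ("b", 3), ("c", 4), ("a", -2), ("d", 1), ("b", 2)]
freqs = frequencies(updates)
gap_far = abs(tsallis_entropy(freqs, 1.01) - shannon_entropy(freqs))
gap_near = abs(tsallis_entropy(freqs, 1.0001) - shannon_entropy(freqs))
```

Here `gap_near` is smaller than `gap_far`, which is why an accurate estimate of the p-th moment very close to p = 1 translates into an entropy estimate.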
Recognizing well-parenthesized expressions in the streaming model
Motivated by a concrete problem and with the goal of understanding the sense
in which the complexity of streaming algorithms is related to the complexity of
formal languages, we investigate the problem Dyck(s) of checking matching
parentheses, with s different types of parentheses.
We present a one-pass randomized streaming algorithm for Dyck(2) with space
O(\sqrt{n} \log n), time per letter polylog(n), and one-sided error.
We prove that this one-pass algorithm is optimal, up to a polylog(n) factor,
even when two-sided error is allowed. For the lower bound, we prove a direct
sum result on hard instances by following the "information cost" approach, but
with a few twists. Indeed, we play a subtle game between public and private
coins. This mixture between public and private coins results from a balancing
act between the direct sum result and a combinatorial lower bound for the base
case.
Surprisingly, the space requirement shrinks drastically if we have access to
the input stream in reverse. We present a two-pass randomized streaming
algorithm for Dyck(2) with space O((\log n)^2), time polylog(n) and
one-sided error, where the second pass is in the reverse direction. Both
algorithms can be extended to Dyck(s) since this problem is reducible to
Dyck(2) for a suitable notion of reduction in the streaming model.
Comment: 20 pages, 5 figures
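
For readers unfamiliar with the problem, the sketch below shows what membership in Dyck(s) means and one possible way to encode a word over s parenthesis types as a word over two types. It is a baseline under stated assumptions, not the algorithm from the paper: `is_dyck` is the textbook stack check, which uses linear space rather than the sublinear space of the streaming algorithms, and `encode_to_dyck2` is a hypothetical fixed-width binary encoding that may differ from the reduction the authors use.

```python
def is_dyck(word):
    """Stack check; word is a list of (type, is_open) pairs. Linear space."""
    stack = []
    for kind, is_open in word:
        if is_open:
            stack.append(kind)
        elif not stack or stack.pop() != kind:
            return False
    return not stack


def encode_to_dyck2(word, s):
    """Map a word over s parenthesis types to a word over 2 types."""
    width = max(1, (s - 1).bit_length())  # bits per codeword
    out = []
    for kind, is_open in word:
        bits = [(kind >> b) & 1 for b in range(width)]
        if not is_open:
            bits.reverse()  # closers undo the opener's codeword in reverse
        out.extend((bit, is_open) for bit in bits)
    return out


balanced = [(0, True), (2, True), (2, False), (1, True), (1, False), (0, False)]
mismatched = [(0, True), (1, False)]
results = [
    (is_dyck(w), is_dyck(encode_to_dyck2(w, s=3)))
    for w in (balanced, mismatched)
]
```

Each opening parenthesis becomes a block of `width` opening symbols and each closing parenthesis the reversed block of closing symbols, so a closing block cancels an opening block exactly when the two original types agree.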