Boosting the Basic Counting on Distributed Streams
We revisit the classic basic counting problem in the distributed streaming
model studied by Gibbons and Tirthapura (GT). In maintaining the same kind of
estimate that GT's method maintains, we make the following new contributions:
(1) For a bit stream of a given size in which each bit is 1 with probability
at least a given lower bound, we exponentially reduce the average total
processing time compared with GT's method, thus providing the first
sublinear-time streaming algorithm for this problem. (2) In addition to being
much faster overall, our method provides a new tradeoff: a lower accuracy
demand (a larger error tolerance) yields a faster processing speed, whereas
GT's processing speed is the same in every case, regardless of the accuracy
demand. (3) The worst-case total time cost of our method matches GT's, which
is necessary but rarely occurs in our method. (4) The space usage overhead of
our method is a lower-order term compared with GT's space usage, occurs only a
small number of times during the stream processing, and is too small to be
detected by the
operating system in practice. We further validate these solid theoretical
results with experiments on both real-world and synthetic data, showing that
our method is faster than GT's by a factor of several to several thousand,
depending on the stream size and accuracy demands, without any detectable space
usage overhead. Our method is based on a faster sampling technique that we
designed to boost GT's method, and we believe this technique may be of
independent interest.
Comment: 32 pages