
On Estimating the First Frequency Moment of Data Streams

Abstract

Estimating the first moment of a data stream, defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} |f_i|$, to within $1 \pm \epsilon$ relative error with high probability is a basic and influential problem in data stream processing. A tight space bound of $O(\epsilon^{-2} \log (mM))$ is known from the work of [Kane-Nelson-Woodruff-SODA10]. However, all known algorithms for this problem require per-update stream processing time of $\Omega(\epsilon^{-2})$, with the only exception being the algorithm of [Ganguly-Cormode-RANDOM07], which requires per-update processing time of $O(\log^2(mM)(\log n))$, albeit with sub-optimal space $O(\epsilon^{-3}\log^2(mM))$. In this paper, we present an algorithm for estimating $F_1$ that achieves near-optimality in both space and update processing time. The space requirement is $O(\epsilon^{-2}(\log n + (\log \epsilon^{-1})\log(mM)))$ and the per-update processing time is $O((\log n)\log(\epsilon^{-1}))$.

Comment: 12 pages
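The quantity being estimated, and the per-update cost that motivates the paper, can be illustrated with a small sketch. The code below is not the algorithm presented in this paper. It is a minimal illustration, assuming the classical Cauchy (1-stable) random-projection estimator with a median readout. The names `exact_f1`, `CauchySketch`, `k`, and `seed` are hypothetical, and regenerating each Cauchy variable from a seeded generator is a simplification standing in for the pseudorandom constructions used in the literature.

```python
import math
import random
from statistics import median


def exact_f1(updates, n):
    """Exact F_1 = sum_i |f_i| using O(n) space (baseline for comparison)."""
    f = [0] * n
    for i, v in updates:  # each update adds v (possibly negative) to f_i
        f[i] += v
    return sum(abs(x) for x in f)


class CauchySketch:
    """Illustrative 1-stable sketch: k counters y_j = sum_i f_i * C[j][i]."""

    def __init__(self, k, seed=0):
        self.k = k          # number of counters; assumed on the order of eps^-2
        self.seed = seed
        self.y = [0.0] * k

    def _cauchy(self, j, i):
        # Recompute the same standard Cauchy variable for (j, i) on demand
        # instead of storing a k-by-n matrix (an assumption of this sketch).
        rng = random.Random(f"{self.seed}:{j}:{i}")
        return math.tan(math.pi * (rng.random() - 0.5))

    def update(self, i, v):
        # Every update touches all k counters, so the cost per update
        # grows linearly with k.
        for j in range(self.k):
            self.y[j] += v * self._cauchy(j, i)

    def estimate(self):
        # Each y_j is distributed as F_1 times a standard Cauchy variable,
        # and the median of |Cauchy| is 1, so the median of |y_j| estimates F_1.
        return median(abs(y) for y in self.y)


updates = [(0, 5), (1, -3), (0, -2), (2, 4)]  # f = [3, -3, 4]
print(exact_f1(updates, 3))  # 10

sketch = CauchySketch(k=1000)
for i, v in updates:
    sketch.update(i, v)
print(sketch.estimate())  # an approximation of 10
```

Because `update` loops over all `k` counters, the per-update time of this baseline scales with the number of counters, which is the cost the abstract describes as $\Omega(\epsilon^{-2})$ for previously known algorithms.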
