
    On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting

    Estimating the p-th frequency moment of a data stream is a heavily studied problem. The problem is trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p = 1, a large improvement over the previously believed O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of entropy estimation an easy task, as verified by the experiments included in the appendix.
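To see why the p = 1 case is trivial under the strict Turnstile model, note that every frequency is non-negative at query time, so the first moment equals the running sum of all update amounts. The following is a minimal sketch of that observation only, not the algorithm proposed in the paper; the function name `f1_strict_turnstile` and the `(item, delta)` update format are assumptions made for illustration.

```python
def f1_strict_turnstile(updates):
    """Return F_1 for a stream of (item, delta) updates.

    Assumption: strict Turnstile model, i.e. every item's frequency is
    non-negative at query time, so sum_i |f_i| == sum_i f_i == sum of deltas.
    """
    total = 0
    for _item, delta in updates:
        # The item identity is irrelevant here: one counter suffices.
        total += delta
    return total


# Hypothetical stream: final frequencies are a=2, b=0, c=4.
stream = [("a", 3), ("b", 2), ("a", -1), ("c", 4), ("b", -2)]
print(f1_strict_turnstile(stream))  # 6
```

A single counter is enough here because no absolute values are needed; this shortcut breaks as soon as frequencies are allowed to go negative.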

    On Estimating the First Frequency Moment of Data Streams

    Estimating the first moment of a data stream, defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} |f_i|$, to within $1 \pm \epsilon$ relative error with high probability is a basic and influential problem in data stream processing. A tight space bound of $O(\epsilon^{-2} \log (mM))$ is known from the work of [Kane-Nelson-Woodruff-SODA10]. However, all known algorithms for this problem require per-update stream processing time of $\Omega(\epsilon^{-2})$, with the only exception being the algorithm of [Ganguly-Cormode-RANDOM07], which requires per-update processing time of $O(\log^2(mM)(\log n))$, albeit with sub-optimal space $O(\epsilon^{-3}\log^2(mM))$. In this paper, we present an algorithm for estimating $F_1$ that achieves near-optimality in both space and update processing time. The space requirement is $O(\epsilon^{-2}(\log n + (\log \epsilon^{-1})\log(mM)))$ and the per-update processing time is $O((\log n)\log(\epsilon^{-1}))$. Comment: 12 pages
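For reference, the quantity $F_1$ defined above can be computed exactly by storing every frequency, at the cost of space linear in the number of distinct items. The sketch below is that naive baseline, not the algorithm of the paper or of the cited works; the name `exact_f1` and the `(item, delta)` update format are assumptions made for illustration.

```python
from collections import defaultdict


def exact_f1(updates):
    """Exact F_1 = sum_i |f_i| for a general stream of (item, delta) updates.

    Naive baseline: keeps one counter per distinct item, so space grows
    with the number of items rather than with 1/epsilon.
    """
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta
    return sum(abs(f) for f in freq.values())


# Hypothetical stream with negative final frequencies:
# f_1 = 5 - 7 = -2, f_2 = -3, f_3 = 4, so F_1 = 2 + 3 + 4 = 9.
stream = [(1, 5), (2, -3), (1, -7), (3, 4)]
print(exact_f1(stream))  # 9
```

In this example the deltas sum to -1 while $F_1 = 9$, which illustrates why a single running counter does not suffice once frequencies may be negative and why small-space estimation is the interesting problem.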