
    On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting

    Estimating the p-th frequency moment of a data stream is a heavily studied problem. The problem is actually trivial when p = 1, assuming the strict turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p = 1, a very large improvement over the previously believed O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of entropy estimation an easy task, as verified by the experiments included in the appendix.
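    The claim that p = 1 is trivial in the strict turnstile model is easy to see: the first frequency moment is just the running sum of all stream updates, so a single counter computes it exactly. A minimal sketch (the function name and stream encoding are ours, for illustration only):

```python
# Sketch: in the (strict) turnstile model a stream is a sequence of
# (item, delta) updates, with every final count nonnegative. The first
# frequency moment F_1 = sum_i f_i equals the sum of all deltas, so a
# single counter suffices at p = 1 -- no sketching is needed.

def first_moment(stream):
    """Exact F_1 from a turnstile stream of (item, delta) pairs."""
    return sum(delta for _, delta in stream)

stream = [("a", 3), ("b", 2), ("a", -1), ("c", 5)]
print(first_moment(stream))  # 9
```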

    Career: fundamental lower bound and tradeoff problems in networking

    Issued as final report. National Science Foundation (U.S.)

    Estimating Entropy of Data Streams Using Compressed Counting

    The Shannon entropy is a widely used summary statistic in, for example, network traffic measurement, anomaly detection, and the analysis of neural computations and spike trains. This study focuses on estimating the Shannon entropy of data streams. It is known that the Shannon entropy can be approximated by the Rényi entropy or the Tsallis entropy, both of which are functions of the p-th frequency moments and approach the Shannon entropy as p -> 1. Compressed Counting (CC) is a new method for approximating the p-th frequency moments of data streams. Our contributions include: 1) We prove that the Rényi entropy is (much) better than the Tsallis entropy for approximating the Shannon entropy. 2) We propose the optimal quantile estimator for CC, which considerably improves on the previous estimators. 3) Our experiments demonstrate that CC is indeed highly effective at approximating the moments and entropies. We also demonstrate the crucial importance of exploiting the variance-bias trade-off.
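    The approximation the abstract relies on can be checked directly from the definitions: the Rényi entropy H_a = log(sum p_i^a)/(1 - a) and the Tsallis entropy T_a = (1 - sum p_i^a)/(a - 1) both tend to the Shannon entropy as a -> 1. A short sketch (an exact computation on a known distribution, not the streaming estimator itself):

```python
import math

def shannon(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, a):
    """Renyi entropy H_a = log(sum p_i^a) / (1 - a), for a != 1."""
    return math.log(sum(x ** a for x in p)) / (1.0 - a)

def tsallis(p, a):
    """Tsallis entropy T_a = (1 - sum p_i^a) / (a - 1), for a != 1."""
    return (1.0 - sum(x ** a for x in p)) / (a - 1.0)

p = [0.5, 0.25, 0.125, 0.125]
for a in (1.5, 1.1, 1.01, 1.001):
    print(a, renyi(p, a), tsallis(p, a))
print("Shannon:", shannon(p))
# Both entropies converge to the Shannon entropy as a -> 1.
```

    In a streaming setting the power sum is not known exactly but estimated (e.g., by CC), which is where the variance-bias trade-off the abstract mentions comes in.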

    Estimating Rényi Entropy of Discrete Distributions

    It was recently shown that estimating the Shannon entropy H(p) of a discrete k-symbol distribution p requires Θ(k/log k) samples, a number that grows near-linearly in the support size. In many applications H(p) can be replaced by the more general Rényi entropy of order α, H_α(p). We determine the number of samples needed to estimate H_α(p) for all α, showing that α < 1 requires a super-linear, roughly k^{1/α}, number of samples, noninteger α > 1 requires a near-linear k samples, but, perhaps surprisingly, integer α > 1 requires only Θ(k^{1-1/α}) samples. Furthermore, developing on a recently established connection between polynomial approximation and estimation of additive functions of the form Σ_x f(p_x), we reduce the sample complexity for noninteger values of α by a factor of log k compared to the empirical estimator. The estimators achieving these bounds are simple and run in time linear in the number of samples. Our lower bounds provide explicit constructions of distributions with different Rényi entropies that are hard to distinguish.
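    The contrast between the empirical (plug-in) estimator and the favorable integer-α rates can be illustrated with the standard unbiased power-sum estimator based on falling factorials, which exists only for integer α. A sketch (function names are ours; this is not the paper's full construction for noninteger α):

```python
import math
from collections import Counter

def falling(n, a):
    """Falling factorial n * (n-1) * ... * (n-a+1)."""
    out = 1
    for i in range(a):
        out *= n - i
    return out

def renyi_plugin(sample, a):
    """Empirical (plug-in) estimate of H_a from an i.i.d. sample:
    plug the empirical frequencies into the power sum."""
    n = len(sample)
    power_sum = sum((c / n) ** a for c in Counter(sample).values())
    return math.log(power_sum) / (1 - a)

def renyi_unbiased_power_sum(sample, a):
    """For integer a >= 2 (and n >= a): sum_x falling(n_x, a) / falling(n, a)
    is an unbiased estimate of the power sum sum_x p_x^a."""
    n = len(sample)
    power_sum = sum(falling(c, a) / falling(n, a)
                    for c in Counter(sample).values())
    return math.log(power_sum) / (1 - a)

sample = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(renyi_plugin(sample, 2))             # -log(0.38)
print(renyi_unbiased_power_sum(sample, 2))  # -log(28/90)
```

    The plug-in power sum is biased upward for a > 1, which inflates collision estimates on small samples; the falling-factorial form removes that bias exactly for integer a.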

    Testing Exponentiality Based on Rényi Entropy With Progressively Type-II Censored Data

    We express the joint Rényi entropy of progressively censored order statistics in terms of an incomplete integral of the hazard function, and provide a simple estimate of the joint Rényi entropy of progressively Type-II censored data. We then establish a goodness-of-fit test statistic based on the Rényi Kullback-Leibler information with progressively Type-II censored data, and compare its performance with the leading test statistic. A Monte Carlo simulation study shows that the proposed test statistic achieves higher power than the leading test statistic against alternatives with monotone increasing, monotone decreasing, and nonmonotone hazard functions.
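    The Monte Carlo power study follows a generic recipe: calibrate the test's critical values by simulating under the exponential null, then count rejections under an alternative. A sketch of that loop with a deliberately simple placeholder statistic (the coefficient of variation, which equals 1 for exponential data) standing in for the paper's Rényi Kullback-Leibler statistic; all names and parameter values here are illustrative:

```python
import random
import statistics

def stat(x):
    # Placeholder statistic (NOT the paper's Renyi-KL statistic):
    # the coefficient of variation, which is 1 for exponential data.
    return statistics.stdev(x) / statistics.mean(x)

def mc_power(n=20, reps=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    # Step 1: null distribution of the statistic under Exp(1),
    # giving two-sided critical values at level alpha.
    null = sorted(stat([rng.expovariate(1.0) for _ in range(n)])
                  for _ in range(reps))
    lo = null[int(alpha / 2 * reps)]
    hi = null[int((1 - alpha / 2) * reps)]
    # Step 2: rejection rate against a Weibull(shape=2) alternative,
    # whose hazard is monotone increasing.
    hits = sum(
        not (lo <= stat([rng.weibullvariate(1.0, 2.0) for _ in range(n)]) <= hi)
        for _ in range(reps)
    )
    return hits / reps

print(mc_power())  # estimated power at the chosen n, alpha
```

    The paper's study replaces both the statistic and the samples (progressively Type-II censored rather than complete) but the calibrate-then-reject structure is the same.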