On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting
Estimating the p-th frequency moment of a data stream is a heavily studied
problem. The problem is actually trivial when p = 1, assuming the strict
Turnstile model. The sample complexity of our proposed algorithm is essentially
O(1) near p = 1, a large improvement over the previously believed O(1/eps^2)
bound. The proposed algorithm makes the long-standing problem of entropy
estimation an easy task, as verified by the experiments included in the
appendix.
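The connection the abstract relies on can be made concrete: Rényi and Tsallis entropies are functions of the p-th frequency moment and converge to the Shannon entropy as p -> 1. A minimal non-streaming sketch (exact counts, illustrative function names; this is not the paper's algorithm):

```python
import math
from collections import Counter

def frequency_moment(counts, p):
    """p-th frequency moment F_p = sum_i f_i^p over the stream's counts."""
    return sum(f ** p for f in counts)

def renyi_entropy(counts, p):
    """Renyi entropy of order p, a function of F_p and F_1."""
    F1 = frequency_moment(counts, 1.0)
    Fp = frequency_moment(counts, p)
    return math.log(Fp / F1 ** p) / (1.0 - p)

def shannon_entropy(counts):
    """Exact Shannon entropy of the empirical distribution."""
    F1 = sum(counts)
    return -sum((f / F1) * math.log(f / F1) for f in counts)

stream = ["a", "b", "a", "c", "a", "b", "d"]
counts = list(Counter(stream).values())

# The Renyi entropy approaches the Shannon entropy as p -> 1.
for p in (1.5, 1.1, 1.01, 1.001):
    print(p, renyi_entropy(counts, p), shannon_entropy(counts))
```

In a streaming setting, an algorithm such as Compressed Counting would replace the exact `frequency_moment` computation with a small sketch; the point here is only the p -> 1 limit.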
CAREER: Fundamental lower bound and tradeoff problems in networking
Issued as final report. National Science Foundation (U.S.).
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, with applications in
network traffic measurement, anomaly detection, neural computation, spike-train
analysis, etc. This study focuses on estimating the Shannon entropy of data
streams. It is known that Shannon entropy can be approximated by Rényi entropy
or Tsallis entropy, which are both functions of the p-th frequency moments and
approach Shannon entropy as p -> 1.
Compressed Counting (CC) is a new method for approximating the p-th frequency
moments of data streams. Our contributions include:
1) We prove that Rényi entropy is (much) better than Tsallis entropy for
approximating Shannon entropy.
2) We propose the optimal quantile estimator for CC, which considerably
improves the previous estimators.
3) Our experiments demonstrate that CC is indeed highly effective for
approximating the moments and entropies. We also demonstrate the crucial
importance of exploiting the variance-bias trade-off.
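Contribution 1) can be illustrated numerically. Writing S(p) = sum_i p_i^p, the Rényi entropy is log(S(p))/(1-p) and the Tsallis entropy is (1-S(p))/(p-1); a Taylor expansion around p = 1 shows the Rényi bias is proportional to Var(-log p_i), while the Tsallis bias is proportional to the larger second moment E[(-log p_i)^2]. A small sketch with an illustrative distribution (not one from the paper):

```python
import math

def shannon(dist):
    """Shannon entropy H = -sum_i p_i log p_i."""
    return -sum(x * math.log(x) for x in dist)

def renyi(dist, p):
    """Renyi entropy of order p: log(sum_i p_i^p) / (1 - p)."""
    return math.log(sum(x ** p for x in dist)) / (1.0 - p)

def tsallis(dist, p):
    """Tsallis entropy of order p: (1 - sum_i p_i^p) / (p - 1)."""
    return (1.0 - sum(x ** p for x in dist)) / (p - 1.0)

dist = [0.4, 0.3, 0.2, 0.1]  # illustrative distribution
H = shannon(dist)
# As p -> 1 both converge to H, but the Renyi error is much smaller.
for p in (1.2, 1.05, 1.01):
    print(p, abs(renyi(dist, p) - H), abs(tsallis(dist, p) - H))
```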
Estimating Rényi Entropy of Discrete Distributions
It was recently shown that estimating the Shannon entropy H(p) of a
discrete k-symbol distribution p requires Θ(k / log k) samples,
a number that grows near-linearly in the support size. In many applications
H(p) can be replaced by the more general Rényi entropy of order
α, H_α(p). We determine the number of samples needed to
estimate H_α(p) for all α, showing that α < 1
requires a super-linear, roughly k^{1/α}, number of samples, noninteger
α > 1 requires a near-linear k samples, but, perhaps surprisingly, integer
α > 1 requires only Θ(k^{1 - 1/α}) samples. Furthermore,
developing on a recently established connection between polynomial
approximation and estimation of additive functions of the form
Σ_x f(p_x), we reduce the sample complexity for noninteger values of α by a
factor of log k compared to the empirical estimator. The estimators
achieving these bounds are simple and run in time linear in the number of
samples. Our lower bounds provide explicit constructions of distributions with
different Rényi entropies that are hard to distinguish.
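For context, the empirical (plug-in) estimator, the baseline against which the log-factor improvement above is measured, simply evaluates the Rényi entropy of the empirical distribution. A minimal sketch (hypothetical function name; the paper's improved estimators use polynomial-approximation corrections not shown here):

```python
import math
import random
from collections import Counter

def empirical_renyi(samples, alpha):
    """Plug-in estimator: Renyi entropy of the empirical distribution."""
    n = len(samples)
    probs = [c / n for c in Counter(samples).values()]
    return math.log(sum(q ** alpha for q in probs)) / (1.0 - alpha)

random.seed(0)
k = 100
samples = [random.randrange(k) for _ in range(5000)]  # uniform over k symbols
# For the uniform distribution, H_alpha = log(k) for every alpha (about 4.605 here).
print(empirical_renyi(samples, 2.0))
```

For integer orders such as alpha = 2, unbiased estimates of the power sum sum_i p_i^2 are also available from collision counts, which is the phenomenon behind the sub-linear sample complexity noted above.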
Testing Exponentiality Based on Rényi Entropy With Progressively Type-II Censored Data
We express the joint Rényi entropy of progressively censored order
statistics in terms of an incomplete integral of the hazard function, and
provide a simple estimate of the joint Rényi entropy of progressively Type-II
censored data. We then establish a goodness-of-fit test statistic based on the
Rényi Kullback-Leibler information with progressively Type-II censored
data, and compare its performance with the leading test statistic. A Monte
Carlo simulation study shows that the proposed test statistic achieves higher
power than the leading test statistic against alternatives with monotone
increasing, monotone decreasing, and nonmonotone hazard functions.
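As a reference point for such tests, the Rényi entropy of the exponential null distribution has a simple closed form (a standard calculation, not specific to the paper):

```latex
h_\alpha\!\left(\mathrm{Exp}(\lambda)\right)
  = \frac{1}{1-\alpha}\,
    \ln \int_0^\infty \left(\lambda e^{-\lambda x}\right)^{\alpha} dx
  = \frac{1}{1-\alpha}\,\ln \frac{\lambda^{\alpha-1}}{\alpha}
  = \frac{\ln \alpha}{\alpha - 1} - \ln \lambda .
```

As \alpha \to 1, the term \ln\alpha/(\alpha-1) tends to 1, recovering the Shannon entropy 1 - \ln\lambda of the exponential distribution.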