
Fast Entropy Estimation for Natural Sequences

Abstract

It is well known that accurately estimating the Shannon entropy of symbolic sequences requires a large number of samples. When some aspects of the data are known, it is plausible to use this knowledge to compute entropy more efficiently. A number of methods, each with its own assumptions, have been proposed for calculating entropy from small sample sizes. In this paper, we examine this problem and propose a method for estimating the Shannon entropy of a set of ranked symbolic natural events. Using a modified Zipf-Mandelbrot-Li law and a new rank-based coincidence counting method, we propose an efficient algorithm that enables the entropy to be estimated with surprising accuracy from only a small number of samples. The algorithm is tested on some natural sequences and shown to yield accurate results with very small amounts of data.
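To make the small-sample problem concrete, the following minimal sketch compares the exact entropy of a plain Zipf-Mandelbrot distribution with the naive plug-in estimate from a short sample. It is not the algorithm proposed in the paper: the modified Zipf-Mandelbrot-Li law and the rank-based coincidence counting method are not reproduced here. The function names and the parameter values (`alpha`, `beta`, `num_symbols`, the sample size of 50) are illustrative assumptions only.

```python
import math
import random
from collections import Counter


def zipf_mandelbrot_pmf(num_symbols, alpha=1.0, beta=2.7):
    """Plain Zipf-Mandelbrot law: p(rank) proportional to 1 / (rank + beta)^alpha.

    alpha and beta are illustrative values, not parameters from the paper.
    """
    weights = [1.0 / (rank + beta) ** alpha for rank in range(1, num_symbols + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(pmf):
    """Exact Shannon entropy in bits of a known probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def plugin_entropy(samples):
    """Naive plug-in estimate in bits, using observed relative frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    pmf = zipf_mandelbrot_pmf(num_symbols=1000)
    sample = rng.choices(range(1, 1001), weights=pmf, k=50)
    print("true entropy (bits):    ", shannon_entropy(pmf))
    print("plug-in estimate (bits):", plugin_entropy(sample))
```

The plug-in estimate can never exceed log2 of the sample size, so with 50 samples drawn from a 1000-symbol alphabet it necessarily falls short of the true value. This is the kind of bias that small-sample estimators aim to avoid.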
