Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic in, for example, network
traffic measurement, anomaly detection, neural computations, and spike trains.
This study focuses on estimating Shannon entropy of data streams. It is known
that Shannon entropy can be approximated by Renyi entropy or Tsallis entropy,
which are both functions of the p-th frequency moments and approach Shannon
entropy as p->1.
Compressed Counting (CC) is a new method for approximating the p-th frequency
moments of data streams. Our contributions include:
1) We prove that Renyi entropy is (much) better than Tsallis entropy for
approximating Shannon entropy.
2) We propose the optimal quantile estimator for CC, which considerably
improves the previous estimators.
3) Our experiments demonstrate that CC is indeed highly effective in
approximating the moments and entropies. We also demonstrate the crucial
importance of utilizing the variance-bias trade-off.
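The relationship stated in the abstract above, that Renyi and Tsallis entropies are functions of the p-th frequency moment and approach Shannon entropy as p->1, can be illustrated with a minimal sketch. It assumes exact item counts held in memory and does not implement Compressed Counting or any streaming estimator; the function names are hypothetical and chosen only for this illustration.

```python
import math

def frequency_moment(counts, p):
    """Exact p-th frequency moment: the sum of count**p over all items."""
    return sum(c ** p for c in counts if c > 0)

def shannon_entropy(counts):
    """Shannon entropy (natural log) of the empirical distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def renyi_entropy(counts, p):
    """Renyi entropy of order p != 1, written via the p-th frequency moment."""
    total = sum(counts)
    return math.log(frequency_moment(counts, p) / total ** p) / (1.0 - p)

def tsallis_entropy(counts, p):
    """Tsallis entropy of order p != 1, written via the p-th frequency moment."""
    total = sum(counts)
    return (1.0 - frequency_moment(counts, p) / total ** p) / (p - 1.0)

if __name__ == "__main__":
    counts = [5, 3, 1, 1]  # made-up item frequencies, for illustration only
    print("Shannon:", shannon_entropy(counts))
    for p in (1.5, 1.1, 1.01, 1.001):
        print(p, renyi_entropy(counts, p), tsallis_entropy(counts, p))
```

Running the script prints both approximations moving toward the Shannon value as p gets closer to 1. In a streaming setting the exact moment would be replaced by an estimate, which is where the estimator's accuracy matters.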
Dynamic Data Structures for Document Collections and Graphs
In the dynamic indexing problem, we must maintain a changing collection of
text documents so that we can efficiently support insertions, deletions, and
pattern matching queries. We are especially interested in developing efficient
data structures that store and query the documents in compressed form. All
previous compressed solutions to this problem rely on answering rank and select
queries on a dynamic sequence of symbols. Because of the lower bound in
[Fredman and Saks, 1989], answering rank queries presents a bottleneck in
compressed dynamic indexing. In this paper we show how this lower bound can be
circumvented using our new framework. We demonstrate that the gap between
static and dynamic variants of the indexing problem can be almost closed. Our
method is based on a novel framework for adding dynamism to static compressed
data structures. Our framework also applies more generally to dynamizing other
problems. We show, for example, how our framework can be applied to develop
compressed representations of dynamic graphs and binary relations.
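To make the rank and select queries mentioned in the abstract above concrete, here is a naive, uncompressed sketch of a dynamic sequence of symbols. It is not the paper's framework and has none of its efficiency or compression; the class name and method signatures are hypothetical, and every operation here takes time linear in the sequence length.

```python
class NaiveDynamicSequence:
    """Plain-list stand-in for a dynamic sequence supporting rank and select."""

    def __init__(self, symbols=""):
        self.symbols = list(symbols)

    def insert(self, i, c):
        """Insert symbol c so that it ends up at position i (0-based)."""
        self.symbols.insert(i, c)

    def delete(self, i):
        """Delete the symbol at position i (0-based)."""
        del self.symbols[i]

    def rank(self, c, i):
        """Number of occurrences of c among the first i symbols."""
        return self.symbols[:i].count(c)

    def select(self, c, k):
        """0-based position of the k-th occurrence of c (k >= 1), or -1 if absent."""
        seen = 0
        for pos, s in enumerate(self.symbols):
            if s == c:
                seen += 1
                if seen == k:
                    return pos
        return -1
```

For instance, on the sequence "abracadabra", `rank("a", 4)` counts the occurrences of "a" in "abra", and `select("a", 3)` locates the third "a". Compressed dynamic indexes answer the same queries on a compressed representation while the sequence changes.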