Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, with applications in
network traffic measurement, anomaly detection, neural computation, and the
analysis of spike trains. This study focuses on estimating the Shannon entropy
of data streams. It is known that the Shannon entropy can be approximated by
the Rényi entropy or the Tsallis entropy, which are both functions of the p-th
frequency moments and approach the Shannon entropy as p -> 1.
Compressed Counting (CC) is a new method for approximating the p-th frequency
moments of data streams. Our contributions include:
1) We prove that the Rényi entropy is (much) better than the Tsallis entropy
for approximating the Shannon entropy.
2) We propose the optimal quantile estimator for CC, which considerably
improves the previous estimators.
3) Our experiments demonstrate that CC is indeed highly effective at
approximating the moments and entropies. We also demonstrate the crucial
importance of exploiting the variance-bias trade-off.
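As a concrete illustration of the limit these estimators exploit, the
following short Python sketch (assuming a known probability vector rather than
a stream, so it does not involve CC itself) evaluates the Rényi and Tsallis
entropies as p approaches 1 and compares them to the exact Shannon entropy:

    import numpy as np

    def shannon(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def renyi(p, a):
        # Renyi entropy: log(sum_i p_i^a) / (1 - a), defined for a != 1
        return np.log(np.sum(p ** a)) / (1.0 - a)

    def tsallis(p, a):
        # Tsallis entropy: (1 - sum_i p_i^a) / (a - 1), defined for a != 1
        return (1.0 - np.sum(p ** a)) / (a - 1.0)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    for a in (1.5, 1.1, 1.01, 1.001):
        print(a, renyi(p, a), tsallis(p, a), shannon(p))

Both quantities converge to the Shannon value as a -> 1; in the streaming
setting, the exact power sum would be replaced by a CC estimate of the a-th
frequency moment.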
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
further develop Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008), and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta and Kingman coalescents.
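The coalescent-specific constructions are involved, but every scheme above
instantiates the same importance sampling identity,
E_pi[f(X)] = E_q[f(X) pi(X)/q(X)]. A minimal, self-contained Python sketch of
that identity on a hypothetical toy problem (a Gaussian tail probability, not
a genealogical likelihood):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy target: p = P(X > 4) for X ~ N(0, 1), far too small for naive
    # Monte Carlo. The proposal q = N(4, 1) concentrates samples in the
    # rare region; each draw is reweighted by pi(y)/q(y).
    n = 100_000
    y = rng.normal(4.0, 1.0, size=n)
    log_w = 8.0 - 4.0 * y          # log(pi(y)/q(y)) for these two normals
    est = np.mean((y > 4.0) * np.exp(log_w))
    print(est)                     # close to 1 - Phi(4) ~ 3.17e-5

The schemes in the paper play the same game on the space of genealogies: the
proposal is a distribution over trees, chosen so that the importance weights
for the observed data have low variance.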
Regularization-free estimation in trace regression with symmetric positive semidefinite matrices
Over the past few years, trace regression models have received considerable
attention in the context of matrix completion, quantum state tomography, and
compressed sensing. Estimation of the underlying matrix via
regularization-based approaches promoting low-rankedness, notably nuclear norm
regularization, has enjoyed great popularity. In the present paper, we argue
that such regularization may no longer be necessary if the underlying matrix is
symmetric positive semidefinite (\textsf{spd}) and the design satisfies certain
conditions. In this situation, simple least squares estimation subject to an
\textsf{spd} constraint may perform as well as regularization-based approaches
with a proper choice of the regularization parameter, which entails knowledge
of the noise level and/or tuning. By contrast, constrained least squares
estimation comes without any tuning parameter and may hence be preferred due to
its simplicity.
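One standard way to compute such a constrained estimator is projected gradient
descent: alternate a gradient step on the squared loss with a Euclidean
projection onto the psd cone (eigendecompose and clip negative eigenvalues).
The Python sketch below is an illustrative implementation under that choice
(function names and the step size are assumptions of the sketch, not part of
the paper); note it contains no regularization term and hence no parameter
whose choice requires knowledge of the noise level:

    import numpy as np

    def proj_psd(B):
        # Euclidean projection onto the positive semidefinite cone.
        w, V = np.linalg.eigh((B + B.T) / 2)
        return (V * np.clip(w, 0.0, None)) @ V.T

    def spd_constrained_ls(Xs, y, d, step=1e-3, iters=1000):
        # minimize sum_i (tr(X_i B) - y_i)^2  subject to  B psd
        B = np.zeros((d, d))
        for _ in range(iters):
            r = np.array([np.trace(X @ B) for X in Xs]) - y
            grad = 2.0 * sum(ri * (X + X.T) / 2 for ri, X in zip(r, Xs))
            B = proj_psd(B - step * grad)
        return B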
Mean Estimation from One-Bit Measurements
We consider the problem of estimating the mean of a symmetric log-concave
distribution under the constraint that only a single bit per sample from this
distribution is available to the estimator. We study the mean squared error as
a function of the sample size (and hence the number of bits). We consider three
settings: first, a centralized setting, where an encoder may release n bits
given a sample of size n, and for which there is no asymptotic penalty for
quantization; second, an adaptive setting in which each bit is a function of
the current observation and previously recorded bits, where we show that the
optimal relative efficiency compared to the sample mean is precisely the
efficiency of the median; lastly, we show that in a distributed setting where
each bit is only a function of a local sample, no estimator can achieve optimal
efficiency uniformly over the parameter space. We additionally complement our
results in the adaptive setting by showing that \emph{one} round of adaptivity
is sufficient to achieve the optimal mean squared error.
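The abstract does not spell out the adaptive scheme, but the classical way a
one-bit-per-sample protocol attains the median's efficiency is a
sign-comparison recursion of Robbins-Monro type with a suitably chosen gain.
The following Python sketch is a hypothetical illustration of that idea, not
necessarily the paper's construction:

    import numpy as np

    rng = np.random.default_rng(1)

    mu, n = 2.0, 100_000     # unknown mean and sample size (toy values)
    theta = 0.0              # running estimate
    for i in range(1, n + 1):
        x = rng.normal(mu, 1.0)            # sample, seen once and discarded
        bit = 1.0 if x > theta else -1.0   # the single bit recorded
        # 1/i gain; matching the median's exact asymptotic efficiency
        # requires tuning this gain to the density at the median.
        theta += bit / i
    print(theta)   # approaches the median, which equals the mean here

Each bit depends only on the current observation and the previously recorded
bits (through theta), matching the adaptive setting described above.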
Info-Greedy sequential adaptive compressed sensing
We present an information-theoretic framework for sequential adaptive
compressed sensing, Info-Greedy Sensing, where measurements are chosen to
maximize the extracted information conditioned on the previous measurements. We
show that the widely used bisection approach is Info-Greedy for a family of
k-sparse signals by connecting compressed sensing and the black-box complexity
of sequential query algorithms, and present Info-Greedy algorithms for Gaussian
sequential query algorithms, and present Info-Greedy algorithms for Gaussian
and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse
Info-Greedy measurements. Numerical examples demonstrate the good performance
of the proposed algorithms using simulated and real data: Info-Greedy Sensing
shows significant improvement over random projection for signals with sparse
and low-rank covariance matrices, and adaptivity brings robustness when there
is a mismatch between the assumed and the true distributions.
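For a single Gaussian signal the information-maximizing linear measurement
(with unit-norm sensing vectors and Gaussian noise) is along the top
eigenvector of the current posterior covariance, after which the posterior is
updated with the usual Gaussian conditioning formulas. The Python sketch below
illustrates that recursion; the function name and the measure(a) callback
(returning a @ x + noise) are assumptions of the sketch, and the GMM and
sparse-design cases from the paper are not shown:

    import numpy as np

    def info_greedy_gaussian(mu, Sigma, measure, noise_var, k):
        # x ~ N(mu, Sigma); measure(a) returns a @ x + N(0, noise_var).
        for _ in range(k):
            w, V = np.linalg.eigh(Sigma)
            a = V[:, -1]                      # direction of largest uncertainty
            y = measure(a)
            s = a @ Sigma @ a + noise_var
            gain = Sigma @ a / s
            mu = mu + gain * (y - a @ mu)              # posterior mean
            Sigma = Sigma - np.outer(gain, a @ Sigma)  # posterior covariance
        return mu, Sigma

Under these assumptions, each step extracts as much information as any single
linear measurement at the same noise level can, conditioned on the previous
measurements.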