Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, with applications in
network traffic measurement, anomaly detection, neural computation, and the
analysis of spike trains. This study focuses on estimating the Shannon entropy
of data streams. It is known that the Shannon entropy can be approximated by
the Rényi entropy or the Tsallis entropy, which are both functions of the p-th
frequency moments and approach the Shannon entropy as p -> 1.
Compressed Counting (CC) is a new method for approximating the p-th frequency
moments of data streams. Our contributions include:
1) We prove that the Rényi entropy is (much) better than the Tsallis entropy
for approximating the Shannon entropy.
2) We propose the optimal quantile estimator for CC, which considerably
improves the previous estimators.
3) Our experiments demonstrate that CC is indeed highly effective at
approximating the moments and entropies. We also demonstrate the crucial
importance of exploiting the variance-bias trade-off.
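As a concrete illustration of the limit these estimators exploit, the
following short Python sketch (assuming a known probability vector rather than
a stream, so it does not involve CC itself) evaluates the Rényi and Tsallis
entropies as p approaches 1 and compares them to the exact Shannon entropy:

    import numpy as np

    def shannon(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def renyi(p, a):
        # Renyi entropy: log(sum_i p_i^a) / (1 - a), defined for a != 1
        return np.log(np.sum(p ** a)) / (1.0 - a)

    def tsallis(p, a):
        # Tsallis entropy: (1 - sum_i p_i^a) / (a - 1), defined for a != 1
        return (1.0 - np.sum(p ** a)) / (a - 1.0)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    for a in (1.5, 1.1, 1.01, 1.001):
        print(a, renyi(p, a), tsallis(p, a), shannon(p))

Both quantities converge to the Shannon value as a -> 1; in the streaming
setting, the exact power sum would be replaced by a CC estimate of the a-th
frequency moment.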
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
further develop Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008), and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta and Kingman coalescents.
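The coalescent-specific constructions are involved, but every scheme above
instantiates the same importance sampling identity,
E_pi[f(X)] = E_q[f(X) pi(X)/q(X)]. A minimal, self-contained Python sketch of
that identity on a hypothetical toy problem (a Gaussian tail probability, not
a genealogical likelihood):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy target: p = P(X > 4) for X ~ N(0, 1), far too small for naive
    # Monte Carlo. The proposal q = N(4, 1) concentrates samples in the
    # rare region; each draw is reweighted by pi(y)/q(y).
    n = 100_000
    y = rng.normal(4.0, 1.0, size=n)
    log_w = 8.0 - 4.0 * y          # log(pi(y)/q(y)) for these two normals
    est = np.mean((y > 4.0) * np.exp(log_w))
    print(est)                     # close to 1 - Phi(4) ~ 3.17e-5

The schemes in the paper play the same game on the space of genealogies: the
proposal is a distribution over trees, chosen so that the importance weights
for the observed data have low variance.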
Regularization-free estimation in trace regression with symmetric positive semidefinite matrices
Over the past few years, trace regression models have received considerable
attention in the context of matrix completion, quantum state tomography, and
compressed sensing. Estimation of the underlying matrix via
regularization-based approaches promoting low-rankedness, notably nuclear norm
regularization, has enjoyed great popularity. In the present paper, we argue
that such regularization may no longer be necessary if the underlying matrix is
symmetric positive semidefinite (\textsf{spd}) and the design satisfies certain
conditions. In this situation, simple least squares estimation subject to an
\textsf{spd} constraint may perform as well as regularization-based approaches
with a proper choice of the regularization parameter, which entails knowledge
of the noise level and/or tuning. By contrast, constrained least squares
estimation comes without any tuning parameter and may hence be preferred due to
its simplicity.
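One standard way to compute such a constrained estimator is projected gradient
descent: alternate a gradient step on the squared loss with a Euclidean
projection onto the psd cone (eigendecompose and clip negative eigenvalues).
The Python sketch below is an illustrative implementation under that choice
(function names and the step size are assumptions of the sketch, not part of
the paper); note it contains no regularization term and hence no parameter
whose choice requires knowledge of the noise level:

    import numpy as np

    def proj_psd(B):
        # Euclidean projection onto the positive semidefinite cone.
        w, V = np.linalg.eigh((B + B.T) / 2)
        return (V * np.clip(w, 0.0, None)) @ V.T

    def spd_constrained_ls(Xs, y, d, step=1e-3, iters=1000):
        # minimize sum_i (tr(X_i B) - y_i)^2  subject to  B psd
        B = np.zeros((d, d))
        for _ in range(iters):
            r = np.array([np.trace(X @ B) for X in Xs]) - y
            grad = 2.0 * sum(ri * (X + X.T) / 2 for ri, X in zip(r, Xs))
            B = proj_psd(B - step * grad)
        return B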
Mean Estimation from One-Bit Measurements
We consider the problem of estimating the mean of a symmetric log-concave
distribution under the constraint that only a single bit per sample from this
distribution is available to the estimator. We study the mean squared error as
a function of the sample size (and hence the number of bits). We consider three
settings: first, a centralized setting, where an encoder may release n bits
given a sample of size n, and for which there is no asymptotic penalty for
quantization; second, an adaptive setting in which each bit is a function of
the current observation and previously recorded bits, where we show that the
optimal relative efficiency compared to the sample mean is precisely the
efficiency of the median; lastly, we show that in a distributed setting where
each bit is only a function of a local sample, no estimator can achieve optimal
efficiency uniformly over the parameter space. We additionally complement our
results in the adaptive setting by showing that \emph{one} round of adaptivity
is sufficient to achieve the optimal mean squared error.
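The abstract does not spell out the adaptive scheme, but the classical way a
one-bit-per-sample protocol attains the median's efficiency is a
sign-comparison recursion of Robbins-Monro type with a suitably chosen gain.
The following Python sketch is a hypothetical illustration of that idea, not
necessarily the paper's construction:

    import numpy as np

    rng = np.random.default_rng(1)

    mu, n = 2.0, 100_000     # unknown mean and sample size (toy values)
    theta = 0.0              # running estimate
    for i in range(1, n + 1):
        x = rng.normal(mu, 1.0)            # sample, seen once and discarded
        bit = 1.0 if x > theta else -1.0   # the single bit recorded
        # 1/i gain; matching the median's exact asymptotic efficiency
        # requires tuning this gain to the density at the median.
        theta += bit / i
    print(theta)   # approaches the median, which equals the mean here

Each bit depends only on the current observation and the previously recorded
bits (through theta), matching the adaptive setting described above.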
Info-Greedy sequential adaptive compressed sensing
We present an information-theoretic framework for sequential adaptive
compressed sensing, Info-Greedy Sensing, where measurements are chosen to
maximize the extracted information conditioned on the previous measurements. We
show that the widely used bisection approach is Info-Greedy for a family of
k-sparse signals by connecting compressed sensing and the black-box complexity
of sequential query algorithms, and present Info-Greedy algorithms for Gaussian
sequential query algorithms, and present Info-Greedy algorithms for Gaussian
and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse
Info-Greedy measurements. Numerical examples demonstrate the good performance
of the proposed algorithms using simulated and real data: Info-Greedy Sensing
shows significant improvement over random projection for signals with sparse
and low-rank covariance matrices, and adaptivity brings robustness when there
is a mismatch between the assumed and the true distributions.
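For a single Gaussian signal the information-maximizing linear measurement
(with unit-norm sensing vectors and Gaussian noise) is along the top
eigenvector of the current posterior covariance, after which the posterior is
updated with the usual Gaussian conditioning formulas. The Python sketch below
illustrates that recursion; the function name and the measure(a) callback
(returning a @ x + noise) are assumptions of the sketch, and the GMM and
sparse-design cases from the paper are not shown:

    import numpy as np

    def info_greedy_gaussian(mu, Sigma, measure, noise_var, k):
        # x ~ N(mu, Sigma); measure(a) returns a @ x + N(0, noise_var).
        for _ in range(k):
            w, V = np.linalg.eigh(Sigma)
            a = V[:, -1]                      # direction of largest uncertainty
            y = measure(a)
            s = a @ Sigma @ a + noise_var
            gain = Sigma @ a / s
            mu = mu + gain * (y - a @ mu)              # posterior mean
            Sigma = Sigma - np.outer(gain, a @ Sigma)  # posterior covariance
        return mu, Sigma

Under these assumptions, each step extracts as much information as any single
linear measurement at the same noise level can, conditioned on the previous
measurements.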