Information Theoretical Estimators Toolbox
We present ITE (information theoretical estimators), a free and open source,
multi-platform Matlab/Octave toolbox capable of estimating many
different variants of entropy, mutual information, divergence, association
measures, cross quantities, and kernels on distributions. Thanks to its highly
modular design, ITE additionally supports (i) combinations of the
estimation techniques, (ii) easy construction and embedding of novel
information theoretical estimators, and (iii) their immediate application in
information theoretical optimization problems. ITE also includes a prototype
application in a central problem class of signal processing: independent
subspace analysis and its extensions. Comment: 5 pages; ITE toolbox: https://bitbucket.org/szzoli/ite
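To illustrate the kind of estimator combination the abstract describes, the sketch below builds a mutual information estimator from an arbitrary entropy estimator via I(X;Y) = H(X) + H(Y) - H(X,Y). It is written in Python rather than Matlab/Octave, and the function names (`gaussian_entropy`, `mutual_information_from_entropy`) are hypothetical, not ITE's API. The plug-in Gaussian entropy estimator is an assumed stand-in for whichever base estimator one prefers.

```python
import numpy as np


def gaussian_entropy(samples):
    """Differential entropy (nats) of a Gaussian fitted to samples of shape (n, d)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = x.shape[1]
    # np.cov returns a 0-d array when d == 1; atleast_2d makes slogdet work for all d.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def mutual_information_from_entropy(entropy_estimator):
    """Combine any entropy estimator into I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    def mi(x, y):
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        joint = np.hstack([x, y])
        return entropy_estimator(x) + entropy_estimator(y) - entropy_estimator(joint)
    return mi


# Usage: correlated bivariate Gaussian, where I(X;Y) = -0.5 * ln(1 - rho^2) exactly.
rng = np.random.default_rng(0)
rho = 0.8
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20_000)
mi_gauss = mutual_information_from_entropy(gaussian_entropy)
estimate = mi_gauss(xy[:, 0], xy[:, 1])
exact = -0.5 * np.log(1 - rho**2)
```

Because `mutual_information_from_entropy` only needs a callable, a k-NN or kernel entropy estimator could be swapped in without changing the combination step.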
Direct Ensemble Estimation of Density Functionals
Estimating density functionals of analog sources is an important problem in
statistical signal processing and information theory. Traditionally, estimating
these quantities requires either making parametric assumptions about the
underlying distributions or using non-parametric density estimation followed by
integration. In this paper we introduce a direct nonparametric approach which
bypasses the need for density estimation by using the error rates of k-NN
classifiers as data-driven basis functions that can be combined to estimate a
range of density functionals. However, this method is subject to a non-trivial
bias that dramatically slows the rate of convergence in higher dimensions. To
overcome this limitation, we develop an ensemble method for estimating the
value of the basis function which, under some minor constraints on the
smoothness of the underlying distributions, achieves the parametric rate of
convergence regardless of data dimension. Comment: 5 pages
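The abstract does not spell out how the ensemble weights are chosen. A minimal sketch, assuming a bias model common in the ensemble-estimation literature: the base estimate computed with tuning parameter l carries bias terms proportional to l^(i/d), i = 1, ..., d-1, and the weights are the minimum-norm solution that sums to one and cancels those terms. The bias model, the parameter grid, and the function names are illustrative assumptions, not the paper's construction.

```python
import numpy as np


def ensemble_weights(params, d):
    """Minimum-norm weights w over base-estimator parameters l.

    Constraints: sum(w) = 1 and sum(w * l**(i/d)) = 0 for i = 1..d-1.
    Assumption (illustrative): the base estimate at parameter l has bias
    sum_i c_i * l**(i/d) with unknown c_i, so these weights cancel it.
    """
    params = np.asarray(params, dtype=float)
    rows = [np.ones_like(params)]  # weights must sum to one
    rows += [params ** (i / d) for i in range(1, d)]  # cancel each assumed bias term
    A = np.vstack(rows)
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    # For a full-row-rank underdetermined system, pinv(A) @ b is the minimum-norm solution.
    return np.linalg.pinv(A) @ b


def ensemble_estimate(base_estimates, weights):
    """Weighted average of base estimates, one per parameter value."""
    return float(np.dot(weights, base_estimates))


# Usage with synthetic base estimates: true value 2.0 plus bias of the assumed form.
d = 3
params = np.arange(1, 11)
w = ensemble_weights(params, d)
base = 2.0 + 0.5 * params ** (1 / d) - 0.2 * params ** (2 / d)
combined = ensemble_estimate(base, w)
```

Minimizing the weight norm keeps the variance of the combined estimate from blowing up, since the weighted average's variance grows with the squared weights when the base estimates are of comparable variance.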
Computing Entropies With Nested Sampling
The Shannon entropy, and related quantities such as mutual information, can
be used to quantify uncertainty and relevance. However, in practice, it can be
difficult to compute these quantities for arbitrary probability distributions,
particularly if the probability mass functions or densities cannot be
evaluated. This paper introduces a computational approach, based on Nested
Sampling, to evaluate entropies of probability distributions that can only be
sampled. I demonstrate the method on three examples: (i) a simple Gaussian example
where the key quantities are available analytically; (ii) an experimental
design example about scheduling observations in order to measure the period of
an oscillating signal; and (iii) predicting the future from the past in a
heavy-tailed scenario. Comment: Accepted for publication in Entropy. 21 pages, 3 figures. Software
available at https://github.com/eggplantbren/InfoNes
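For the Gaussian example, the reference value is the closed form H = 0.5 ln(2πeσ²) nats. The sketch below checks it against a plain Monte Carlo estimate H ≈ -(1/n) Σ ln p(x_i). This is not Nested Sampling: it requires evaluating the density, which is exactly what the paper's method is designed to avoid, so it only serves as a sanity check where p is known. The value of σ, the sample size, and the function names are arbitrary choices for illustration.

```python
import numpy as np


def gaussian_entropy_exact(sigma):
    """Differential entropy (nats) of N(mu, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)


def entropy_monte_carlo(log_density, samples):
    """Estimate H = -E[ln p(X)] by averaging -ln p over samples drawn from p."""
    return float(-np.mean(log_density(samples)))


sigma = 2.0


def gaussian_log_density(v):
    # ln of the N(0, sigma^2) density; requires p to be evaluable, unlike Nested Sampling.
    return -0.5 * np.log(2 * np.pi * sigma**2) - v**2 / (2 * sigma**2)


rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=100_000)
exact = gaussian_entropy_exact(sigma)
approx = entropy_monte_carlo(gaussian_log_density, x)
```

With σ = 2 the exact value is about 2.112 nats, and at this sample size the Monte Carlo estimate should agree to roughly two decimal places.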
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide finite-sample analysis of a general framework for using k-nearest
neighbor statistics to estimate functionals of a nonparametric continuous
probability density, including entropies and divergences. Rather than plugging
a consistent density estimate (which requires $k \to \infty$ as the sample size
$n \to \infty$) into the functional of interest, the estimators we consider fix
k and perform a bias correction. This is more efficient computationally, and,
as we show in certain cases, statistically, leading to faster convergence
rates. Our framework unifies several previous estimators, for most of which
ours are the first finite-sample guarantees. Comment: 16 pages, 0 figures
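A classic instance of the fixed-k approach is the Kozachenko–Leonenko entropy estimator. With k held fixed, the bias of the naive k-NN plug-in is corrected by digamma terms, giving Ĥ = ψ(n) − ψ(k) + ln V_d + (d/n) Σ_i ln r_i, where r_i is the distance from x_i to its k-th nearest neighbor and V_d is the volume of the unit d-ball. The sketch below implements this one estimator, not the paper's general framework; it uses brute-force O(n²) distances, so it suits only modest n, and the function names are hypothetical.

```python
import math

import numpy as np


def digamma_int(m):
    """psi(m) = -gamma + sum_{j=1}^{m-1} 1/j for positive integers m."""
    return -np.euler_gamma + float(np.sum(1.0 / np.arange(1, m)))


def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko differential entropy estimate (nats) with fixed k."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    sq = np.sum(x**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # pairwise squared distances
    np.maximum(dist2, 0.0, out=dist2)  # clip tiny negative round-off
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    r = np.sqrt(np.partition(dist2, k - 1, axis=1)[:, k - 1])  # k-th NN radius
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    # The digamma difference psi(n) - psi(k) is the bias correction for fixed k.
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * float(np.mean(np.log(r)))


# Usage: standard 2-D Gaussian, whose entropy is ln(2 * pi * e).
rng = np.random.default_rng(2)
samples = rng.standard_normal((1000, 2))
est = kl_entropy(samples, k=3)
exact = math.log(2 * math.pi * math.e)
```

Keeping k small and constant is what makes the computation cheap: only the k-th neighbor distance per point is needed, rather than a density estimate whose neighborhood grows with n.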
HyperGI: Automated Detection and Repair of Information Flow Leakage
Maintaining confidential information control in software is a persistent security problem where failure means secrets can be revealed via program behaviors. Information flow control techniques have traditionally been based on static or symbolic analyses, which are limited in scalability and specialized to particular languages. When programs do leak secrets, no existing approach repairs them automatically unless the leak causes a functional test to fail. We present our vision for HyperGI, a genetic improvement framework that detects, localizes and repairs information leakage. Key elements of HyperGI include (1) the use of two orthogonal test suites, (2) a dynamic leak detection approach that estimates and localizes potential leaks, and (3) a repair component that produces a candidate patch using genetic improvement. We demonstrate the successful use of HyperGI on several programs with no failing functional test cases. We manually examine the resulting patches and identify trade-offs and future directions for fully realizing our vision.
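The abstract does not say how HyperGI's dynamic detector estimates leakage. One common quantitative information-flow measure, assumed here, is the mutual information between a secret input S and an observable output O, estimated from test executions with plug-in frequencies. `leaky_program` and `safe_program` are hypothetical programs used only to show a leak before and after a repair.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in estimate (bits) of I(S;O) from observed (secret, output) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    p_s = Counter(s for s, _ in pairs)
    p_o = Counter(o for _, o in pairs)
    mi = 0.0
    for (s, o), c in joint.items():
        # p(s,o) * log2(p(s,o) / (p(s) p(o))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (p_s[s] * p_o[o]))
    return mi


def leaky_program(secret):
    # Hypothetical program under test: its output reveals the secret's parity.
    return secret % 2


def safe_program(secret):
    # Hypothetical repaired variant: the output no longer depends on the secret.
    return 0


secrets = list(range(4)) * 250  # uniform 2-bit secrets
leak_before = plugin_mutual_information([(s, leaky_program(s)) for s in secrets])  # 1.0 bit
leak_after = plugin_mutual_information([(s, safe_program(s)) for s in secrets])  # 0.0 bits
```

A repair candidate could then be scored by how much it lowers this estimate while the functional test suite still passes; the plug-in estimator is biased upward for small samples, so a real detector would need more executions or a bias correction.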