
    Estimating Mutual Information

    We present two classes of improved estimators for mutual information $M(X,Y)$, from samples of random points distributed according to some joint probability density $\mu(x,y)$. In contrast to conventional estimators based on binnings, they are based on entropy estimates from $k$-nearest neighbour distances. This means that they are data efficient (with $k=1$ we resolve structures down to the smallest possible scales), adaptive (the resolution is higher where data are more numerous), and have minimal bias. Indeed, the bias of the underlying entropy estimates is mainly due to non-uniformity of the density at the smallest resolved scale, typically giving systematic errors which scale as functions of $k/N$ for $N$ points. Numerically, we find that both families become exact for independent distributions, i.e. the estimator $\hat M(X,Y)$ vanishes (up to statistical fluctuations) if $\mu(x,y) = \mu(x)\mu(y)$. This holds for all tested marginal distributions and for all dimensions of $x$ and $y$. In addition, we give estimators for redundancies between more than 2 random variables. We compare our algorithms in detail with existing algorithms. Finally, we demonstrate the usefulness of our estimators for assessing the actual independence of components obtained from independent component analysis (ICA), for improving ICA, and for estimating the reliability of blind source separation. Comment: 16 pages, including 18 figures
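
    The construction described above (the first of the two estimator families) reduces to a few steps: for each sample, measure the max-norm distance to its $k$-th nearest neighbour in the joint space, count how many marginal neighbours fall strictly within that distance, and combine the counts through digamma functions. Below is a minimal sketch under that reading, assuming SciPy; the function name and the small tolerance used to approximate the strict inequality are our own choices.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def knn_mutual_information(x, y, k=3):
    """k-nearest-neighbour MI estimate from joint samples (first variant)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])

    # Distance to the k-th neighbour of each point in the joint space (max-norm).
    # Ask for k+1 neighbours because the query returns each point as its own nearest.
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]

    # Count marginal neighbours strictly inside eps (excluding the point itself).
    x_tree, y_tree = cKDTree(x), cKDTree(y)
    nx = [len(x_tree.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
          for i in range(n)]
    ny = [len(y_tree.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
          for i in range(n)]

    return digamma(k) + digamma(n) - np.mean(
        digamma(np.array(nx) + 1) + digamma(np.array(ny) + 1))
```

    For independent inputs (e.g. two uncorrelated Gaussian samples) the returned value should fluctuate around zero, matching the behaviour reported in the abstract.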

    A vector quantization approach to universal noiseless coding and quantization

    A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length $n$ to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as $(k/2)n^{-1}\log n$, when the universe of sources has finite dimension $k$. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as $O(n^{-1})$ when the universe of sources is countable, and as $O(n^{-1+\epsilon})$ when the universe of sources is infinite-dimensional, under appropriate conditions.
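
    The design step leans on the generalized Lloyd algorithm, which alternates an encoder (nearest-codeword) step and a decoder (centroid) step. A minimal sketch for a plain fixed-rate vector quantizer under squared-error distortion is below; the paper applies the same iteration to the first-stage quantizer with induced measures of rate and distortion, which this sketch does not attempt. Names and defaults are ours.

```python
import numpy as np

def lloyd_vq(data, codebook_size, iters=50, seed=0):
    """Generalized Lloyd design of a fixed-rate VQ (squared-error distortion).

    data: (N, d) array of training vectors; returns a (codebook_size, d) codebook.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the codebook from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), codebook_size, replace=False)].copy()
    for _ in range(iters):
        # Encoder step: map each training vector to its nearest codeword.
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # Decoder step: replace each codeword by the centroid of its cell.
        for j in range(codebook_size):
            cell = data[assign == j]
            if len(cell):
                codebook[j] = cell.mean(axis=0)
    return codebook
```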

    ADE string vacua with discrete torsion

    We complete the classification of (2,2) string vacua that can be constructed by diagonal twists of tensor products of minimal models with ADE invariants. Using the Landau-Ginzburg framework, we compute all spectra from inequivalent models of this type. The completeness of our results is only possible by systematically avoiding the huge redundancies coming from permutation symmetries of tensor products. We recover the results for (2,2) vacua of an extensive computation of simple current invariants by Schellekens and Yankielowicz, and find 4 additional mirror pairs of spectra that were missed by their stochastic method. For the model $(1)^9$ we observe a relation between redundant spectra and groups that are related in a particular way. Comment: 13 pages (LaTeX), preprint CERN-TH.6931/93 and ITP-UH-20/93 (reference added)

    Towards Bulk Metric Reconstruction from Extremal Area Variations

    The Ryu-Takayanagi and Hubeny-Rangamani-Takayanagi formulae suggest that bulk geometry emerges from the entanglement structure of the boundary theory. Using these formulae, we build on a result of Alexakis, Balehowsky, and Nachman to show that in four bulk dimensions, the entanglement entropies of boundary regions of disk topology uniquely fix the bulk metric in any region foliated by the corresponding HRT surfaces. More generally, for a bulk of any dimension $d \geq 4$, knowledge of the (variations of the) areas of two-dimensional boundary-anchored extremal surfaces of disk topology uniquely fixes the bulk metric wherever these surfaces reach. This result is covariant and not reliant on any symmetry assumptions; its applicability thus includes regions of strong dynamical gravity, such as the early-time interior of black holes formed from collapse. While we only show uniqueness of the metric, the approach we present provides a clear path towards an explicit spacetime metric reconstruction. Comment: 33+4 pages, 7 figures; v2: addressed referee comments

    Guessing under source uncertainty

    This paper considers the problem of guessing the realization of a finite alphabet source when some side information is provided. The only knowledge the guesser has about the source and the correlated side information is that the joint source is one among a family. A notion of redundancy is first defined, and a new divergence quantity that measures this redundancy is identified. This divergence quantity shares the Pythagorean property with the Kullback-Leibler divergence. Good guessing strategies that minimize the supremum redundancy (over the family) are then identified. The min-sup value measures the richness of the uncertainty set. The min-sup redundancies for two examples - the families of discrete memoryless sources and finite-state arbitrarily varying sources - are then determined. Comment: 27 pages, submitted to IEEE Transactions on Information Theory, March 2006, revised September 2006; contains minor modifications and restructuring based on reviewers' comments
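
    For a single known source, the optimal strategy is simply to guess outcomes in decreasing order of probability; the paper's contribution concerns the extra guessing effort (redundancy) incurred when only a family of possible sources is known. The known-source baseline is easy to state in code; the sketch and the example numbers below are ours, not from the paper.

```python
import numpy as np

def optimal_guess_order(pmf):
    # With the source known, guessing in decreasing order of probability
    # minimizes every moment of the number of guesses.
    return np.argsort(-np.asarray(pmf))

def expected_guesses(pmf, order):
    # Expected number of guesses E[G] under a given guessing order.
    pmf = np.asarray(pmf, dtype=float)
    ranks = np.empty(len(pmf), dtype=int)
    ranks[order] = np.arange(1, len(pmf) + 1)
    return float((pmf * ranks).sum())

p_true = np.array([0.6, 0.25, 0.10, 0.05])
p_wrong = p_true[::-1]  # a guesser tuned to the wrong member of the family
print(expected_guesses(p_true, optimal_guess_order(p_true)))   # 1.6
print(expected_guesses(p_true, optimal_guess_order(p_wrong)))  # 3.4
```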

    A Seismic Inversion Problem for an Anisotropic, Inhomogeneous Medium

    In this report, we consider the propagation of seismic waves through a medium that can be subdivided into two distinct parts. The upper part is assumed to be azimuthally symmetric and linearly nonuniform with increasing depth, with the directional dependence of the velocity consistent with elliptical anisotropy. The lower part, which is the layer of interest, is assumed to also be azimuthally symmetric, but uniform and nonelliptically anisotropic. Despite the nonellipticity, we assume the angular dependence of the velocity can be described by a convex curve. Our goal is to produce a single-source, single-receiver model which uses modern seismic measurements to determine the elastic moduli of the lower medium. Once these are known, geoscientists could better describe the angular dependence of the velocity in the layer of interest and would also have some clues as to the actual material composing it.
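
    For reference, elliptical anisotropy admits a simple closed form for the angular dependence of velocity: the squared velocity interpolates between its vertical and horizontal values. A small sketch under one common parametrization (phase angle measured from the vertical axis; function name and example values are ours):

```python
import numpy as np

def elliptical_velocity(theta, v_vert, v_horiz):
    # Phase velocity at angle theta from vertical in an elliptically
    # anisotropic medium: v(theta)^2 = v_vert^2 cos^2(theta) + v_horiz^2 sin^2(theta).
    return np.sqrt((v_vert * np.cos(theta)) ** 2 + (v_horiz * np.sin(theta)) ** 2)

# Example: velocity varies smoothly from 3.0 km/s (vertical) to 3.6 km/s (horizontal).
angles = np.linspace(0.0, np.pi / 2, 5)
print(elliptical_velocity(angles, 3.0, 3.6))
```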