
    Minimum Rates of Approximate Sufficient Statistics

    Given a sufficient statistic for a parametric family of distributions, one can estimate the parameter without access to the data. However, the memory or code size for storing the sufficient statistic may nonetheless still be prohibitive. Indeed, for $n$ independent samples drawn from a $k$-nomial distribution with $d=k-1$ degrees of freedom, the length of the code scales as $d\log n+O(1)$. In many applications, we may not have a useful notion of sufficient statistics (e.g., when the parametric family is not an exponential family) and we also may not need to reconstruct the generating distribution exactly. By adopting a Shannon-theoretic approach in which we allow a small error in estimating the generating distribution, we construct various {\em approximate sufficient statistics} and show that the code length can be reduced to $\frac{d}{2}\log n+O(1)$. We consider errors measured according to the relative entropy and variational distance criteria. For the code constructions, we leverage Rissanen's minimum description length principle, which yields a non-vanishing error measured according to the relative entropy. For the converse parts, we use Clarke and Barron's formula for the relative entropy of a parametrized distribution and the corresponding mixture distribution. However, this method only yields a weak converse for the variational distance. We develop new techniques to achieve vanishing errors and we also prove strong converses. The latter means that even if the code is allowed to have a non-vanishing error, its length must still be at least $\frac{d}{2}\log n$.
    Comment: To appear in the IEEE Transactions on Information Theory
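    As a minimal sketch of the rate gap in this abstract, the snippet below compares the two code lengths for a $k$-nomial source. The grid-quantization argument (about $\sqrt{n}$ cells per free parameter, in the spirit of Rissanen's MDL) is an illustrative assumption, not the paper's actual construction, and the function names are invented for the example.

```python
import numpy as np

# Sketch: two-part code lengths for a k-nomial source (d = k - 1 free parameters).
# Storing empirical frequencies exactly costs ~ d*log2(n) bits; quantizing each
# coordinate of the MLE to a grid of spacing ~1/sqrt(n) costs ~ (d/2)*log2(n) bits,
# matching the rates in the abstract. Purely illustrative.

def code_length_bits(n, k, approximate=True):
    """Approximate code length (bits) for describing a k-nomial distribution
    estimated from n samples."""
    d = k - 1                      # degrees of freedom
    if approximate:
        # ~ sqrt(n) grid points per free parameter -> (1/2)*log2(n) bits each
        return 0.5 * d * np.log2(n)
    # exact empirical counts: ~n possible values per free parameter
    return d * np.log2(n)

n, k = 10_000, 5
print(f"exact sufficient statistic : {code_length_bits(n, k, False):.1f} bits")
print(f"approximate (MDL grid)     : {code_length_bits(n, k, True):.1f} bits")
```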

    A Short Introduction to Model Selection, Kolmogorov Complexity and Minimum Description Length (MDL)

    The concept of overfitting in model selection is explained and demonstrated with an example. After providing some background information on information theory and Kolmogorov complexity, we provide a short explanation of Minimum Description Length and error minimization. We conclude with a discussion of the typical features of overfitting in model selection.
    Comment: 20 pages, Chapter 1 of The Paradox of Overfitting, Master's thesis, Rijksuniversiteit Groningen, 200
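    A toy illustration of the overfitting phenomenon this chapter discusses, assuming a polynomial-regression setting and a BIC-style $(d/2)\log n$ penalty as a stand-in for a full MDL code; the data-generating model and all parameters are illustrative.

```python
import numpy as np

# Sketch: overfitting in model selection, scored MDL-style. We fit polynomials
# of increasing degree to noisy data; raw training error always decreases, but
# a two-part description length (Gaussian fit cost + (d/2)*log n penalty per
# parameter) is minimized near the true complexity.

rng = np.random.default_rng(0)
n = 50
x = np.linspace(-1, 1, n)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.2, n)  # true degree: 2

for degree in range(1, 9):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid**2)
    nll = 0.5 * n * np.log(sigma2)              # Gaussian negative log-likelihood
    penalty = 0.5 * (degree + 1) * np.log(n)    # (d/2) * log n
    print(f"degree {degree}: train MSE {sigma2:.4f}, MDL score {nll + penalty:.2f}")
```

    Training error keeps shrinking with degree, but the penalized score turns back up past the true degree: that turning point is the overfitting boundary the chapter describes.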

    Finite-Block-Length Analysis in Classical and Quantum Information Theory

    Coding technology is used in several information processing tasks. In particular, when noise during transmission disturbs communications, coding technology is employed to protect the information. However, there are two types of coding technology: coding in classical information theory and coding in quantum information theory. Although the physical media used to transmit information ultimately obey quantum mechanics, we need to choose the type of coding depending on the kind of information device, classical or quantum, that is being used. In both branches of information theory, there are many elegant theoretical results under the ideal assumption that an infinitely large system is available. In a realistic situation, we need to account for finite size effects. The present paper reviews finite size effects in classical and quantum information theory with respect to various topics, including applied aspects.
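    One concrete finite-blocklength effect of the kind this review covers is the normal (dispersion) approximation of Polyanskiy, Poor, and Verdú, $\log M^*(n,\epsilon) \approx nC - \sqrt{nV}\,Q^{-1}(\epsilon) + \frac{1}{2}\log n$. The sketch below evaluates it for a binary symmetric channel; the channel choice and parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

# Sketch: normal approximation to the maximal rate at finite blocklength n and
# block error probability eps, for a binary symmetric channel BSC(p).

def bsc_rate(n, p, eps):
    """Approximate maximal rate (bits/channel use) at blocklength n, error eps."""
    h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy
    C = 1.0 - h                                       # capacity
    V = p * (1 - p) * np.log2((1 - p) / p) ** 2       # channel dispersion
    Qinv = norm.isf(eps)                              # Q^{-1}(eps)
    return C - np.sqrt(V / n) * Qinv + 0.5 * np.log2(n) / n

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: rate ~ {bsc_rate(n, p=0.11, eps=1e-3):.4f} bits/use")
```

    The backoff from capacity shrinks like $1/\sqrt{n}$, which is exactly the finite-size correction that disappears in the infinite-blocklength idealization.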

    The velocity distribution of nearby stars from Hipparcos data I. The significance of the moving groups

    We present a three-dimensional reconstruction of the velocity distribution of nearby stars (≲ 100 pc) using a maximum likelihood density estimation technique applied to the two-dimensional tangential velocities of stars. The underlying distribution is modeled as a mixture of Gaussian components. The algorithm reconstructs the error-deconvolved distribution function, even when the individual stars have unique error and missing-data properties. We apply this technique to the tangential velocity measurements from a kinematically unbiased sample of 11,865 main sequence stars observed by the Hipparcos satellite. We explore various methods for validating the complexity of the resulting velocity distribution function, including criteria based on Bayesian model selection and how accurately our reconstruction predicts the radial velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using this very conservative external validation test based on the GCS, we find that there is little evidence for structure in the distribution function beyond the moving groups established prior to the Hipparcos mission. This is in sharp contrast with internal tests performed here and in previous analyses, which point consistently to maximal structure in the velocity distribution. We quantify the information content of the radial velocity measurements and find that the mean amount of new information gained from a radial velocity measurement of a single star is significant. This argues for complementary radial velocity surveys to upcoming astrometric surveys.
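    In a similar spirit, here is a minimal sketch of Gaussian-mixture velocity-density estimation with BIC-based model selection on synthetic data. Note that the paper's actual algorithm deconvolves per-star measurement errors and handles missing data, which the plain mixture fit below does not; the group parameters and library choice are assumptions for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch: fit a mixture of Gaussians to synthetic 2D tangential velocities and
# pick the number of components with BIC, a model-selection criterion analogous
# to those compared in the paper.

rng = np.random.default_rng(42)
# Broad background velocity ellipsoid plus a compact "moving group" (km/s)
background = rng.multivariate_normal([0, -15], [[900, 0], [0, 400]], size=2000)
group = rng.multivariate_normal([-40, -20], [[16, 0], [0, 16]], size=200)
v = np.vstack([background, group])

best = min(
    (GaussianMixture(n_components=k, random_state=0).fit(v) for k in range(1, 6)),
    key=lambda m: m.bic(v),
)
print(f"components chosen by BIC: {best.n_components}")
print("component means (km/s):\n", np.round(best.means_, 1))
```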

    D3 branes in a Melvin universe: a new realm for gravitational holography

    The decoupling limit of a certain configuration of D3 branes in a Melvin universe defines a sector of string theory known as Puff Field Theory (PFT) - a theory with non-local dynamics but without gravity. In this work, we present a systematic analysis of the non-local states of strongly coupled PFT using gravitational holography, and we are led to a remarkable new holographic dictionary. We show that the theory admits states that may be viewed as brane protrusions from the D3 brane worldvolume. The footprint of a protrusion has finite size - the scale of non-locality in the PFT - and corresponds to an operator insertion in the PFT. We compute correlators of these states, and we demonstrate that only part of the holographic bulk is explored by this computation. We then show that the remaining space holographically encodes the dynamics of the D3 brane tentacles. The two sectors are coupled: in this holographic description, this is realized via quantum entanglement across a holographic screen - a throat in the geometry - that splits the bulk into the two regions in question. We then propose a description of PFT through a direct product of two Fock spaces - akin to other non-local settings that employ quantum group structures.
    Comment: 44 pages, 13 figures; v2: minor corrections, citations added; v3: typos corrected in section on local operators, some asymptotic expansions improved and made more consistent with rest of paper in section on non-local operators

    Divergence rates of Markov order estimators and their application to statistical estimation of stationary ergodic processes

    Stationary ergodic processes with finite alphabets are estimated by finite memory processes from a sample, an n-length realization of the process, where the memory depth of the estimator process is also estimated from the sample using penalized maximum likelihood (PML). Under some assumptions on the continuity rate and the assumption of non-nullness, a rate of convergence in $\bar{d}$-distance is obtained, with explicit constants. The result requires an analysis of the divergence of PML Markov order estimators for not necessarily finite memory processes. This divergence problem is investigated in more generality for three information criteria: the Bayesian information criterion with generalized penalty term yielding the PML, and the normalized maximum likelihood and the Krichevsky-Trofimov code lengths. Lower and upper bounds on the estimated order are obtained. The notion of consistent Markov order estimation is generalized for infinite memory processes using the concept of oracle order estimates, and generalized consistency of the PML Markov order estimator is presented.
    Comment: Published at http://dx.doi.org/10.3150/12-BEJ468 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
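    A minimal sketch of BIC-style penalized-maximum-likelihood Markov order estimation on a synthetic binary chain, the kind of estimator whose divergence the paper analyzes; the penalty constant and candidate-order range are illustrative choices, not those studied in the paper.

```python
import numpy as np

# Sketch: estimate the Markov order of a finite-alphabet sample by minimizing
# -log-likelihood + c * (#free parameters) * log(n), a BIC-style PML criterion.

def pml_order(x, alphabet_size, max_order=5, c=0.5):
    """Return the order k minimizing the penalized maximum likelihood score."""
    n = len(x)
    scores = {}
    for k in range(max_order + 1):
        counts = {}
        for i in range(k, n):                       # count context -> next symbol
            ctx = tuple(x[i - k:i])
            counts.setdefault(ctx, np.zeros(alphabet_size))[x[i]] += 1
        nll = 0.0
        for ctx_counts in counts.values():
            tot = ctx_counts.sum()
            p = ctx_counts[ctx_counts > 0] / tot    # per-context MLE
            nll -= (ctx_counts[ctx_counts > 0] * np.log(p)).sum()
        n_params = (alphabet_size ** k) * (alphabet_size - 1)
        scores[k] = nll + c * n_params * np.log(n)
    return min(scores, key=scores.get)

# Binary order-2 chain: next symbol repeats x[t-2] with probability 0.9
rng = np.random.default_rng(1)
x = [0, 1]
for _ in range(5000):
    x.append(x[-2] if rng.random() < 0.9 else 1 - x[-2])
print("estimated Markov order:", pml_order(np.array(x), alphabet_size=2))
```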