
    Algorithmic Statistics

    While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory, which deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistics, sufficient statistics, and minimal sufficient statistics. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, that is, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of the algorithmic (Kolmogorov) minimal sufficient statistic for all data samples in both description modes (in the explicit mode, under some constraints). We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones. Comment: LaTeX, 22 pages, 1 figure, with correction to the published journal version
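
A two-part code can be made concrete with a toy, computable model class. The sketch below is an illustration under stated assumptions, not the construction from the paper: Kolmogorov complexity is uncomputable, so the model part here is a fixed, hand-chosen description (the count k of ones, with the length n assumed known to the decoder), and the model is the finite set S of all n-bit strings with exactly k ones. The data-to-model part is the index of x in S, which costs log2 |S| bits. The function names are hypothetical.

```python
from math import comb, log2


def literal_code_length(x: str) -> float:
    """Bits to send the binary string x verbatim (length n assumed known)."""
    return float(len(x))


def two_part_code_length(x: str) -> float:
    """Toy two-part code for x using the model S = {n-bit strings with k ones}.

    Assumption: the decoder already knows n, so the model part only has to
    name k in {0, ..., n}, costing log2(n + 1) bits. This is a hand-chosen
    stand-in for K(S), which cannot be computed.
    """
    n, k = len(x), x.count("1")
    model_bits = log2(n + 1)       # part 1: which set S
    index_bits = log2(comb(n, k))  # part 2: position of x inside S, log2 |S|
    return model_bits + index_bits
```

For x = 1111 followed by 60 zeros the two-part code costs about 25.3 bits versus 64 for the literal string. For 0101...01 (32 ones) it costs about 66.7 bits, slightly more than the literal string, because this particular model class captures only the count of ones, even though some other model would describe that string far more briefly.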

    Multiple Description Quantization via Gram-Schmidt Orthogonalization

    The multiple description (MD) problem has received considerable attention as a model of information transmission over unreliable channels. This paper proposes a general framework for designing efficient multiple description quantization schemes. We provide a systematic treatment of the El Gamal-Cover (EGC) achievable MD rate-distortion region, and show that any point in the EGC region can be achieved via a successive quantization scheme combined with quantization splitting. For the quadratic Gaussian case, the proposed scheme has an intrinsic connection with Gram-Schmidt orthogonalization, which implies that the whole Gaussian MD rate-distortion region is achievable with a sequential dithered lattice-based quantization scheme as the dimension of the (optimal) lattice quantizers becomes large. Moreover, this scheme is shown to be universal for all i.i.d. smooth sources, with performance no worse than that for an i.i.d. Gaussian source with the same variance, and asymptotically optimal at high resolution. A class of low-complexity MD scalar quantizers within the proposed general framework is also constructed and illustrated geometrically; their performance, analyzed in the high-resolution regime, shows a noticeable improvement over existing MD scalar quantization schemes. Comment: 48 pages; submitted to IEEE Transactions on Information Theory
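
The Gram-Schmidt connection can be illustrated in isolation. Zero-mean jointly Gaussian variables form an inner-product space with <X, Y> = E[XY], so Gram-Schmidt turns X_1, ..., X_m into orthogonal innovations U_i, each being the part of X_i not linearly predictable from the earlier variables. The sketch below assumes a positive-definite covariance matrix and shows only this orthogonalization step; it does not implement the paper's successive quantization or dithered lattice quantizers, and the function name is hypothetical.

```python
def gram_schmidt_innovations(cov):
    """Gram-Schmidt on zero-mean random variables X_0..X_{m-1}.

    cov[i][j] = E[X_i X_j] and must be positive definite. Returns
    (coeffs, variances), where U_i = sum_k coeffs[i][k] * X_k is the component
    of X_i orthogonal to X_0..X_{i-1}, and variances[i] = E[U_i ** 2].
    """
    m = len(cov)
    coeffs, variances = [], []
    for i in range(m):
        c = [0.0] * m
        c[i] = 1.0  # start from X_i itself
        for j in range(i):
            # <X_i, U_j> = sum_k coeffs[j][k] * E[X_i X_k]
            inner = sum(coeffs[j][k] * cov[i][k] for k in range(m))
            proj = inner / variances[j]
            # subtract the projection of X_i onto the earlier innovation U_j
            for k in range(m):
                c[k] -= proj * coeffs[j][k]
        # E[U_i ** 2] = c^T cov c
        var = sum(c[a] * c[b] * cov[a][b] for a in range(m) for b in range(m))
        coeffs.append(c)
        variances.append(var)
    return coeffs, variances
```

The innovation variances are the diagonal of D in the factorization cov = L D L^T, so their product equals the determinant of the covariance matrix. Working with orthogonal innovations instead of correlated originals is what makes a one-variable-at-a-time (sequential) treatment natural; the paper's exact construction may differ from this sketch.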

    Side-information Scalable Source Coding

    This work considers the problem of side-information scalable (SI-scalable) source coding, in which the encoder constructs a progressive description such that a receiver with high-quality side information can truncate the bitstream and reconstruct in the rate-distortion sense, while a receiver with low-quality side information must receive further data in order to decode. We provide inner and outer bounds for general discrete memoryless sources. The achievable region is shown to be tight when either of the decoders requires a lossless reconstruction, as well as for degraded deterministic distortion measures. Furthermore, we show that the gap between the achievable region and the outer bounds can be bounded by a constant when the squared-error distortion measure is used. We introduce the notion of perfectly scalable coding, in which both stages operate on the Wyner-Ziv bound, and give necessary and sufficient conditions for it for sources satisfying a mild support condition. Using SI-scalable coding and successive refinement Wyner-Ziv coding as basic building blocks, we provide a complete characterization for the important quadratic Gaussian source with multiple jointly Gaussian side-information variables, where the side information quality does not have to be monotonic along the scalable coding order. A partial result is provided for the doubly symmetric binary source with Hamming distortion when the worse side information is a constant, for which one of the outer bounds is strictly tighter than the other. Comment: 35 pages, submitted to IEEE Transactions on Information Theory
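
For the quadratic Gaussian case the Wyner-Ziv rate-distortion function has a closed form: with side information Y = X + N, where X and N are independent zero-mean Gaussians, R_WZ(D) = max(0, 0.5 * log2(Var(X|Y) / D)) bits per sample under squared error, the same rate needed when the encoder also sees Y. The sketch computes this bound for a receiver with high-quality side information and one with low-quality side information. The example variances and the function names are illustrative assumptions, and the code does not compute the SI-scalable region itself.

```python
from math import log2


def conditional_variance(var_x: float, var_noise: float) -> float:
    """Var(X | Y) for Y = X + N with X, N independent zero-mean Gaussians."""
    return var_x * var_noise / (var_x + var_noise)


def wyner_ziv_rate(var_x: float, var_noise: float, distortion: float) -> float:
    """Quadratic Gaussian Wyner-Ziv rate in bits per sample (MSE distortion).

    A smaller var_noise means higher-quality side information at the decoder.
    """
    cond = conditional_variance(var_x, var_noise)
    if distortion >= cond:
        return 0.0  # the side information alone already meets the target
    return 0.5 * log2(cond / distortion)


# Hypothetical receivers: same source, different side-information quality.
good_rate = wyner_ziv_rate(1.0, 0.25, 0.05)  # Var(X|Y) = 0.2
poor_rate = wyner_ziv_rate(1.0, 1.0, 0.05)   # Var(X|Y) = 0.5
```

The receiver with better side information needs fewer bits for the same distortion (1 bit versus about 1.66 bits here), which is why it can stop after a prefix of a progressive bitstream while the other receiver waits for more data. Roughly, perfect scalability asks that a single progressive bitstream meet both of these Wyner-Ziv bounds at once.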

    Multiuser Successive Refinement and Multiple Description Coding

    We consider the multiuser successive refinement (MSR) problem, where the users are connected to a central server via links with different noiseless capacities, and each user wishes to reconstruct in a successive-refinement fashion. An achievable region is given for the two-user two-layer case, and it provides the complete rate-distortion region for the Gaussian source under the MSE distortion measure. The key observation is that this problem includes the multiple description (MD) problem (with two descriptions) as a subsystem, so techniques useful for the MD problem can be extended to this case. We show that the coding scheme based on the universality of random binning is sub-optimal, because multiple Gaussian side-information variables available only at the decoders do incur a performance loss, in contrast to the case of a single side information at the decoder. We further show that, unlike the single-user case, when there are multiple users the performance loss of a multistage coding approach can be unbounded for the Gaussian source. This suggests that in such a setting, the benefit of using successive refinement is unlikely to justify the accompanying performance loss. The MSR problem is also related to the source coding problem in which each decoder has its own side information while the encoder has all of the side information. The MSR problem further includes several variations of the MD problem, for which the general result is specialized and its implications are discussed. Comment: 10 pages, 5 figures. To appear in IEEE Transactions on Information Theory. References updated and typos corrected
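
The single-user baseline that the abstract contrasts against is easy to compute: an i.i.d. Gaussian source under MSE is successively refinable, so a coarse description at rate R(D1) = 0.5 * log2(variance / D1) can be refined to distortion D2 with total rate exactly R(D2). The sketch below computes the per-stage rates for one user only; it does not model the multiuser setting, where the abstract reports that the loss of a multistage approach can be unbounded. Function names are hypothetical.

```python
from math import log2


def gaussian_rd(variance: float, distortion: float) -> float:
    """Rate-distortion function of an i.i.d. Gaussian source under MSE (bits/sample)."""
    return max(0.0, 0.5 * log2(variance / distortion))


def refinement_rates(variance: float, d1: float, d2: float):
    """Per-stage rates (R1, R2) for one user refining from distortion d1 to d2.

    Successive refinability of the Gaussian source means the first stage can
    sit at R(d1) while the cumulative rate R1 + R2 still equals R(d2).
    """
    if d2 > d1:
        raise ValueError("the second stage must refine: need d2 <= d1")
    r1 = gaussian_rd(variance, d1)
    r2 = gaussian_rd(variance, d2) - r1  # only the increment is sent in stage 2
    return r1, r2
```

For example, with unit variance, refining from D1 = 0.25 to D2 = 0.0625 costs 1 bit per sample in each stage, and the 2-bit total matches R(0.0625) with no loss.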

    An Examination of the Sufficiency of Small Qualitative Samples

    These findings suggest that under some study conditions, rich qualitative findings can be discovered with relatively small sample sizes. Further determining the parameters under which this applies would be helpful to researchers and research participants alike. Most efforts thus far have focused on studies relying on individual interviews, many of them in the medical field. In addition, examinations of minimum required sample sizes that review the available interviews only once, in the order they were collected, raise concerns about possible temporal bias. We sought to examine the minimum sample sizes needed to adequately capture the themes and codes in areas of inquiry within the field of social work. Considering three distinct qualitative research studies that used both individual interviewing and focus group data collection, we addressed four research questions: (1) What minimum sample size is needed to adequately identify codes (smaller units of meaning) within the data? (2) What minimum sample size is needed to ensure that all larger themes are partially represented by at least one of the codes that comprise each theme? (3) What minimum sample size is needed to fully realize the complete dimensionality of all themes by including all assigned codes? (4) Are the minimum sample sizes needed consistent across different substantive areas of exploration and different modes of data collection, specifically individual interviews and focus groups? To address temporal bias, we answered these questions by examining multiple random draws of various sample sizes within each included qualitative study.
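
The random-draw procedure in the last sentence can be sketched as follows, under assumptions the abstract does not spell out: each interview or focus group is represented as the set of codes assigned to it, a sample counts as adequate when it contains every code found in the full study (in the spirit of question 1), and the minimum sample size is the smallest n at which a chosen share of random draws (95% by default here) is adequate. The data, threshold, number of draws, and function names are hypothetical.

```python
import random


def coverage_by_sample_size(transcripts, n_draws=1000, seed=0):
    """Share of random draws of each size n that capture every code.

    transcripts: list of sets of code labels, one set per interview or focus
    group. Draws are random subsets, so collection order plays no role.
    """
    rng = random.Random(seed)
    all_codes = set().union(*transcripts)
    coverage = {}
    for n in range(1, len(transcripts) + 1):
        hits = 0
        for _ in range(n_draws):
            sample = rng.sample(transcripts, n)  # n distinct transcripts
            if set().union(*sample) == all_codes:
                hits += 1
        coverage[n] = hits / n_draws
    return coverage


def minimum_sample_size(transcripts, threshold=0.95, **kwargs):
    """Smallest n whose draws capture all codes at least `threshold` of the time."""
    coverage = coverage_by_sample_size(transcripts, **kwargs)
    for n in sorted(coverage):
        if coverage[n] >= threshold:
            return n
    return None
```

Repeating many random draws at each size, rather than reading transcripts once in collection order, is what removes the dependence on the order in which interviews happened to be collected.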

    Kolmogorov's Structure Functions and Model Selection

    In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic." Comment: 25 pages LaTeX, 5 figures. In part in Proc 47th IEEE FOCS; this final version (more explanations, cosmetic modifications) to appear in IEEE Trans Inform Theory
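
The structure function of data x, written h_x(alpha) here, is the least log2 |S| over finite sets S that contain x and have complexity at most alpha. Because Kolmogorov complexity is uncomputable, the sketch below replaces the class of all finite sets with three hand-picked models and assigns each an assumed description cost in bits. It illustrates how the best-fitting model changes as the complexity budget alpha grows, not how the true structure function is computed. All names and cost values are assumptions.

```python
from math import comb, log2


def toy_models(x: str):
    """Three hand-picked finite-set models that all contain the binary string x.

    Each entry is (name, assumed_cost_bits, log2_cardinality). The costs are
    illustrative stand-ins for K(S), which cannot be computed.
    """
    n, k = len(x), x.count("1")
    return [
        ("all n-bit strings", log2(n + 1), float(n)),
        ("n-bit strings with k ones", 2 * log2(n + 1), log2(comb(n, k))),
        ("the singleton {x}", n + log2(n + 1), 0.0),
    ]


def toy_structure_function(x: str, alpha: float) -> float:
    """Least log2 |S| over the toy models whose assumed cost is at most alpha."""
    fits = [log_card for _, cost, log_card in toy_models(x) if cost <= alpha]
    return min(fits) if fits else float("inf")
```

With x = 1111 followed by 60 zeros, raising alpha first admits the trivial model (64 bits of cardinality left to explain), then the count-of-ones model (about 19.3 bits), and finally the singleton (0 bits). As with the true structure function, the value never increases as alpha grows.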