Algorithmic Statistics
While Kolmogorov complexity is the accepted absolute measure of information
content of an individual finite object, a similarly absolute notion is needed
for the relation between an individual data sample and an individual model
summarizing the information in the data, for example, a finite set (or
probability distribution) where the data sample typically came from. The
statistical theory based on such relations between individual objects can be
called algorithmic statistics, in contrast to classical statistical theory that
deals with relations between probabilistic ensembles. We develop the
algorithmic theory of statistic, sufficient statistic, and minimal sufficient
statistic. This theory is based on two-part codes consisting of the code for
the statistic (the model summarizing the regularity, the meaningful
information, in the data) and the model-to-data code. In contrast to the
situation in probabilistic statistical theory, the algorithmic relation of
(minimal) sufficiency is an absolute relation between the individual model and
the individual data sample. We distinguish implicit and explicit descriptions
of the models. We give characterizations of the algorithmic (Kolmogorov) minimal
sufficient statistic for all data samples for both description modes (in the
explicit mode, under some constraints). We also strengthen and elaborate earlier
results on the "Kolmogorov structure function" and "absolutely
non-stochastic objects": those rare objects for which the simplest models that
summarize their relevant information (minimal sufficient statistics) are at
least as complex as the objects themselves. We demonstrate a close relation
between the probabilistic notions and the algorithmic ones.
Comment: LaTeX, 22 pages, 1 figure, with correction to the published journal
version
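Kolmogorov complexity is uncomputable, so the two-part code above cannot be evaluated directly. The sketch below is a toy, computable stand-in in the MDL spirit, not the paper's construction. It assumes a hand-picked model family (S_k = all binary strings of length n with exactly k ones, n known to the decoder) and charges log2(n + 1) bits to name the model in place of K(S). The data-to-model part is the index of x inside S_k, log2 |S_k| bits. The function name is hypothetical.

```python
import math


def two_part_code_length(x: str) -> tuple[float, float]:
    """Toy two-part code for a binary string x (length n assumed known to the decoder).

    Model: S_k = {binary strings of length n with exactly k ones}.
    Part 1 (model): log2(n + 1) bits to name k. This is a computable stand-in
        for K(S_k); true Kolmogorov complexity is uncomputable.
    Part 2 (data-to-model): log2 |S_k| = log2 C(n, k) bits for the index of x in S_k.
    """
    if set(x) - {"0", "1"}:
        raise ValueError("x must be a binary string")
    n = len(x)
    k = x.count("1")
    model_bits = math.log2(n + 1)
    data_bits = math.log2(math.comb(n, k))
    return model_bits, data_bits


if __name__ == "__main__":
    # A sparse string compresses well under this model family; a periodic one
    # does not, because "number of ones" does not capture its regularity.
    for x in ["0" * 15 + "1", "01" * 8]:
        m, d = two_part_code_length(x)
        print(f"{x}: {m:.2f} + {d:.2f} = {m + d:.2f} bits (literal: {len(x)} bits)")
```

The periodic example costs more than its literal length here. This shows that a two-part code is only as good as the model class it searches; the algorithmic theory removes that restriction by ranging over all finite sets.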
Multiple Description Quantization via Gram-Schmidt Orthogonalization
The multiple description (MD) problem has received considerable attention as
a model of information transmission over unreliable channels. A general
framework for designing efficient multiple description quantization schemes is
proposed in this paper. We provide a systematic treatment of the El Gamal-Cover
(EGC) achievable MD rate-distortion region, and show that any point in the EGC
region can be achieved via a successive quantization scheme along with
quantization splitting. For the quadratic Gaussian case, the proposed scheme
has an intrinsic connection with the Gram-Schmidt orthogonalization, which
implies that the whole Gaussian MD rate-distortion region is achievable with a
sequential dithered lattice-based quantization scheme as the dimension of the
(optimal) lattice quantizers becomes large. Moreover, this scheme is shown to
be universal for all i.i.d. smooth sources with performance no worse than that
for an i.i.d. Gaussian source with the same variance and asymptotically optimal
at high resolution. A class of low-complexity MD scalar quantizers in the
proposed general framework is also constructed and is illustrated
geometrically; the performance is analyzed in the high resolution regime, which
exhibits a noticeable improvement over the existing MD scalar quantization
schemes.
Comment: 48 pages; submitted to IEEE Transactions on Information Theory
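The abstract links the quadratic Gaussian scheme to Gram-Schmidt orthogonalization. As a hedged illustration of that linear-algebra step only (the quantizers are not modelled), the sketch below orthogonalizes zero-mean jointly Gaussian variables X_1, ..., X_m under the inner product <U, V> = E[UV], using only their covariance matrix. The function name and the coefficient-vector representation are our own assumptions.

```python
def gram_schmidt_innovations(cov):
    """Gram-Schmidt orthogonalization of zero-mean random variables X_1..X_m.

    Each variable is represented by its coefficient vector over (X_1, ..., X_m),
    so the inner product <U, V> = E[U V] becomes u^T cov v. Returns the
    coefficient vectors of the innovations
    W_i = X_i - (projection of X_i onto span{X_1, ..., X_{i-1}})
    together with their variances E[W_i^2].
    """
    m = len(cov)

    def inner(u, v):
        return sum(u[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))

    basis, variances = [], []
    for i in range(m):
        w = [1.0 if j == i else 0.0 for j in range(m)]  # start from X_i itself
        for b, vb in zip(basis, variances):
            if vb > 0:  # skip degenerate (zero-variance) directions
                coef = inner(w, b) / vb  # projection coefficient onto earlier innovation
                w = [wj - coef * bj for wj, bj in zip(w, b)]
        basis.append(w)
        variances.append(inner(w, w))
    return basis, variances


if __name__ == "__main__":
    basis, var = gram_schmidt_innovations([[1.0, 0.5], [0.5, 1.0]])
    # Second innovation is X_2 - 0.5 X_1, with variance 0.75.
    print(basis, var)
```

The innovation variances are the diagonal of D in the factorization K = L D L^T of the covariance matrix. How the paper assigns quantization stages to such innovations is not reproduced here.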
Side-information Scalable Source Coding
The problem of side-information scalable (SI-scalable) source coding is
considered in this work, where the encoder constructs a progressive
description, such that the receiver with high quality side information will be
able to truncate the bitstream and reconstruct in the rate distortion sense,
while the receiver with low quality side information will have to receive
further data in order to decode. We provide inner and outer bounds for general
discrete memoryless sources. The achievable region is shown to be tight for the
case in which either of the decoders requires a lossless reconstruction, as well as
the case with degraded deterministic distortion measures. Furthermore, we show
that the gap between the achievable region and the outer bounds can be bounded
by a constant when the squared error distortion measure is used. The notion of
perfectly scalable coding, in which both stages operate on the Wyner-Ziv bound,
is introduced, and necessary and sufficient conditions are given for sources
satisfying a mild support condition. Using SI-scalable coding and successive
refinement Wyner-Ziv coding as basic building blocks, a complete
characterization is provided for the important quadratic Gaussian source with
multiple jointly Gaussian side-informations, where the side information quality
does not have to be monotonic along the scalable coding order. A partial result
is provided for the doubly symmetric binary source with Hamming distortion when
the worse side information is a constant, for which one of the outer bounds is
strictly tighter than the other one.
Comment: 35 pages, submitted to IEEE Transactions on Information Theory
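For the quadratic Gaussian case, the single-decoder Wyner-Ziv bound is R_WZ(D) = max(0, (1/2) log2(Var(X|Y)/D)) bits per sample. The sketch below evaluates it for two receivers of different side-information quality. It assumes a side-information model Y = X + N with independent Gaussian noise, and the noise variances are hypothetical. Each number is only a per-receiver bound; whether one progressive bitstream meets both at once is the "perfectly scalable" question in the abstract, which this snippet does not answer.

```python
import math


def conditional_variance(var_x: float, var_n: float) -> float:
    """Var(X | Y) when Y = X + N, with X and N independent zero-mean Gaussians."""
    return var_x * var_n / (var_x + var_n)


def wyner_ziv_rate(var_x_given_y: float, d: float) -> float:
    """Quadratic Gaussian Wyner-Ziv rate in bits/sample: max(0, 0.5*log2(Var(X|Y)/D))."""
    if d <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(var_x_given_y / d))


if __name__ == "__main__":
    var_x, target_d = 1.0, 0.05
    # Hypothetical receivers: noise variance 0.25 (high quality) vs 4.0 (low quality).
    for label, var_n in [("high-quality SI", 0.25), ("low-quality SI", 4.0)]:
        r = wyner_ziv_rate(conditional_variance(var_x, var_n), target_d)
        print(f"{label}: {r:.3f} bits/sample")
```

With these numbers the high-quality receiver needs about 1 bit per sample and the low-quality one about 2, which is the asymmetry that lets the better-informed receiver truncate the bitstream.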
Multiuser Successive Refinement and Multiple Description Coding
We consider the multiuser successive refinement (MSR) problem, where the
users are connected to a central server via links with different noiseless
capacities, and each user wishes to reconstruct in a successive-refinement
fashion. An achievable region is given for the two-user two-layer case and it
provides the complete rate-distortion region for the Gaussian source under the
MSE distortion measure. The key observation is that this problem includes the
multiple description (MD) problem (with two descriptions) as a subsystem, and
the techniques useful in the MD problem can be extended to this case. We show
that the coding scheme based on the universality of random binning is
sub-optimal, because multiple Gaussian side informations only at the decoders
do incur performance loss, in contrast to the case of single side information
at the decoder. We further show that unlike the single user case, when there
are multiple users, the loss of performance by a multistage coding approach can
be unbounded for the Gaussian source. The result suggests that in such a
setting, the benefit of using successive refinement is not likely to justify
the accompanying performance loss. The MSR problem is also related to the
source coding problem where each decoder has its individual side information,
while the encoder has the complete set of the side informations. The MSR
problem further includes several variations of the MD problem, for which the
specialization of the general result is investigated and the implications are
discussed.
Comment: 10 pages, 5 figures. To appear in IEEE Transactions on Information
Theory. References updated and typos corrected
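As a single-user baseline for the comparison above: a memoryless Gaussian source under MSE is successively refinable, so a base layer at rate R(D1) followed by a refinement layer reaching D2 costs no more in total than R(D2), where R(D) = max(0, (1/2) log2(sigma^2/D)). The sketch below computes the two layer rates under that assumption; the function names are hypothetical, and it does not model the multiuser setting, where the abstract reports that multistage coding can lose an unbounded amount.

```python
import math


def gaussian_rd(var: float, d: float) -> float:
    """Rate-distortion function of a memoryless Gaussian source under MSE, bits/sample."""
    if d <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(var / d))


def refinement_rates(var: float, d1: float, d2: float) -> tuple[float, float]:
    """Layer rates (R1, R2) for a successively refinable source with D1 >= D2.

    The base layer meets R(D1); base plus refinement meets R(D2) with no rate loss.
    """
    if d2 > d1:
        raise ValueError("the refinement layer must not increase distortion")
    r1 = gaussian_rd(var, d1)
    return r1, gaussian_rd(var, d2) - r1


if __name__ == "__main__":
    r1, r2 = refinement_rates(1.0, 0.25, 0.0625)
    print(f"base layer: {r1:.2f} bits/sample, refinement layer: {r2:.2f} bits/sample")
```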
An Examination of the Sufficiency of Small Qualitative Samples
These findings suggest that under some study conditions, rich qualitative findings can be discovered with relatively small sample sizes. Further determining the parameters under which this applies would be helpful to researchers and research participants alike. Most efforts thus far have involved studies relying on individual interviews, and many are within the medical field. In addition, examinations of minimum required sample sizes that consider the available interviews only once, in the order they were collected, raise concerns about possible temporal bias. We sought to examine the minimum sample sizes needed to adequately capture the themes and codes in areas of inquiry within the field of social work. Considering three distinct qualitative research studies spanning both individual interviewing and focus group data collection approaches, we addressed four research questions: (1) What minimum sample size is needed to adequately identify codes (smaller units of meaning) within the data? (2) What minimum sample size is needed to ensure that every larger theme is at least partially represented by one of the codes that make up that theme? (3) What minimum sample size is needed to fully realize the complete dimensionality of all themes by including all assigned codes? (4) Are the minimum sample sizes needed consistent across different substantive areas of exploration and different modes of data collection, specifically individual interviews and focus groups? To address temporal bias, we answered these questions by examining multiple random draws of various sample sizes within each included qualitative study.
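The random-draw procedure for question (1) can be sketched as follows. The sketch rests on several assumptions: each interview is reduced to the set of code labels assigned to it, a sample size counts as sufficient when a chosen fraction of random draws (a hypothetical threshold) recovers every code in the study, and draws are without replacement. The function names and defaults are illustrative, not the authors' procedure.

```python
import random


def coverage_rate(interviews, n, draws, rng):
    """Fraction of random size-n subsets of interviews whose codes include every code."""
    all_codes = set().union(*interviews)
    hits = 0
    for _ in range(draws):
        sample = rng.sample(interviews, n)  # one random draw, order irrelevant
        if set().union(*sample) == all_codes:
            hits += 1
    return hits / draws


def min_sample_size(interviews, threshold=0.95, draws=500, seed=0):
    """Smallest n whose random draws recover all codes at least `threshold` of the time."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    for n in range(1, len(interviews) + 1):
        if coverage_rate(interviews, n, draws, rng) >= threshold:
            return n
    return len(interviews)


if __name__ == "__main__":
    # Hypothetical coded transcripts, one set of code labels per interview.
    study = [{"cost", "trust"}, {"trust"}, {"access", "cost"}, {"stigma"}, {"access"}]
    print(min_sample_size(study, threshold=0.9))
```

Questions (2) and (3) would swap the coverage test for a theme-level check, for example requiring each theme's code set to intersect the sampled codes.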
Kolmogorov's Structure Functions and Model Selection
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and
model selection. Let data be finite binary strings and models be finite sets of
binary strings. Consider model classes consisting of models of given maximal
(Kolmogorov) complexity. The "structure function" of the given data expresses
the relation between the complexity level constraint on a model class and the
least log-cardinality of a model in the class containing the data. We show that
the structure function determines all stochastic properties of the data: for
every constrained model class it determines the individual best-fitting model
in the class irrespective of whether the "true" model is in the model class
considered or not. In this setting, this happens with certainty, rather
than with high probability as in the classical case. We precisely quantify
the goodness-of-fit of an individual model with respect to individual data. We
show that, within the obvious constraints, every graph is realized by the
structure function of some data. We determine the (un)computability properties
of the various functions contemplated and of the "algorithmic minimal
sufficient statistic."
Comment: 25 pages LaTeX, 5 figures. In part in Proc 47th IEEE FOCS; this final
version (more explanations, cosmetic modifications) to appear in IEEE Trans.
Inform. Theory
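In symbols, the structure function described above can be written as follows, assuming binary logarithms, prefix complexity K, and finite sets S of binary strings as models. The lower bound uses the two-part description of x (a shortest program for S, then the index of x in S); the exact additive constants depend on conventions and are a hedged reconstruction, not quoted from the paper.

```latex
h_x(\alpha) = \min_{S}\bigl\{\, \log |S| \;:\; x \in S,\ K(S) \le \alpha \,\bigr\}

K(x) \le K(S) + \log |S| + O(1)
\quad\Longrightarrow\quad
h_x(\alpha) + \alpha \ge K(x) - O(1)

h_x(\alpha) = 0 \quad \text{for } \alpha \ge K(x) + O(1) \qquad (\text{take } S = \{x\})
```

Complexity levels where h_x(alpha) + alpha is close to K(x) are the ones at which a model is, in the abstract's terms, an algorithmic sufficient statistic.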