Optimal coding and the origins of Zipfian laws
The problem of compression in standard information theory consists of
assigning codes as short as possible to numbers. Here we consider the problem
of optimal coding -- under an arbitrary coding scheme -- and show that it
predicts Zipf's law of abbreviation, namely a tendency in natural languages for
more frequent words to be shorter. We apply this result to investigate optimal
coding also under so-called non-singular coding, a scheme where unique
segmentation is not warranted but codes stand for a distinct number. Optimal
non-singular coding predicts that the length of a word should grow
approximately as the logarithm of its frequency rank, which is again consistent
with Zipf's law of abbreviation. Optimal non-singular coding in combination
with the maximum entropy principle also predicts Zipf's rank-frequency
distribution. Furthermore, our findings on optimal non-singular coding
challenge common beliefs about random typing. It turns out that random typing
is in fact an optimal coding process, in stark contrast with the common
assumption that it is detached from cost cutting considerations. Finally, we
discuss the implications of optimal coding for the construction of a compact
theory of Zipfian laws and other linguistic laws.
Comment: in press in the Journal of Quantitative Linguistics; definition of
concordant pair corrected, proofs polished, references updated
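The logarithmic growth of word length with frequency rank under non-singular coding can be illustrated with a small sketch. It is not taken from the paper: the two-letter alphabet, the function names and the exclusion of the empty string are assumptions made for illustration. The idea is only that the most frequent word gets the shortest available string, the next word the next shortest, and so on.

```python
from itertools import count, product


def nonsingular_codes(n_words, alphabet="ab"):
    """Return the n_words shortest distinct non-empty strings over `alphabet`.

    Position i (0-based) holds the code of the word with frequency rank i + 1.
    The alphabet and the exclusion of the empty string are illustrative choices.
    """
    codes = []
    if n_words <= 0:
        return codes
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            codes.append("".join(symbols))
            if len(codes) == n_words:
                return codes


def length_of_rank(rank, alphabet_size):
    """Smallest l such that at least `rank` non-empty strings have length <= l."""
    length, capacity = 0, 0
    while capacity < rank:
        length += 1
        capacity += alphabet_size ** length
    return length


codes = nonsingular_codes(7)
lengths = [len(code) for code in codes]
```

With an alphabet of size N there are N + N^2 + ... + N^l strings of length at most l, so the length assigned to rank i grows roughly as the base-N logarithm of i, which is the behaviour the abstract describes.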
Optimality in Quantum Data Compression using Dynamical Entropy
In this article we study lossless compression of strings of pure quantum
states of indeterminate-length quantum codes which were introduced by
Schumacher and Westmoreland. Past work has assumed that the strings of quantum
data are prepared to be encoded in an independent and identically distributed
way. We introduce the notion of quantum stochastic ensembles, allowing us to
consider strings of quantum states prepared in a more general way. For any
identically distributed quantum stochastic ensemble we define an associated
quantum Markov chain and prove that the optimal average codeword length via
lossless coding is equal to the quantum dynamical entropy of the associated
quantum Markov chain.
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
We show that Kolmogorov complexity and its estimators, such as universal codes
(or data compression methods), can be applied to hypothesis testing in the
framework of classical mathematical statistics. Methods for identity
testing and nonparametric testing of serial independence for time series are
suggested.
Comment: submitted
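One way a compression-based identity test can be set up is sketched below. This is an illustration under stated assumptions, not the procedure from the paper: zlib stands in for a universal code, the null hypothesis is that the bytes are independent and uniformly distributed, and the function name and the default level alpha are hypothetical. The null hypothesis is rejected when the compressor beats the code length implied by the null model by more than log2(1/alpha) bits; for a uniquely decodable code, Kraft's inequality keeps the probability of a false rejection at or below alpha.

```python
import math
import random
import zlib


def reject_uniform_bytes(data: bytes, alpha: float = 0.01) -> bool:
    """Compression-based identity test (illustrative sketch).

    H0: the bytes of `data` are i.i.d. uniform on 0..255, so the ideal
    code length under H0 is 8 bits per byte. zlib is an assumed stand-in
    for a universal code.
    """
    null_bits = 8 * len(data)                    # -log2 of the H0 probability
    code_bits = 8 * len(zlib.compress(data, 9))  # length of the compressed data
    return null_bits - code_bits > math.log2(1 / alpha)


rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(4096))
periodic_data = b"ab" * 2048
```

Calling `reject_uniform_bytes(periodic_data)` rejects the uniform hypothesis because the periodic input compresses to far fewer than 4096 bytes, whereas pseudo-random input does not compress and is not rejected.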
Source coding with escort distributions and Rényi entropy bounds
We discuss the interest of escort distributions and Rényi entropy in the
context of source coding. We first recall a source coding theorem by Campbell
relating a generalized measure of length to the Rényi-Tsallis entropy. We
show that the associated optimal codes can be obtained using considerations on
escort distributions. We propose a new family of measures of length involving
escort distributions and we show that these generalized lengths are also
bounded below by the Rényi entropy. Furthermore, we find that the standard
Shannon code lengths are optimal for the new generalized length measures,
whatever the entropic index. Finally, we show that there exists in this setting
an interplay between standard and escort distributions.
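Campbell's theorem, recalled in the abstract above, can be checked numerically. The sketch below is not from the paper: the example distribution, beta = 1 and the function names are assumptions, and code lengths are treated as real numbers (no rounding to integers). Campbell's length is (1/beta) log_D sum_i p_i D^(beta l_i); it is bounded below by the Rényi entropy of order alpha = 1/(1 + beta), and the bound is attained by l_i = -log_D P_i, where P_i = p_i^alpha / sum_j p_j^alpha is the escort distribution.

```python
import math


def renyi_entropy(p, alpha, base=2):
    """Rényi entropy of order alpha (alpha != 1) of the distribution p."""
    return math.log(sum(pi ** alpha for pi in p), base) / (1 - alpha)


def campbell_length(p, lengths, beta, base=2):
    """Campbell's exponential average of the code lengths, with beta > 0."""
    total = sum(pi * base ** (beta * li) for pi, li in zip(p, lengths))
    return math.log(total, base) / beta


def escort(p, alpha):
    """Escort distribution of order alpha: p_i^alpha, renormalized."""
    z = sum(pi ** alpha for pi in p)
    return [pi ** alpha / z for pi in p]


# Illustrative values, not taken from the paper.
p = [0.5, 0.25, 0.125, 0.125]
beta = 1.0
alpha = 1 / (1 + beta)

escort_lengths = [-math.log(q, 2) for q in escort(p, alpha)]   # ideal, real-valued
shannon_lengths = [-math.log(pi, 2) for pi in p]

bound = renyi_entropy(p, alpha)
at_escort = campbell_length(p, escort_lengths, beta)
at_shannon = campbell_length(p, shannon_lengths, beta)
```

Here `at_escort` coincides with `bound` up to floating-point error, while `at_shannon` is strictly larger. This covers only Campbell's original length measure; the new family of length measures proposed in the paper is not reproduced here.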
Shannon Information and Kolmogorov Complexity
We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov ('algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last mentioned relations are new.
Comment: Survey, LaTeX 54 pages, 3 figures, Submitted to IEEE Trans
Information Theory