
474 research outputs found

Optimal coding and the origins of Zipfian laws

Author: Bentz Christian
Ferrer-i-Cancho Ramon
Seguin Caio
Publication venue: 'Informa UK Limited'
Publication date: 29/05/2020

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding -- under an arbitrary coding scheme -- and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws.

Comment: in press in the Journal of Quantitative Linguistics; definition of concordant pair corrected, proofs polished, references update
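The logarithmic growth of word length with frequency rank mentioned in this abstract can be illustrated with a small counting sketch. It is not the authors' code. It assumes that optimal non-singular coding simply hands the shortest available distinct strings to the most frequent items, and it uses a hypothetical two-letter alphabet. The closed form in `binary_rank_length` follows from counting binary strings (there are 2 + 4 + ... + 2^l non-empty strings of length at most l) and is an assumption of this sketch.

```python
import math
from itertools import count, product


def shortest_strings(alphabet, how_many):
    """Return the `how_many` shortest non-empty strings over `alphabet`.

    Strings are listed shortest first, so the string at position i is the
    code assigned to the item of frequency rank i + 1.
    """
    codes = []
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            codes.append("".join(symbols))
            if len(codes) == how_many:
                return codes


def binary_rank_length(rank):
    """Length of the code at a given rank for a two-letter alphabet.

    The smallest l with 2 + 4 + ... + 2**l >= rank is
    ceil(log2(rank / 2 + 1)), which grows like the logarithm of the rank.
    """
    return math.ceil(math.log2(rank / 2 + 1))


codes = shortest_strings("ab", 14)
for rank, code in enumerate(codes, start=1):
    print(rank, code, len(code), binary_rank_length(rank))
```

The enumerated lengths and the closed form agree rank by rank. All codes are distinct, but nothing prevents one code from being a prefix of another, which is what separates non-singular coding from uniquely decodable coding.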

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Optimality in Quantum Data Compression using Dynamical Entropy

Author: Androulakis George
Wright Duncan
Publication venue: 'American Physical Society (APS)'
Publication date: 03/08/2019

In this article we study lossless compression of strings of pure quantum states of indeterminate-length quantum codes, which were introduced by Schumacher and Westmoreland. Past work has assumed that the strings of quantum data are prepared to be encoded in an independent and identically distributed way. We introduce the notion of quantum stochastic ensembles, allowing us to consider strings of quantum states prepared in a more general way. For any identically distributed quantum stochastic ensemble we define an associated quantum Markov chain and prove that the optimal average codeword length via lossless coding is equal to the quantum dynamical entropy of the associated quantum Markov chain.

arXiv.org e-Print Archive

Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series

Author: Astola Jaakko
Gammerman Alex
Ryabko Boris
Publication date: 01/01/2005

We show that Kolmogorov complexity and its estimators, such as universal codes (or data compression methods), can be applied to hypothesis testing in the framework of classical mathematical statistics. Methods for identity testing and for nonparametric testing of serial independence for time series are suggested.

Comment: submitte
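One way a data compressor can serve as a test statistic is sketched below. This is an illustration under stated assumptions, not the procedure from the paper. The null hypothesis is that the input bits are independent and uniform, `zlib` stands in for a universal code, and `rejects_uniformity` is a hypothetical helper name. The decision rule rejects the null when the compressor saves more than log2(1/alpha) bits. Since at most a fraction alpha of all inputs can be shortened by that much by a prefix-free code, the type I error should not exceed alpha under this assumption.

```python
import math
import random
import zlib


def rejects_uniformity(data: bytes, alpha: float = 0.01) -> bool:
    """Reject 'the bits of `data` are i.i.d. uniform' at level `alpha`.

    Under the null, every input of n bits has probability 2**-n, i.e. an
    ideal code length of n bits. The null is rejected when zlib (an assumed
    stand-in for a universal code) beats that by more than log2(1/alpha) bits.
    """
    n_bits = 8 * len(data)
    code_bits = 8 * len(zlib.compress(data, 9))
    return n_bits - code_bits > math.log2(1.0 / alpha)


# Example inputs: pseudo-random bytes versus a highly regular sequence.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(4096))
zeros = bytes(4096)

print(rejects_uniformity(noise), rejects_uniformity(zeros))
```

A general-purpose compressor only detects the regularities it was designed to exploit, so failing to reject says little about randomness in general.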

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Dagstuhl Research Online Publication Server

Source coding with escort distributions and Renyi entropy bounds

Author: Bercher J. -F.
Publication venue: 'Elsevier BV'
Publication date: 01/08/2009

We discuss the interest of escort distributions and Rényi entropy in the context of source coding. We first recall a source coding theorem by Campbell relating a generalized measure of length to the Rényi-Tsallis entropy. We show that the associated optimal codes can be obtained using considerations on escort distributions. We propose a new family of measures of length involving escort distributions and we show that these generalized lengths are also bounded below by the Rényi entropy. Furthermore, we obtain that the standard Shannon code lengths are optimal for the new generalized length measures, whatever the entropic index. Finally, we show that there exists in this setting an interplay between standard and escort distributions.
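The link between Campbell's length, Rényi entropy and escort distributions can be checked numerically with a short sketch. It covers only the classical Campbell setting, not the new length measures proposed in the paper. It assumes the usual definitions: Campbell's exponential length L_t = (1/t) log_D sum_i p_i D^(t l_i), the Rényi entropy of order alpha = 1/(1+t), and the escort distribution P_i = p_i^alpha / sum_j p_j^alpha. Code lengths are idealized real numbers rather than integers, and all function names are hypothetical.

```python
import math


def renyi_entropy(p, alpha, base=2.0):
    """Renyi entropy of order alpha (alpha != 1), in units of log `base`."""
    return math.log(sum(x ** alpha for x in p), base) / (1.0 - alpha)


def campbell_length(p, lengths, t, base=2.0):
    """Campbell's exponential average length with parameter t > 0."""
    total = sum(x * base ** (t * l) for x, l in zip(p, lengths))
    return math.log(total, base) / t


def escort(p, alpha):
    """Escort distribution of order alpha associated with p."""
    z = sum(x ** alpha for x in p)
    return [x ** alpha / z for x in p]


def ideal_lengths(q, base=2.0):
    """Idealized (non-integer) code lengths -log q_i."""
    return [-math.log(x, base) for x in q]


p = [0.5, 0.25, 0.125, 0.125]
t = 1.0
alpha = 1.0 / (1.0 + t)

bound = renyi_entropy(p, alpha)
with_escort = campbell_length(p, ideal_lengths(escort(p, alpha)), t)
with_shannon = campbell_length(p, ideal_lengths(p), t)
print(bound, with_escort, with_shannon)
```

In this example the lengths built from the escort distribution meet the Rényi bound up to rounding, while the Shannon lengths built from p itself give a larger Campbell length.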

arXiv.org e-Print Archive

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Shannon Information and Kolmogorov Complexity

Author: Grunwald Peter
Vitanyi Paul
Publication date: 01/01/2004

We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy versus Kolmogorov complexity, the relation of both to universal coding, Shannon mutual information versus Kolmogorov ('algorithmic') mutual information, probabilistic sufficient statistic versus algorithmic sufficient statistic (related to lossy compression in the Shannon theory versus meaningful information in the Kolmogorov theory), and rate distortion theory versus Kolmogorov's structure function. Part of the material has appeared in print before, scattered through various publications, but this is the first comprehensive systematic comparison. The last mentioned relations are new.

Comment: Survey, LaTeX 54 pages, 3 figures, Submitted to IEEE Trans Information Theor

arXiv.org e-Print Archive

CiteSeerX