Prefix Codes for Power Laws with Countable Support
In prefix coding over an infinite alphabet, methods that consider specific
distributions generally consider those that decline more quickly than a power
law (e.g., Golomb coding). Particular power-law distributions, however, model
many random variables encountered in practice. For such random variables,
compression performance is judged via estimates of expected bits per input
symbol. This correspondence introduces a family of prefix codes with an eye
towards near-optimal coding of known distributions. Compression performance is
precisely estimated for well-known probability distributions using these codes
and using previously known prefix codes. One application of these near-optimal
codes is an improved representation of rational numbers. Comment: 5 pages, 2 tables, submitted to Transactions on Information Theory
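As a rough illustration of how expected bits per symbol can be estimated for a prefix code over the positive integers, the sketch below uses the classic Elias gamma code. This is a stand-in, not the code family introduced in the paper, and the truncated zeta distribution (with hypothetical parameters `s` and `n_max`) is an assumption chosen only to give a concrete power law.

```python
def elias_gamma(n: int) -> str:
    """Return the Elias gamma codeword of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]                 # binary expansion, leading bit is 1
    return "0" * (len(binary) - 1) + binary   # unary length prefix + binary


def expected_length_truncated_zeta(s: float, n_max: int) -> float:
    """Expected codeword length (bits) under p(k) proportional to k**-s, k = 1..n_max.

    The truncation at n_max is an illustrative assumption; the true zeta
    distribution has countably infinite support.
    """
    weights = [k ** -s for k in range(1, n_max + 1)]
    total = sum(weights)
    return sum(
        (w / total) * len(elias_gamma(k))
        for k, w in enumerate(weights, start=1)
    )


if __name__ == "__main__":
    for k in (1, 2, 5, 8):
        print(k, elias_gamma(k))
    print(expected_length_truncated_zeta(2.0, 10_000))
```

Comparing this figure with the entropy of the same truncated distribution would give a crude sense of how far such a code is from optimal.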
Zipf's law and L. Levin's probability distributions
Zipf's law in its basic incarnation is an empirical probability distribution
governing the frequency of usage of words in a language. As Terence Tao
recently remarked, it still lacks a convincing and satisfactory mathematical
explanation.
In this paper I suggest that at least in certain situations, Zipf's law can
be explained as a special case of the a priori distribution introduced and
studied by L. Levin. The Zipf ranking corresponding to diminishing probability
appears then as the ordering determined by the growing Kolmogorov complexity.
One argument justifying this assertion is the appeal to a recent
interpretation by Yu. Manin and M. Marcolli of asymptotic bounds for
error-correcting codes in terms of phase transition. In the respective
partition function, Kolmogorov complexity of a code plays the role of its
energy.
This version contains minor corrections and additions. Comment: 19 pages
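For readers unfamiliar with the empirical law itself, the following minimal sketch ranks words by decreasing frequency and builds an idealised Zipf distribution for comparison. The function names and the default exponent of 1 are illustrative assumptions; nothing here models Levin's a priori distribution or Kolmogorov complexity.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, relative frequency) triples, most frequent first."""
    counts = Counter(words)
    total = sum(counts.values())
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, w, c / total) for rank, (w, c) in enumerate(ranked, start=1)]


def zipf_probabilities(n_ranks, exponent=1.0):
    """Idealised Zipf law: p(r) proportional to r**-exponent for r = 1..n_ranks."""
    weights = [r ** -exponent for r in range(1, n_ranks + 1)]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird".split()
    observed = rank_frequency(sample)
    ideal = zipf_probabilities(len(observed))
    for (rank, word, freq), p in zip(observed, ideal):
        print(rank, word, round(freq, 3), round(p, 3))
```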
Variable-Length Coding of Two-Sided Asymptotically Mean Stationary Measures
We collect several observations that concern variable-length coding of
two-sided infinite sequences in a probabilistic setting. Attention is paid to
images and preimages of asymptotically mean stationary measures defined on
subsets of these sequences. We point out sufficient conditions under which the
variable-length coding and its inverse preserve asymptotic mean stationarity.
Moreover, conditions for preservation of shift-invariant σ-fields and
the finite-energy property are discussed and the block entropies for stationary
means of coded processes are related in some cases. Subsequently, we apply
certain of these results to construct a stationary nonergodic process with a
desired linguistic interpretation. Comment: 20 pages. A few typos corrected after the journal publication
Shannon Information and Kolmogorov Complexity
We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov ('algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last mentioned relations are new. Comment: Survey, LaTeX, 54 pages, 3 figures, submitted to IEEE Trans. Information Theory
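The contrast between the two notions can be made concrete with a small sketch. It computes the zeroth-order empirical Shannon entropy of a byte string and, as a crude computable stand-in for Kolmogorov complexity (which is itself uncomputable), the length of its zlib-compressed form. Using zlib as the proxy is an assumption made for illustration; it only gives an upper bound up to an additive constant.

```python
import math
import zlib
from collections import Counter


def empirical_entropy_bits(data: bytes) -> float:
    """Zeroth-order empirical Shannon entropy, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data (a rough complexity proxy)."""
    return 8 * len(zlib.compress(data, 9))


if __name__ == "__main__":
    periodic = b"ab" * 500
    # The symbol frequencies alone suggest about 1 bit per byte, i.e. roughly
    # 1000 bits, yet the periodic structure lets a compressor do far better.
    print(empirical_entropy_bits(periodic) * len(periodic))
    print(compressed_bits(periodic))
```

The gap between the two printed numbers reflects structure that a memoryless entropy estimate cannot see but a description-length measure can.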
Large-alphabet sequence modelling - a comparative study
Most raw data is not binary, but over some often large and structured alphabet. Sometimes it is convenient to deal with a binarised data sequence, but exploiting the original structure of the data typically improves performance significantly in many practical applications. In this thesis, we study Martin-Löf random sequences, which are maximally incompressible, and provide a topological view on the size of the set of random sequences. We also investigate the relationship between binary data compression techniques and the modelling of natural language text, with the latter using raw unbinarised data sequences from a large alphabet. We perform an experimental comparative study of them, including an empirical comparison of Kneser-Ney (KN) variants with the regular Context Tree Weighting (CTW) algorithm and phase CTW, and with large-alphabet CTW with different estimators. We also apply the idea of Hutter's adaptive sparse Dirichlet-multinomial coding to the KN method and provide a heuristic to make the discounting parameter adaptive. The KN method with this adaptive discounting parameter outperforms the traditional KN method on the Large Calgary corpus.
On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
The article presents a new interpretation for Zipf-Mandelbrot's law in
natural language which rests on two areas of information theory. Firstly, we
construct a new class of grammar-based codes and, secondly, we investigate
properties of strongly nonergodic stationary processes. The motivation for the
joint discussion is to prove a proposition with a simple informal statement: If
a text of length $n$ describes $n^\beta$ independent facts in a repetitive way
then the text contains at least $n^\beta/\log n$ different words, under
suitable conditions on $n$. In the formal statement, two modeling postulates
are adopted. Firstly, the words are understood as nonterminal symbols of the
shortest grammar-based encoding of the text. Secondly, the text is assumed to
be emitted by a finite-energy strongly nonergodic source whereas the facts are
binary IID variables predictable in a shift-invariant way. Comment: 24 pages, no figures
On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
The article presents a new interpretation for Zipf's law in
natural language which relies on two areas of information
theory. We reformulate the problem of grammar-based compression
and investigate properties of strongly nonergodic stationary
processes. The motivation for the joint discussion is to prove a
proposition with a simple informal statement: If an $n$-letter
long text describes $n^\beta$ independent facts in a random but
consistent way then the text contains at least
$n^\beta/\log n$ different words.
In the formal statement, two specific postulates are
adopted. Firstly, the words are understood as the nonterminal
symbols of the shortest grammar-based encoding of the
text. Secondly, the texts are assumed to be emitted by a
nonergodic source, with the described facts being binary IID
variables that are asymptotically predictable in a
shift-invariant way.
The proof of the formal proposition applies several new tools.
These are: a construction of universal grammar-based codes for
which the differences of code lengths can be bounded easily,
ergodic decomposition theorems for mutual information between the
past and future of a stationary process, and a lemma that bounds
differences of a sublinear function.
The linguistic relevance of the presented modeling assumptions,
theorems, definitions, and examples is discussed in
parallel. While searching for concrete processes to which our
proposition can be applied, we introduce several instances of
strongly nonergodic processes. In particular, we define the
subclass of accessible description processes, which formalizes
the notion of texts that describe facts in a self-contained way.
Fair Testing
In this paper we present a solution to the long-standing problem of characterising the coarsest liveness-preserving pre-congruence with respect to a full (TCSP-inspired) process algebra. In fact, we present two distinct characterisations, which give rise to the same relation: an operational one based on a De Nicola-Hennessy-like testing modality which we call should-testing, and a denotational one based on a refined notion of failures. One of the distinguishing characteristics of the should-testing pre-congruence is that it abstracts from divergences in the same way as Milner's observation congruence, and as a consequence is strictly coarser than observation congruence. In other words, should-testing has a built-in fairness assumption. This is in itself a long sought-after property; it is in notable contrast to the well-known must-testing of De Nicola and Hennessy (denotationally characterised by a combination of failures and divergences), which treats divergence as catastrophic and hence is incompatible with observation congruence. Due to these characteristics, should-testing supports modular reasoning and allows the use of the proof techniques of observation congruence, but also supports additional laws and techniques. Moreover, we show decidability of should-testing (on the basis of the denotational characterisation). Finally, we demonstrate its advantages by applying it to a number of examples, including a scheduling problem, a version of the Alternating Bit protocol, and a fair lossy communication channel.