Optimal coding and the origins of Zipfian laws
The problem of compression in standard information theory consists of
assigning codes as short as possible to numbers. Here we consider the problem
of optimal coding -- under an arbitrary coding scheme -- and show that it
predicts Zipf's law of abbreviation, namely a tendency in natural languages for
more frequent words to be shorter. We then apply this result to investigate
optimal coding under so-called non-singular coding, a scheme where unique
segmentation is not guaranteed but each code stands for a distinct number. Optimal
non-singular coding predicts that the length of a word should grow
approximately as the logarithm of its frequency rank, which is again consistent
with Zipf's law of abbreviation. Optimal non-singular coding in combination
with the maximum entropy principle also predicts Zipf's rank-frequency
distribution. Furthermore, our findings on optimal non-singular coding
challenge common beliefs about random typing. It turns out that random typing
is in fact an optimal coding process, in stark contrast with the common
assumption that it is detached from cost-cutting considerations. Finally, we
discuss the implications of optimal coding for the construction of a compact
theory of Zipfian laws and other linguistic laws.Comment: in press in the Journal of Quantitative Linguistics; definition of
concordant pair corrected, proofs polished, references update
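
The logarithmic relation between code length and frequency rank is easy to see concretely. The following Python sketch is our own illustration, not taken from the paper: it enumerates distinct binary strings shortest-first (a non-singular code over a binary alphabet), assigns the r-th string to the r-th most frequent word, and prints code lengths that track log2 of the rank.

    from itertools import islice, count, product
    from math import log2

    def shortlex_binary():
        # Yield every distinct binary string, shortest first: an optimal
        # non-singular code gives the r-th string to the r-th most frequent word.
        for n in count(1):
            for bits in product("01", repeat=n):
                yield "".join(bits)

    codes = list(islice(shortlex_binary(), 1000))
    for rank in (1, 2, 10, 100, 1000):
        # The rank-r string has length ceil(log2(r + 2)) - 1, i.e. ~ log2(rank).
        print(f"rank {rank:4d}  code length {len(codes[rank - 1]):2d}  "
              f"log2(rank) = {log2(rank):.2f}")

Running this prints lengths 1, 1, 3, 6 and 9 for ranks 1, 2, 10, 100 and 1000, matching the predicted logarithmic growth.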
The placement of the head that maximizes predictability. An information theoretic approach
The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the cost from the standpoint of dependency length minimization. The
implications of such a broad theoretical framework for understanding the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed.
Comment: in press in Glottometrics
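
The conflict between the two principles can be made concrete with a toy model; the sketch below is ours, not the paper's formalism. Take the verb V as the head of subject S and object O, and score each of the six orderings by total dependency length (the sum of distances from V to each dependent). The head-final orders, which let both dependents precede the head and so maximize its predictability, incur the maximal total length, while the head-medial orders achieve the minimum.

    from itertools import permutations

    # Toy clause: verb V heads subject S and object O; dependency length is
    # the sum of distances from V to each dependent in the linear order.
    for order in permutations("SVO"):
        v = order.index("V")
        dep_len = abs(v - order.index("S")) + abs(v - order.index("O"))
        head_final = v == 2  # both dependents precede the head
        print("".join(order), "total dependency length:", dep_len,
              "head-final:", head_final)

The output shows SVO and OVS at length 2 and all head-peripheral orders, including the head-final SOV and OSV, at length 3.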
Estimating the Algorithmic Complexity of Stock Markets
Randomness and regularities in Finance are usually treated in probabilistic
terms. In this paper, we develop a completely different approach, using a
non-probabilistic framework based on the algorithmic information theory
initially developed by Kolmogorov (1965). We present some elements of this
theory and show why it is particularly relevant to Finance, and potentially to
other sub-fields of Economics as well. We develop a generic method to estimate
the Kolmogorov complexity of numeric series. This approach is based on an
iterative "regularity erasing procedure" implemented to use lossless
compression algorithms on financial data. Examples are provided with both
simulated and real-world financial time series. The contributions of this
article are twofold. The first one is methodological: we show that some
structural regularities, invisible with classical statistical tests, can be
detected by this algorithmic method. The second consists of illustrations
based on the daily Dow-Jones Index, suggesting that, beyond several well-known
regularities, hidden structure may remain to be identified in this index.
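
The core of the method is easy to sketch. The snippet below is our own minimal illustration of compression-based complexity estimation, not the authors' exact procedure (their "regularity erasing procedure" involves preprocessing steps we do not reproduce): a series is discretized into symbols, compressed losslessly, and the compressed size is read as an upper bound on its Kolmogorov complexity, so a structured series compresses far better than an i.i.d. random one.

    import zlib
    import random
    import math

    def compressed_size(series, n_bins=8):
        # Discretize into n_bins symbols, then measure zlib-compressed bytes
        # as an upper bound on the algorithmic complexity of the series.
        lo, hi = min(series), max(series)
        width = (hi - lo) / n_bins or 1.0
        symbols = bytes(min(int((x - lo) / width), n_bins - 1) for x in series)
        return len(zlib.compress(symbols, level=9))

    random.seed(0)
    noise = [random.gauss(0, 1) for _ in range(10_000)]  # no structure to erase
    wave = [math.sin(0.01 * t) for t in range(10_000)]   # highly regular
    print("noise:", compressed_size(noise), "bytes")
    print("sine wave:", compressed_size(wave), "bytes")

The random series compresses to nearly its entropy-limited size, while the regular one shrinks by an order of magnitude; applied to financial returns, the same comparison flags structure that the compressor can exploit but that classical statistical tests may miss.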
- …