Complexity-entropy analysis at different levels of organization in written language
Written language is complex. A written text can be considered an attempt to
convey a meaningful message, one that ends up constrained by language rules,
dependent on context, and highly redundant in its use of resources. Despite all
these constraints, unpredictability is an essential element of natural
language. Here we present the use of entropic measures to assess the balance
between predictability and surprise in written text. In short, it is possible
to measure innovation and context preservation in a document. We show that
this can also be done at the different levels of organization of a text. The
type of analysis presented is reasonably general and can also be used to
analyze the same balance in other complex messages, such as DNA, where a
hierarchy of organizational levels is known to exist.
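As a rough illustration of the kind of entropic measure the abstract describes (a minimal sketch, not the authors' exact method), the snippet below computes the Shannon entropy of empirical n-gram frequencies at two levels of organization, characters and words; the input file name is hypothetical.

```python
# Sketch: block entropy of a text at two levels of organization
# (characters vs. words), as bits per n-gram. Standard Shannon
# entropy of empirical n-gram counts; illustrative only.
from collections import Counter
from math import log2

def block_entropy(symbols, n=1):
    """Shannon entropy (bits) of the empirical n-gram distribution."""
    grams = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

text = open("document.txt", encoding="utf-8").read()  # hypothetical input
chars, words = list(text), text.split()

for n in (1, 2, 3):
    print(f"n={n}: {block_entropy(chars, n):.3f} bits/char-gram, "
          f"{block_entropy(words, n):.3f} bits/word-gram")
```

Comparing how these values grow with n at each level gives a crude picture of how much context each level preserves.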
On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm
The Sliding Window Lempel-Ziv (SWLZ) algorithm, which makes use of recurrence
times and match lengths, has been studied from various perspectives in the
information theory literature. In this paper, we undertake a finer study of
these quantities under two different scenarios: i) \emph{zero entropy} sources,
which are characterized by strong long-term memory, and ii) processes with
weak memory, as described through various mixing conditions.
For zero entropy sources, a general statement on match length is obtained. It
is used in the proof of almost sure optimality of the Fixed Shift variant of
Lempel-Ziv (FSLZ) and SWLZ algorithms given in the literature. Through an example
of stationary and ergodic processes generated by an irrational rotation, we
establish that for a window of size $n_w$, a compression ratio of
$O\!\left(\frac{\log n_w}{n_w^{\alpha}}\right)$, where $\alpha$ depends on $n_w$
and approaches 1 as $n_w \to \infty$, is obtained under the application of the
FSLZ and SWLZ algorithms. Also, we give a general expression for the compression
ratio for a class of stationary and ergodic processes with zero entropy.
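To make the match-length quantity concrete, here is a minimal, deliberately naive sketch of SWLZ-style parsing over a toy string (a hypothetical example, not the paper's construction): at each step one finds the longest prefix of the upcoming data that already occurs in the preceding window of size n_w, and by the classical Wyner-Ziv result this length grows like (log n_w)/H for finite-entropy sources.

```python
# Sketch: longest match of the lookahead into a sliding window,
# the basic quantity behind SWLZ parsing. Naive substring search,
# chosen for clarity rather than efficiency.
def longest_match_length(window: str, lookahead: str) -> int:
    L = 0
    while L < len(lookahead) and lookahead[: L + 1] in window:
        L += 1
    return L

data = "abracadabra abracadabra abracadabr" * 4  # toy, highly repetitive data
n_w = 32                                          # window size
pos, lengths = n_w, []
while pos < len(data):
    L = longest_match_length(data[pos - n_w: pos], data[pos:])
    lengths.append(max(L, 1))   # emit at least one literal symbol per step
    pos += max(L, 1)

print("average match length:", sum(lengths) / len(lengths))
```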
Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of
the \emph{normalized} version of recurrence times and establish the \emph{large
deviation property} (LDP) for a class of mixing processes. Also, an estimator
of entropy based on recurrence times is proposed, for which a large deviation
principle is proved for sources satisfying similar mixing conditions.

Comment: accepted to appear in IEEE Transactions on Information Theory
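The Ornstein-Weiss result extended here states that $\frac{1}{n}\log R_n \to H$ almost surely for stationary ergodic sources, where $R_n$ is the first return time of the initial $n$-block. Below is a minimal sketch of an estimator in that spirit (illustrative only; the paper's estimator and its mixing conditions are not reproduced), checked on an i.i.d. fair-coin source whose true entropy rate is 1 bit/symbol.

```python
# Sketch: recurrence-time entropy estimate, log2(R_n)/n,
# in the spirit of Ornstein-Weiss. Illustrative only.
from math import log2
import random

def first_return_time(seq, n):
    """Steps until the initial n-block reappears in seq, or None."""
    block = seq[:n]
    for r in range(1, len(seq) - n + 1):
        if seq[r:r + n] == block:
            return r
    return None  # no recurrence observed within the sample

random.seed(0)
seq = [random.randint(0, 1) for _ in range(200_000)]  # Bernoulli(1/2), H = 1

for n in (4, 8, 12, 16):
    R = first_return_time(seq, n)
    if R is not None:
        print(f"n={n:2d}: R_n={R:7d}, log2(R_n)/n = {log2(R) / n:.3f}")
```

For this source $R_n$ is on the order of $2^n$, so the printed estimates should hover near 1 bit/symbol as n grows.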
Measuring complexity with zippers
Physics concepts have often been borrowed and independently developed by
other fields of science. A significant example in this respect is that of
entropy in Information Theory. The aim of this paper is to provide a short and
pedagogical introduction to the use of data compression techniques for the
estimation of entropy and other relevant quantities in Information Theory and
Algorithmic Information Theory. We consider in particular the LZ77 algorithm as
a case study and discuss how a zipper can be used for information extraction.

Comment: 10 pages, 3 figures
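As a concrete example of the zipper idea, the sketch below uses Python's zlib (whose DEFLATE method is an LZ77 variant, standing in here for the paper's LZ77 case study) to obtain an upper estimate of a string's entropy rate from its compressed size.

```python
# Sketch: entropy estimation with a zipper. Compressed size per
# input byte upper-bounds the source's entropy rate (plus small
# header/format overhead).
import zlib
import random

def zip_entropy_estimate(text: str) -> float:
    """Compressed bits per input byte, via zlib at max compression."""
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)

print(zip_entropy_estimate("ab" * 5000))  # highly redundant: near 0 bits/byte

random.seed(1)
noise = "".join(random.choice("ab") for _ in range(10_000))
print(zip_entropy_estimate(noise))        # ~1 bit/byte source: estimate above 1
```

The gap between the estimate and the true entropy rate shrinks with longer inputs, which is why zipper-based estimators are usually applied to long sequences.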