Complexity-entropy analysis at different levels of organization in written language

Author: Estevez-Moya D.
Estevez-Rams E.
Rodriguez A. Mesa
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Written language is complex. A written text can be considered an attempt to convey a meaningful message, one that ends up constrained by language rules and context dependence, and highly redundant in its use of resources. Despite all these constraints, unpredictability is an essential element of natural language. Here we present the use of entropic measures to assess the balance between predictability and surprise in written text. In short, it is possible to measure innovation and context preservation in a document. It is shown that this can also be done at the different levels of organization of a text. The type of analysis presented is reasonably general and can also be used to analyze the same balance in other complex messages, such as DNA, where a hierarchy of organizational levels is known to exist.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare
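The abstract does not spell out which entropic measures are used. The sketch below is a hedged illustration of how "surprise" can be quantified at more than one level of organization. It estimates an entropy rate from the Lempel-Ziv (1976) complexity $c(n)$ of a sequence, using $h \approx c(n)\log_2 n / n$. This is one common compression-based estimator, assumed here for illustration; it is not necessarily the authors' procedure. The function names and the toy sentence are hypothetical. The functions accept any sequence of hashable symbols, so the same code runs on characters or on words.

```python
import math
from typing import Hashable, Sequence


def lz76_complexity(seq: Sequence[Hashable]) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing.

    Each new phrase is the longest block that can be copied from an earlier
    starting position (the copy may overlap the phrase itself), plus one
    new symbol. Naive search: fine for short texts, far too slow for books.
    """
    n = len(seq)
    count, i = 0, 0
    while i < n:
        copy_len = 0
        # Grow the copy while seq[i : i+copy_len+1] also starts at some j < i.
        while i + copy_len < n and any(
            seq[j : j + copy_len + 1] == seq[i : i + copy_len + 1] for j in range(i)
        ):
            copy_len += 1
        count += 1  # phrase = copied part + one innovative symbol
        i += copy_len + 1
    return count


def lz_entropy_rate(seq: Sequence[Hashable]) -> float:
    """Crude entropy-rate estimate h ~ c(n) * log2(n) / n, in bits per symbol."""
    n = len(seq)
    if n < 2:
        return 0.0
    return lz76_complexity(seq) * math.log2(n) / n


# Same estimator, two levels of organization (hypothetical toy sentence):
text = "the cat sat on the mat and the cat sat on the hat"
print("characters:", lz_entropy_rate(text))
print("words:     ", lz_entropy_rate(text.split()))
```

The word-level figure is in bits per word, so the two printed numbers are not directly comparable. The estimator also converges slowly, so on a sentence this short both values are rough at best.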

On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm

Author: Bansal R. K.
Jain Siddharth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/11/2014

The Sliding Window Lempel-Ziv (SWLZ) algorithm, which makes use of recurrence times and match lengths, has been studied from various perspectives in the information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios: i) zero entropy sources, which are characterized by strong long-term memory, and ii) processes with weak memory, as described through various mixing conditions. For zero entropy sources, a general statement on match length is obtained. It is used in the proof of almost sure optimality of the Fixed Shift Variant of Lempel-Ziv (FSLZ) and SWLZ algorithms given in the literature. Through an example of stationary and ergodic processes generated by an irrational rotation, we establish that for a window of size $n_w$, a compression ratio given by $O\left(\frac{\log n_w}{n_w^{a}}\right)$, where $a$ depends on $n_w$ and approaches 1 as $n_w \rightarrow \infty$, is obtained under the application of the FSLZ and SWLZ algorithms. We also give a general expression for the compression ratio for a class of stationary and ergodic processes with zero entropy. Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of the normalized version of recurrence times and establish the large deviation property (LDP) for a class of mixing processes. In addition, an estimator of entropy based on recurrence times is proposed, for which the large deviation principle is proved for sources satisfying similar mixing conditions. Comment: accepted to appear in IEEE Transactions on Information Theory

arXiv.org e-Print Archive

Caltech Authors
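The quantities named in the abstract can be made concrete with a small sketch built on textbook definitions. The first is the Ornstein-Weiss recurrence time $R_n$, the first return of the initial $n$-block. Its normalized logarithm $\frac{1}{n}\log_2 R_n$ approaches the entropy rate for stationary ergodic sources. The second is the sliding-window match length used by LZ77-type schemes. The sketch does not reproduce the paper's proposed estimator or its large-deviation analysis. The irrational-rotation source is a hypothetical choice made only to have a zero-entropy example: rotation by $\alpha = (\sqrt{5}-1)/2$, coded by the partition $[0, 1/2)$ versus $[1/2, 1)$, started at 0. All function names are illustrative.

```python
import math
from typing import Optional, Sequence


def recurrence_time(x: Sequence, n: int) -> Optional[int]:
    """Smallest m >= 1 such that the initial block x[0:n] reappears as
    x[m:m+n] (Ornstein-Weiss style); None if it never recurs within x."""
    block = x[:n]
    for m in range(1, len(x) - n + 1):
        if x[m : m + n] == block:
            return m
    return None


def recurrence_entropy_estimate(x: Sequence, n: int) -> Optional[float]:
    """(1/n) * log2(R_n): tends to the entropy rate (bits/symbol) for
    stationary ergodic sources as n grows, but is noisy for a single n."""
    r = recurrence_time(x, n)
    return None if r is None else math.log2(r) / n


def longest_match(x: Sequence, i: int, window: int) -> int:
    """Longest prefix of x[i:] that also starts inside the sliding window
    x[i-window:i]; as in LZ77, the copy may run past position i."""
    best = 0
    for j in range(max(0, i - window), i):
        length = 0
        while i + length < len(x) and x[j + length] == x[i + length]:
            length += 1
        best = max(best, length)
    return best


def rotation_sequence(length: int, alpha: float = (math.sqrt(5) - 1) / 2) -> str:
    """Hypothetical zero-entropy source: code the rotation k*alpha mod 1
    as '1' on [0, 1/2) and '0' on [1/2, 1)."""
    return "".join("1" if (k * alpha) % 1.0 < 0.5 else "0" for k in range(length))


x = rotation_sequence(20_000)
for n in (8, 16, 32, 64):
    print(n, recurrence_entropy_estimate(x, n))
print("match length, window 1024:", longest_match(x, 10_000, 1024))
```

For the rotation sequence the printed estimates should be small and tend to shrink as $n$ grows, as expected for a zero-entropy source. For any single $n$, though, the value is one noisy sample rather than an average.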

Measuring complexity with zippers

Author: Baronchelli Andrea
Caglioti Emanuele
Loreto Vittorio
Publication venue: 'IOP Publishing'
Publication date: 01/01/2005

Physics concepts have often been borrowed and independently developed by other fields of science. A significant example of this is entropy in Information Theory. The aim of this paper is to provide a short, pedagogical introduction to the use of data compression techniques for estimating entropy and other relevant quantities in Information Theory and Algorithmic Information Theory. We consider in particular the LZ77 algorithm as a case study and discuss how a zipper can be used for information extraction. Comment: 10 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

City Research Online

CERN Document Server

Archivio della ricerca - Università di Roma La Sapienza
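In this context a "zipper" is simply an off-the-shelf compressor. The sketch below uses Python's `zlib` as a stand-in for the LZ77 case study; zlib implements DEFLATE, which is LZ77 with a 32 KiB window followed by Huffman coding. The paper's own setup may differ. The sketch shows two uses:

- compressed bits per byte, as a rough upper estimate of the entropy rate;
- the extra cost of compressing a probe appended to a reference, which is small when the probe resembles material the compressor has just seen.

The helper names and the synthetic byte strings are assumptions made for illustration.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Size in bits of data after zlib (DEFLATE: LZ77 + Huffman) at level 9."""
    return 8 * len(zlib.compress(data, 9))


def bits_per_byte(data: bytes) -> float:
    """Compression-based rough upper estimate of the entropy rate, bits/byte."""
    return compressed_bits(data) / len(data)


def zipper_delta(reference: bytes, probe: bytes) -> float:
    """Extra compressed bits per byte of `probe` when appended to `reference`.

    Small values mean the compressor could reuse reference material through
    LZ77 back-references. DEFLATE's window is 32 KiB, so only the last
    32 KiB of `reference` can actually be referenced.
    """
    return (compressed_bits(reference + probe) - compressed_bits(reference)) / len(probe)


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
noise = bytes(rng.getrandbits(8) for _ in range(5000))
other = bytes(rng.getrandbits(8) for _ in range(2000))

print("constant:", bits_per_byte(b"a" * 5000))
print("random:  ", bits_per_byte(noise))
print("delta, probe already seen:", zipper_delta(noise, noise[:2000]))
print("delta, unrelated probe:   ", zipper_delta(noise, other))
```

Expect the random bytes to cost about 8 bits per byte. The probe that repeats part of the reference should cost far less than the unrelated one. With real texts, this kind of difference is the signal that can be used to rank candidate reference texts by similarity.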