Information content versus word length in random typing
Recently, it has been claimed that a linear relationship between a measure of
information content and word length is expected from word length optimization
and it has been shown that this linearity is supported by a strong correlation
between information content and word length in many languages (Piantadosi et
al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections
between this measure and standard information theory. The relationship between
the measure and word length is studied for the popular random typing process
where a text is constructed by pressing keys at random from a keyboard
containing letters and a space behaving as a word delimiter. Although this
random process does not optimize word lengths according to information content,
it exhibits a linear relationship between information content and word length.
The exact slope and intercept are presented for three major variants of the
random typing process. A strong correlation between information content and
word length can simply arise from the units making a word (e.g., letters) and
not necessarily from the interplay between a word and its context as proposed
by Piantadosi et al. In itself, the linear relation does not entail the results
of any optimization process.
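For the simplest random typing variant (a uniform keyboard with A letters plus a space key, each pressed with probability 1/(A+1)), the probability of a particular word of length l is (A+1)^-(l+1), so its information content -log2 p(w) equals log2(A+1)·l + log2(A+1): linear in length, with slope and intercept both log2(A+1) bits. The following simulation is a minimal sketch of this variant; the alphabet size, corpus size, and plug-in estimator are illustrative assumptions, not the paper's exact setup.

```python
import math
import random
from collections import Counter

def random_typing_words(n_keystrokes, alphabet="abc", seed=1):
    """Press keys uniformly at random from `alphabet` plus a space bar;
    the space acts as a word delimiter."""
    rng = random.Random(seed)
    keys = alphabet + " "
    text = "".join(rng.choice(keys) for _ in range(n_keystrokes))
    return [w for w in text.split(" ") if w]

words = random_typing_words(1_000_000)
counts = Counter(words)
n_tokens = sum(counts.values())

# Empirical information content -log2 p(w) of each word type, by length.
by_length = {}
for w, c in counts.items():
    by_length.setdefault(len(w), []).append((-math.log2(c / n_tokens), c))

def mean_info(length):
    """Token-weighted mean information content of words of this length."""
    pairs = by_length[length]
    total = sum(c for _, c in pairs)
    return sum(i * c for i, c in pairs) / total

# With A = 3 letters, the theory above predicts a slope of
# log2(A + 1) = 2 bits per extra letter (and the same intercept),
# even though nothing here optimizes word lengths.
slope = (mean_info(4) - mean_info(1)) / 3
print(round(slope, 2))
```

The linearity appears although the process is pure chance, which is the abstract's point: a linear information-length relation does not by itself witness optimization.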
Information content of colored motifs in complex networks
We study complex networks in which the nodes of the network are tagged with
different colors depending on the functionality of the nodes (colored graphs),
using information theory applied to the distribution of motifs in such
networks. We find that colored motifs can be viewed as the building blocks of
the networks (much more so than the uncolored structural motifs can be) and
that the relative frequency with which these motifs appear in the network can
be used to define the information content of the network. This information is
defined in such a way that a network with random coloration (but keeping the
relative number of nodes with different colors the same) has zero color
information content. Thus, colored motif information captures the
exceptionality of coloring in the motifs that is maintained via selection. We
study the motif information content of the C. elegans brain as well as the
evolution of colored motif information in networks that reflect the interaction
between instructions in genomes of digital life organisms. While we find that
colored motif information appears to capture essential functionality in the C.
elegans brain (where the color assignment of nodes is straightforward) it is
not obvious whether the colored motif information content always increases
during evolution, as would be expected from a measure that captures network
complexity. For a single choice of color assignment of instructions in the
digital life form Avida, we find rather that colored motif information content
increases or decreases during evolution, depending on how the genomes are
organized, and therefore could be an interesting tool to dissect genomic
rearrangements. Comment: 21 pages, 8 figures, to appear in Artificial Life.
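The construction can be sketched on a toy colored graph. Here the motifs are triangles labelled by the multiset of their node colors, and the information content is measured as the divergence of the observed colored-motif frequencies from a color-shuffled null that keeps the relative number of nodes per color fixed (so a random coloration scores zero on average). The graph, the motif size, and the exact divergence used are illustrative assumptions, not the paper's construction.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Toy colored graph: an edge list plus one color per node.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4),
         (4, 5), (5, 6), (4, 6), (1, 3)]
colors = {0: "r", 1: "r", 2: "b", 3: "b", 4: "r", 5: "b", 6: "r"}

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def colored_triangle_counts(coloring):
    """Count triangles, keyed by the sorted multiset of their node colors."""
    counts = Counter()
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            counts["".join(sorted(coloring[x] for x in (u, v, w)))] += 1
    return counts

def distribution(counts):
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

obs = distribution(colored_triangle_counts(colors))

# Null model: shuffle colors over nodes (preserving the color counts) and
# average the colored-motif distribution over many shuffles.
rng = random.Random(0)
null_counts = Counter()
labels = list(colors.values())
for _ in range(2000):
    rng.shuffle(labels)
    null_counts.update(colored_triangle_counts(dict(zip(sorted(adj), labels))))
null = distribution(null_counts)

# KL divergence of observed from null: ~0 when the coloring looks random.
info = sum(p * math.log2(p / null[k]) for k, p in obs.items() if k in null)
print(round(info, 3))  # positive: this coloring is non-random
```

Replacing `colors` with a shuffled assignment drives `info` toward zero, matching the requirement that random coloration carries no color information.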
Watermarking security part I: theory
This article proposes a theory of watermarking security based on a cryptanalysis point of view. The main idea is that information about the secret key leaks from the observations, for instance watermarked pieces of content, available to the opponent. Tools from information theory (Shannon's mutual information and Fisher's information matrix) can measure this leakage of information. The security level is then defined as the number of observations the attacker needs to successfully estimate the secret key. This theory is applied to common watermarking methods: the substitutive scheme and spread-spectrum-based techniques. Their security levels are calculated against three kinds of attacks.
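The leakage idea can be illustrated with a toy spread-spectrum-like sketch; the binary carrier, the Gaussian host model, and the averaging estimator below are illustrative assumptions, not the schemes analyzed in the article. Every watermarked content y = x + s carries the same secret carrier s, so the host acts as noise and averaging the observations lets the opponent estimate s: each new observation leaks information about the key.

```python
import math
import random

rng = random.Random(0)
dim = 256
# The secret carrier plays the role of the key.
s = [rng.choice([-1.0, 1.0]) for _ in range(dim)]

def carrier_correlation(n_obs):
    """Normalized correlation between the secret carrier and the opponent's
    estimate, taken as the sum of n_obs watermarked contents."""
    est = [0.0] * dim
    for _ in range(n_obs):
        for i in range(dim):
            est[i] += rng.gauss(0.0, 3.0) + s[i]  # host content + watermark
    norm = math.sqrt(sum(e * e for e in est))
    return sum(e * si for e, si in zip(est, s)) / (norm * math.sqrt(dim))

few, many = carrier_correlation(2), carrier_correlation(200)
print(round(few, 2), round(many, 2))  # the estimate sharpens with observations
```

Counting how many observations are needed before the estimate is good enough is exactly the security-level notion the abstract describes.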
IDTxl: The Information Dynamics Toolkit xl: a Python package for the efficient analysis of multivariate information dynamics in networks
We present IDTxl (the Information Dynamics Toolkit xl), a new open source Python toolbox for effective network inference from multivariate time series using information theory, available from GitHub (https://github.com/pwollstadt/IDTxl).
Information theory (Cover & Thomas, 2006; MacKay, 2003; Shannon, 1948) is the mathematical theory of information and its transmission over communication channels. Information theory provides quantitative measures of the information content of a single random variable (entropy) and of the information shared between two variables (mutual information). The defined measures build on probability theory and solely depend on the probability distributions of the variables involved. As a consequence, the dependence between two variables can be quantified as the information shared between them, without the need to explicitly model a specific type of dependence. Hence, mutual information is a model-free measure of dependence, which makes it a popular choice for the analysis of systems other than communication channels.
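The two definitions can be written down in a few lines with plug-in estimators on empirical distributions. This standalone sketch does not use IDTxl or its estimators; it only illustrates entropy, mutual information, and the model-free property described above.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy H(X) in bits from a list of outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y). Model-free: only the empirical
    distributions enter, no assumption about the form of the dependence."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

xs = [0, 0, 1, 1, 2, 2, 3, 3]
ys = [x % 2 for x in xs]          # fully determined by X
zs = [0, 1, 0, 1, 0, 1, 0, 1]    # independent of X in this sample
print(mutual_information(xs, ys))  # 1.0 bit: I(X;Y) = H(Y)
print(mutual_information(xs, zs))  # 0.0 bits: no shared information
```

Note that the dependence of `ys` on `xs` is nonlinear-friendly: any deterministic function of X would yield I(X;Y) = H(Y), with no model of the mapping required.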
On the complexity and the information content of cosmic structures
The emergence of cosmic structure is commonly considered one of the most
complex phenomena in Nature. However, this complexity has never been defined
nor measured in a quantitative and objective way. In this work we propose a
method to measure the information content of cosmic structure and to quantify
the complexity that emerges from it, based on Information Theory. The emergence
of complex evolutionary patterns is studied with a statistical symbolic
analysis of the datastream produced by state-of-the-art cosmological
simulations of forming galaxy clusters. This powerful approach allows us to
measure how many bits of information are necessary to predict the evolution of
energy fields in a statistical way, and it offers a simple way to quantify
when, where and how the cosmic gas behaves in complex ways. The most complex
behaviors are found in the peripheral regions of galaxy clusters, where
supersonic flows drive shocks and large energy fluctuations over a few tens of
million years. Describing the evolution of magnetic energy requires at least
twice as many bits as the other energy fields. When radiative
cooling and feedback from galaxy formation are considered, the cosmic gas is
overall found to double its degree of complexity. In the future, Cosmic
Information Theory can significantly increase our understanding of the
emergence of cosmic structure as it represents an innovative framework to
design and analyze complex simulations of the Universe in a simple, yet
powerful way. Comment: 15 pages, 14 figures. MNRAS accepted, in press.
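One minimal reading of such a statistical symbolic analysis is: map the datastream onto a small alphabet of symbols, then measure how many bits of history are needed to predict the next symbol via block-entropy increments. The symbolization rule and the estimator below are illustrative assumptions, not the paper's pipeline.

```python
import math
import random
from collections import Counter

def symbolize(series, tol=0.0):
    """Map a datastream to 'u'p / 'd'own / 's'ame symbols."""
    return "".join(
        "u" if b - a > tol else "d" if a - b > tol else "s"
        for a, b in zip(series, series[1:])
    )

def block_entropy(sym, k):
    """Shannon entropy (bits) of the length-k blocks of the symbol stream."""
    blocks = Counter(sym[i:i + k] for i in range(len(sym) - k + 1))
    n = sum(blocks.values())
    return -sum((c / n) * math.log2(c / n) for c in blocks.values())

def bits_to_predict(sym, k):
    """H(k+1) - H(k): bits needed to predict the next symbol given k symbols
    of history; low values mean simple, predictable evolution."""
    return block_entropy(sym, k + 1) - block_entropy(sym, k)

rng = random.Random(0)
periodic = symbolize([i % 4 for i in range(4000)])           # regular stream
coin = symbolize([rng.choice([0, 1]) for _ in range(4000)])  # noisy stream

print(round(bits_to_predict(periodic, 3), 2))  # ~0.0: fully predictable
print(round(bits_to_predict(coin, 3), 2))      # ~1.0: one new bit per step
```

Applied per region of a simulation, such a per-stream bit count is one way to quantify when and where a field behaves in complex ways.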
Algorithmic Statistics
While Kolmogorov complexity is the accepted absolute measure of information
content of an individual finite object, a similarly absolute notion is needed
for the relation between an individual data sample and an individual model
summarizing the information in the data, for example, a finite set (or
probability distribution) where the data sample typically came from. The
statistical theory based on such relations between individual objects can be
called algorithmic statistics, in contrast to classical statistical theory that
deals with relations between probabilistic ensembles. We develop the
algorithmic theory of statistic, sufficient statistic, and minimal sufficient
statistic. This theory is based on two-part codes consisting of the code for
the statistic (the model summarizing the regularity, the meaningful
information, in the data) and the model-to-data code. In contrast to the
situation in probabilistic statistical theory, the algorithmic relation of
(minimal) sufficiency is an absolute relation between the individual model and
the individual data sample. We distinguish implicit and explicit descriptions
of the models. We give characterizations of algorithmic (Kolmogorov) minimal
sufficient statistic for all data samples for both description modes--in the
explicit mode under some constraints. We also strengthen and elaborate earlier
results on the "Kolmogorov structure function" and "absolutely
non-stochastic objects": those rare objects for which the simplest models that
summarize their relevant information (minimal sufficient statistics) are at
least as complex as the objects themselves. We demonstrate a close relation
between the probabilistic notions and the algorithmic ones. Comment: LaTeX, 22
pages, 1 figure, with correction to the published journal version.
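Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a computable upper bound, which is the usual way to make the two-part intuition concrete: a regular object is captured by a tiny model ("repeat 'ab'") plus almost no extra data, while a random-looking object admits no model shorter than itself. Using zlib as a stand-in for the optimal two-part code is an illustrative assumption, not the algorithmic-statistics formalism.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Crude upper bound on the information content of `data`: length in bits
    of its zlib-compressed form (a stand-in, not Kolmogorov complexity)."""
    return 8 * len(zlib.compress(data, 9))

regular = b"ab" * 5000  # a tiny model (the repeated unit) explains all of it
rng = random.Random(0)
pseudo_random = bytes(rng.getrandbits(8) for _ in range(10_000))  # no short model

print(compressed_bits(regular))        # a few hundred bits for 80,000 raw bits
print(compressed_bits(pseudo_random))  # close to the raw 80,000 bits
```

In the paper's terms, the second string behaves like a non-stochastic object as seen by this compressor: its shortest available description is essentially the data itself.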