
    Information content versus word length in random typing

    Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization, and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process, in which a text is constructed by pressing keys at random on a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can arise simply from the units making up a word (e.g., letters) and not necessarily from the interplay between a word and its context, as proposed by Piantadosi et al. In itself, the linear relation does not entail the result of any optimization process.
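    As a rough illustration of the random typing process described above, the Python sketch below generates a text by pressing letter and space keys uniformly at random, splits it into words at the spaces, and reports the mean surprisal of the words of each length. The tiny keyboard, sample size, and surprisal estimate are illustrative assumptions; the paper derives the exact slopes and intercepts analytically for its three variants.

# Minimal sketch of the random typing process, assuming a deliberately small
# keyboard (three letters plus a space key, all equally likely) so that the
# word probabilities are well sampled. The mean surprisal -log2 p(word) of
# the words of each length grows roughly linearly with length (about 2 bits
# per extra letter here), even though nothing is being optimized.
import math
import random
from collections import Counter

random.seed(0)

KEYS = ["a", "b", "c", " "]   # the space behaves as the word delimiter

text = "".join(random.choice(KEYS) for _ in range(1_000_000))
words = [w for w in text.split(" ") if w]
freq = Counter(words)
total = sum(freq.values())

# Group the surprisal of each observed word type by word length.
by_length = {}
for w, c in freq.items():
    by_length.setdefault(len(w), []).append((-math.log2(c / total), c))

for length in sorted(by_length)[:6]:
    pairs = by_length[length]
    mean_info = sum(s * c for s, c in pairs) / sum(c for s, c in pairs)
    print(f"length {length}: mean surprisal ~ {mean_info:.2f} bits")
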

    Information content of colored motifs in complex networks

    We study complex networks in which the nodes of the network are tagged with different colors depending on the functionality of the nodes (colored graphs), using information theory applied to the distribution of motifs in such networks. We find that colored motifs can be viewed as the building blocks of the networks (much more so than the uncolored structural motifs can be) and that the relative frequency with which these motifs appear in the network can be used to define the information content of the network. This information is defined in such a way that a network with random coloration (but keeping the relative number of nodes with different colors the same) has zero color information content. Thus, colored motif information captures the exceptionality of coloring in the motifs that is maintained via selection. We study the motif information content of the C. elegans brain as well as the evolution of colored motif information in networks that reflect the interaction between instructions in the genomes of digital life organisms. While we find that colored motif information appears to capture essential functionality in the C. elegans brain (where the color assignment of nodes is straightforward), it is not obvious whether the colored motif information content always increases during evolution, as would be expected from a measure that captures network complexity. For a single choice of color assignment of instructions in the digital life form Avida, we find rather that colored motif information content increases or decreases during evolution, depending on how the genomes are organized, and it could therefore be an interesting tool to dissect genomic rearrangements. Comment: 21 pages, 8 figures, to appear in Artificial Life.
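    The sketch below is only loosely inspired by the measure summarized above: it counts connected three-node subgraphs by the multiset of their node colors and takes the Kullback-Leibler divergence between that distribution and the one obtained after shuffling the colors (keeping the number of nodes of each color fixed), which is zero in expectation for a random coloration. The toy graph, the coloring, and the helper names are illustrative assumptions, not the paper's construction.

# Hedged sketch: count colored 3-node motifs and compare their distribution
# to a null model in which node colors are shuffled while the number of
# nodes of each color is preserved.
import itertools
import math
import random
from collections import Counter

import networkx as nx

random.seed(1)

def colored_triad_counts(graph, colors):
    """Count connected 3-node subgraphs by the multiset of their node colors."""
    counts = Counter()
    for trio in itertools.combinations(graph.nodes, 3):
        if nx.is_connected(graph.subgraph(trio)):
            counts[tuple(sorted(colors[n] for n in trio))] += 1
    return counts

def kl_bits(p_counts, q_counts):
    """KL divergence D(p || q) in bits over the observed colored motifs."""
    p_tot, q_tot = sum(p_counts.values()), sum(q_counts.values())
    kl = 0.0
    for key, c in p_counts.items():
        p = c / p_tot
        q = q_counts.get(key, 0.5) / q_tot   # light smoothing for unseen motifs
        kl += p * math.log2(p / q)
    return kl

# Toy example: a small random graph with two node colors.
g = nx.erdos_renyi_graph(30, 0.15, seed=2)
colors = {n: ("red" if n < 10 else "blue") for n in g.nodes}

observed = colored_triad_counts(g, colors)
shuffled = list(colors.values())
random.shuffle(shuffled)
null = colored_triad_counts(g, dict(zip(g.nodes, shuffled)))

print("colored-motif information ~", round(kl_bits(observed, null), 3), "bits")
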

    Watermarking security part I: theory

    This article proposes a theory of watermarking security based on a cryptanalysis point of view. The main idea is that information about the secret key leaks from the observations available to the opponent, for instance watermarked pieces of content. Tools from information theory (Shannon's mutual information and Fisher's information matrix) can measure this leakage of information. The security level is then defined as the number of observations the attacker needs to successfully estimate the secret key. This theory is applied to common watermarking methods: the substitutive scheme and spread-spectrum-based techniques. Their security levels are calculated against three kinds of attacks.
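    To make the notion of a security level concrete, the toy Python sketch below assumes a simple additive spread-spectrum model in which every observed piece of content carries the same secret ±1 spreading sequence; an averaging attacker then recovers the key more and more reliably as the number of observations grows. The model, parameters, and attack are illustrative assumptions, not the schemes or the information-theoretic bounds analysed in the article.

# Toy illustration of key leakage from watermarked observations:
# y = host + gamma * key, with a fixed ±1 key reused across contents.
import numpy as np

rng = np.random.default_rng(0)

DIM = 256          # samples per piece of content
GAMMA = 0.5        # embedding strength
SIGMA_HOST = 1.0   # host-signal standard deviation

key = rng.choice([-1.0, 1.0], size=DIM)   # secret spreading sequence

def observe(n_obs):
    """Watermarked contents that all carry the same secret key."""
    hosts = rng.normal(0.0, SIGMA_HOST, size=(n_obs, DIM))
    return hosts + GAMMA * key

for n_obs in (1, 4, 16, 64, 256):
    y = observe(n_obs)
    key_hat = np.sign(y.mean(axis=0))      # averaging attack
    error_rate = np.mean(key_hat != key)
    print(f"{n_obs:4d} observations -> key bit error rate {error_rate:.3f}")
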

    IDTxl: The Information Dynamics Toolkit xl: a Python package for the efficient analysis of multivariate information dynamics in networks

    We present IDTxl (the Information Dynamics Toolkit xl), a new open source Python toolbox for effective network inference from multivariate time series using information theory, available from GitHub (https://github.com/pwollstadt/IDTxl). Information theory (Cover & Thomas, 2006; MacKay, 2003; Shannon, 1948) is the mathematical theory of information and its transmission over communication channels. Information theory provides quantitative measures of the information content of a single random variable (entropy) and of the information shared between two variables (mutual information). The defined measures build on probability theory and depend solely on the probability distributions of the variables involved. As a consequence, the dependence between two variables can be quantified as the information shared between them, without the need to explicitly model a specific type of dependence. Hence, mutual information is a model-free measure of dependence, which makes it a popular choice for the analysis of systems other than communication channels.
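    As a pointer to the two measures named above, here is a minimal NumPy sketch that computes the entropy of a single discrete variable and the mutual information shared between two variables directly from a joint probability table. It is generic information theory, not IDTxl's own estimator code, and the example distribution is made up.

# Entropy and mutual information of discrete variables from a joint table.
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits of a discrete distribution p(x)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution p(x, y)."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# Example: two weakly coupled binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print("H(X)   =", round(entropy(p_xy.sum(axis=1)), 4), "bits")
print("I(X;Y) =", round(mutual_information(p_xy), 4), "bits")
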

    On the complexity and the information content of cosmic structures

    The emergence of cosmic structure is commonly considered one of the most complex phenomena in Nature. However, this complexity has never been defined nor measured in a quantitative and objective way. In this work we propose a method, based on Information Theory, to measure the information content of cosmic structure and to quantify the complexity that emerges from it. The emergence of complex evolutionary patterns is studied with a statistical symbolic analysis of the datastream produced by state-of-the-art cosmological simulations of forming galaxy clusters. This powerful approach allows us to measure how many bits of information are necessary to predict the evolution of energy fields in a statistical way, and it offers a simple way to quantify when, where and how the cosmic gas behaves in complex ways. The most complex behaviors are found in the peripheral regions of galaxy clusters, where supersonic flows drive shocks and large energy fluctuations over a few tens of millions of years. Describing the evolution of magnetic energy requires at least twice as many bits as the other energy fields. When radiative cooling and feedback from galaxy formation are considered, the cosmic gas is overall found to double its degree of complexity. In the future, Cosmic Information Theory can significantly increase our understanding of the emergence of cosmic structure, as it represents an innovative framework to design and analyze complex simulations of the Universe in a simple, yet powerful way. Comment: 15 pages, 14 figures. Accepted by MNRAS, in press.
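    The sketch below illustrates, under simplifying assumptions, the kind of statistical symbolic analysis mentioned above: a toy time series stands in for an energy field, it is quantized into a four-symbol alphabet, and the block entropies H(L) are computed, with H(L) - H(L-1) approximating how many bits are needed to predict the next symbol from a length-(L-1) history. The signal and symbolization scheme are placeholders, not the pipeline applied to the cosmological simulations in the paper.

# Symbolize a toy data stream and compute block entropies H(L).
import math
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)

# Toy "energy field" time series: a noisy, slowly varying signal.
t = np.arange(5000)
series = np.sin(2 * np.pi * t / 200.0) + 0.3 * rng.normal(size=t.size)

# Symbolize by quantizing into 4 equal-population bins.
edges = np.quantile(series, [0.25, 0.5, 0.75])
symbols = np.digitize(series, edges)            # values in {0, 1, 2, 3}

def block_entropy(symbols, L):
    """Shannon entropy (bits) of the distribution of length-L symbol blocks."""
    blocks = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())

prev = 0.0
for L in range(1, 6):
    h = block_entropy(symbols, L)
    print(f"L={L}: H(L)={h:.3f} bits, H(L)-H(L-1)={h - prev:.3f} bits/symbol")
    prev = h
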

    Algorithmic Statistics

    While Kolmogorov complexity is the accepted absolute measure of the information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) from which the data sample typically came. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory, which deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of the algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes; in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones. Comment: LaTeX, 22 pages, 1 figure, with corrections to the published journal version.
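    As a worked toy example of the two-part codes mentioned above (an assumption-laden illustration, not the paper's formal development), the sketch below describes a binary string by first coding a simple finite-set model that contains it, namely "all n-bit strings with exactly k ones", and then coding the string's index within that set at a cost of log2 of the set size. A biased string compresses well under this model, while a typical random string does not.

# Two-part description length: model bits plus data-to-model bits.
import math
import random

random.seed(0)

def two_part_length(bits):
    """Bits to state (n, k) plus log2 C(n, k) bits to index the string in the set."""
    n, k = len(bits), sum(bits)
    model_bits = 2 * math.ceil(math.log2(n + 1))   # crude code for (n, k)
    index_bits = math.log2(math.comb(n, k))        # index inside the finite set
    return model_bits + index_bits

n = 1000
random_string = [random.randint(0, 1) for _ in range(n)]            # typical string
biased_string = [1 if random.random() < 0.05 else 0 for _ in range(n)]

print("literal description      :", n, "bits")
print("two-part, random string  :", round(two_part_length(random_string), 1), "bits")
print("two-part, biased string  :", round(two_part_length(biased_string), 1), "bits")
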