9 research outputs found

    Information content versus word length in random typing

    Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization, and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process, where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can simply arise from the units making a word (e.g., letters) and not necessarily from the interplay between a word and its context as proposed by Piantadosi et al. In itself, the linear relation does not entail the results of any optimization process.
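The linearity described in this abstract can be checked with a minimal sketch. It assumes the simplest variant only: equiprobable letters, a fixed space probability, and empty words allowed, so that a specific word of length l has probability ((1 - p_space) / n_letters)^l * p_space. The function names and parameters below are illustrative assumptions, not taken from the paper, and the slope and intercept here are those of this simplified setting rather than the exact values the paper reports for its three variants.

```python
import math
import random
from collections import Counter, defaultdict


def word_surprisal(length, n_letters, p_space):
    """Surprisal (bits) of one specific word: `length` letters, then a space.

    Assumes equiprobable letters and context-independent keystrokes.
    """
    p_letter = (1.0 - p_space) / n_letters
    return -(length * math.log2(p_letter) + math.log2(p_space))


def simulate_words(n_keystrokes, n_letters, p_space, seed=0):
    """Type keys at random and return the space-terminated words."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[:n_letters]
    keys = [" " if rng.random() < p_space else rng.choice(letters)
            for _ in range(n_keystrokes)]
    # split(" ") keeps empty words; the last item is never space-terminated.
    return "".join(keys).split(" ")[:-1]


def empirical_surprisal_by_length(words):
    """Mean of -log2(relative frequency) over word types of each length."""
    counts = Counter(words)
    total = len(words)
    sums, types = defaultdict(float), defaultdict(int)
    for word, count in counts.items():
        sums[len(word)] += -math.log2(count / total)
        types[len(word)] += 1
    return {length: sums[length] / types[length] for length in sums}


if __name__ == "__main__":
    words = simulate_words(200_000, n_letters=2, p_space=0.2)
    observed = empirical_surprisal_by_length(words)
    for length in range(4):
        print(length, word_surprisal(length, 2, 0.2), observed.get(length))
```

In this setting the surprisal grows by a constant log2(n_letters / (1 - p_space)) bits per extra letter, with intercept -log2(p_space), even though nothing in the process optimizes word length. Empirical estimates for long, rarely sampled words will drift from the closed form.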

    The placement of the head that maximizes predictability. An information theoretic approach

    The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the costs with respect to dependency length minimization. The implications of such a broad theoretical framework for understanding the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed. Comment: in press in Glottometrics.

    The entropy of words-learnability and expressivity across more than 1000 languages

    The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and language sciences more generally. Information theory gives us tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style and estimation method, as well as non-independence of words in co-text. We present two main findings. First, word entropies display relatively narrow, unimodal distributions. There is no language in our sample with a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication. Languages are held in a narrow range by two fundamental pressures: word learnability and word expressivity, with a potential bias towards expressivity. Second, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across languages of the world. Peer reviewed. Postprint (published version).
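The unigram word entropy mentioned in this abstract can be illustrated with a minimal sketch. It uses the naive plug-in (maximum likelihood) estimator, H = -sum p(w) log2 p(w) with p(w) taken as relative frequency. That choice is an assumption made here for illustration: the abstract itself stresses that results depend on text size and estimation method, and it does not say which estimators the authors used. The function name is hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in estimate of unigram word entropy in bits/word.

    Relative frequencies stand in for word probabilities, which is known
    to underestimate entropy on small samples.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat"
    print(unigram_entropy(text.split()))
```

A text that repeats a single word has zero entropy under this estimator, and a text using k word types equally often has log2(k) bits/word.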

    The placement of the head that maximizes predictability: An information theoretic approach

    The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the costs with respect to dependency length minimization. The implications of such a broad theoretical framework for understanding the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed. Peer reviewed. Postprint (published version).

    Predicting head-marking variability in Yucatec Maya relative clause production

    Recent proposals hold that the cognitive systems underlying language production exhibit computational properties that facilitate communicative efficiency, i.e., an efficient trade-off between production ease and robust information transmission. We contribute to the cross-linguistic evaluation of the communicative efficiency hypothesis by investigating speakers’ preferences in the production of a typologically rare head-marking alternation that occurs in relative clause constructions in Yucatec Maya. In a sentence recall study, we find that speakers of Yucatec Maya prefer to use reduced forms of relative clause verbs when the relative clause is more contextually expected. This result is consistent with communicative efficiency and thus supports its typological generalizability. We compare two types of cue to the presence of a relative clause: pragmatic cues previously investigated in other languages, and a highly predictive morphosyntactic cue specific to Yucatec. We find that Yucatec speakers’ preferences for a reduced verb form are primarily conditioned on the more informative cue. This demonstrates the role of both general principles of language production and their language-specific realizations.

    Analysis of Languages with Extreme Values in the Indices of Relativity, Density and Informative Efficiency: The Morphological and Genetic Typology and the Complexity of the Phonetic-Phonological System in the Study of the Number and Length of Words and Phonemes

    This article analyses the mathematical correlations between languages which present extreme values in the so-called ‘index of informative relativity’, ‘index of informative density’, ‘lexical informative efficiency index’, and ‘phonic informative efficiency index’. These indices express the coefficients resulting from dividing the number of ‘tokens’ and the number of ‘token conventional phonic units’ used to express the same information. In the present work we focus in particular on those languages that show extreme values in these indices, and we analyze how the morphological typology or the phonetic-phonological characteristics of the languages affect issues such as the total number of words and phonemes, length of words, or economy of language.

    The cultural evolution of coinage as an informational system

    The invention of coined money significantly changed economic history by introducing a convenient and universal medium of exchange, whose value is regulated and guaranteed by a political authority. In order to be used as a means of payment, coins need to be recognized as valid and trustworthy. Combining carefully designed material features with inscriptions and images, they form a system of symbols that store and transmit information, primarily of an economic nature. The aim of this thesis was to investigate how coins encode information, and to understand how historical dynamics and human cognition shaped their evolution as an informational system. These questions were explored over three studies. The first study investigated the influence of changing political and economic circumstances in the ancient Mediterranean (7th to 1st century BCE) on the informative role of graphic designs as marks of issuing authority and monetary value. The second study discussed the advantages and challenges of digitization, standardization and quantitative approaches to cultural data, with a focus on coin iconography. The third study examined the representation and perception of monetary value in the properties of contemporary coins. This thesis shows how we can examine the structure and evolution of coins within an interdisciplinary framework, using quantitative methods combined with insights from evolutionary and cognitive anthropology and information theory. The increasing availability of expertly curated digital collections opens more possibilities for developing the quantitative approaches necessary for proper interpretation of the processes which shaped observed patterns in cultural data. The approach taken in this thesis complements the research in numismatics and economic history on the origins and development of coinage, while also highlighting the possibilities of using historical artefacts to study large-scale patterns in the evolution and transmission of cultural traits.

    Information content versus word length in random typing

    Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization, and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al 2011 Proc. Nat. Acad. Sci. 108 3825). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process, where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can simply arise from the units making a word (e.g., letters) and not necessarily from the interplay between a word and its context as proposed by Piantadosi and co-workers. In itself, the linear relation does not entail the results of any optimization process. Peer reviewed.