8 research outputs found

    Word forms are structured for efficient use

    Zipf famously stated that, if natural language lexicons are structured for efficient communication, the words that are used most frequently should require the least effort. This prediction explains the well-known finding that the most frequent words in a language tend to be short. A related prediction is that, even within words of the same length, the most frequent word forms should be the ones that are easiest to produce and understand. Using orthographic forms as a proxy for phonetics, we test this hypothesis using corpora of 96 languages from Wikipedia. We find that, across a variety of languages and language families and controlling for length, the most frequent forms in a language tend to be more orthographically well-formed and have more orthographic neighbors than less frequent forms. We interpret this result as evidence that lexicons are structured by language usage pressures to facilitate efficient communication.
    Keywords: Lexicon; Word frequency; Phonology; Communication; Efficiency
    Funding: National Science Foundation (Grant ES/N0174041/1)
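    To make the length-controlled comparison concrete, here is a minimal sketch (not the authors' pipeline) that correlates word frequency with orthographic neighborhood density (Coltheart's N, the number of words differing by one letter substitution) among words of a fixed length. The `freq` table, function names, and toy counts are invented for illustration.

```python
from collections import defaultdict
from scipy.stats import spearmanr

def neighbourhood_sizes(words):
    """For each word, count how many other words differ by exactly one substitution."""
    buckets = defaultdict(set)  # wildcard pattern -> words matching it
    for w in words:
        for i in range(len(w)):
            buckets[w[:i] + "_" + w[i + 1:]].add(w)
    return {
        w: sum(len(buckets[w[:i] + "_" + w[i + 1:]]) - 1 for i in range(len(w)))
        for w in words
    }

def frequency_vs_neighbours(freq, length):
    """Spearman correlation of frequency and neighbourhood size, holding length constant."""
    words = [w for w in freq if len(w) == length]
    sizes = neighbourhood_sizes(words)
    return spearmanr([freq[w] for w in words], [sizes[w] for w in words])

# Toy frequency table standing in for Wikipedia corpus counts.
freq = {"cat": 900, "can": 800, "car": 700, "cot": 50, "cub": 20, "fog": 5}
print(frequency_vs_neighbours(freq, length=3))
```

    A positive correlation, within each length bin, would go in the same direction as the abstract's claim that frequent forms have more orthographic neighbors.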

    Disambiguatory Signals are Stronger in Word-initial Positions

    Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower). This has led to the conjecture -- as in Wedel et al. (2019b), but common elsewhere -- that languages have evolved to provide more information earlier in words than later. Information-theoretic methods to establish such tendencies in lexicons have suffered from several methodological shortcomings that leave open the question of whether this high word-initial informativeness is actually a property of the lexicon or simply an artefact of the incremental nature of recognition. In this paper, we point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word, and present several new measures that avoid these confounds. When controlling for these confounds, we still find evidence across hundreds of languages that there is indeed a cross-linguistic tendency to front-load information in words.
    Comment: Accepted at EACL 2021. Code is available at https://github.com/tpimentelms/frontload-disambiguatio
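    As a purely illustrative baseline, one naive way to quantify positional informativeness is the entropy of the segment distribution at each within-word position over a lexicon; note this is exactly the kind of simple measure the paper argues needs correcting, not one of its new measures, and the toy lexicon below is invented.

```python
import math
from collections import Counter

def positional_entropy(lexicon, position):
    """Entropy (in bits) of the segment distribution at a given word position."""
    counts = Counter(w[position] for w in lexicon if len(w) > position)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Invented toy lexicon; here the word-initial position happens to carry the most entropy.
lexicon = ["pan", "tan", "kan", "ban", "pam", "tam", "kam", "bam"]
for pos in range(3):
    print(f"position {pos}: {positional_entropy(lexicon, pos):.2f} bits")
```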

    The emergence of word-internal repetition through iterated learning: Explaining the mismatch between learning biases and language design

    The idea that natural language is shaped by biases in learning plays a key role in our understanding of how human language is structured, but its corollary that there should be a correspondence between typological generalisations and ease of acquisition is not always supported. For example, natural languages tend to avoid close repetitions of consonants within a word, but developmental evidence suggests that, if anything, words containing sound repetitions are more, not less, likely to be acquired than those without. In this study, we use word-internal repetition as a test case to provide a cultural evolutionary explanation of when and how learning biases impact language design. Two artificial language experiments showed that adult speakers possess a bias for both consonant and vowel repetitions when learning novel words, but the effects of this bias were observable in language transmission only when there was a relatively high learning pressure on the lexicon. Based on these results, we argue that whether the design of a language reflects biases in learning depends on the relative strength of pressures from learnability and communication efficiency exerted on the linguistic system during cultural transmission.
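    A toy transmission-chain simulation can illustrate the logic of the argument; this is a sketch of the reasoning only, not the authors' experimental design, and the recall probability, bias strength, alphabet, and parameter values are all invented.

```python
import random

ALPHABET = "ptkbdgmnsaiueo"

def has_repetition(word):
    """True if the word contains two identical adjacent segments."""
    return any(a == b for a, b in zip(word, word[1:]))

def new_word(length=4):
    return "".join(random.choices(ALPHABET, k=length))

def transmit(lexicon, recall, bias=0.3):
    """One generation: each word is retained with probability `recall`
    (plus `bias` if it contains a repetition); forgotten words are replaced."""
    out = []
    for w in lexicon:
        p = min(1.0, recall + (bias if has_repetition(w) else 0.0))
        out.append(w if random.random() < p else new_word())
    return out

def chain(recall, generations=40, size=200):
    """Proportion of repetition-containing words after iterated transmission."""
    lexicon = [new_word() for _ in range(size)]
    for _ in range(generations):
        lexicon = transmit(lexicon, recall)
    return sum(map(has_repetition, lexicon)) / len(lexicon)

print("high learning pressure (recall=0.5):", chain(recall=0.5))
print("low learning pressure  (recall=1.0):", chain(recall=1.0))
```

    Under high learning pressure the weak retention advantage for repetitive words compounds across generations, while under low pressure (near-perfect recall) the lexicon barely changes, mirroring the abstract's claim that the bias surfaces in transmission only when learning pressure is high.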

    Linguistic Laws and Compression in a Comparative Perspective: A Conceptual Review and Phylogenetic Test in Mammals

    Over the last several decades, the application of “Linguistic Laws” (statistical regularities underlying the structure of language) to the study of human languages has exploded. These ideas, adopted from Information Theory and quantitative linguistics, have been useful in helping to understand the evolution of the underlying structures of communicative systems. Moreover, since the publication of a seminal article in 2010, the field has taken a comparative approach to assess the degree of similarities and differences underlying the organisation of communication systems across the natural world. In this thesis, I begin by surveying the state of the field as it pertains to the study of linguistic laws and compression in nonhuman animal communication systems. I subsequently identify a number of theoretical and methodological gaps in the current literature and suggest ways in which these might be rectified to strengthen future conclusions and enable the pursuit of novel theoretical questions. In the second chapter, I undertake a phylogenetically controlled analysis that aims to demonstrate the extent of conformity to Zipf’s Law of Abbreviation in mammalian vocal repertoires. I test each individual repertoire, and then examine the entire collection of repertoires together. I find mixed evidence of conformity to the Law of Abbreviation, and conclude with some implications of this work and future directions in which it might be extended.
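    For the within-repertoire part of that test, a minimal sketch is below: Zipf's Law of Abbreviation predicts a negative correlation between how often a call type is used and how long it is. The phylogenetic control is not shown, and the repertoire data and function name are invented toy examples.

```python
from scipy.stats import spearmanr

def law_of_abbreviation(repertoire):
    """Spearman correlation between call-type usage frequency and mean duration."""
    freqs = [freq for freq, _ in repertoire.values()]
    durations = [dur for _, dur in repertoire.values()]
    return spearmanr(freqs, durations)

# Toy repertoire: call type -> (usage frequency, mean duration in seconds)
repertoire = {
    "contact": (420, 0.15),
    "alarm":   (160, 0.30),
    "food":    (90,  0.45),
    "threat":  (35,  0.80),
    "song":    (10,  2.10),
}
rho, p = law_of_abbreviation(repertoire)
print(f"rho = {rho:.2f}, p = {p:.3f}")  # a negative rho is consistent with the law
```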