Word forms are structured for efficient use
Zipf famously stated that, if natural language lexicons are structured for efficient communication, the words used most frequently should require the least effort. This observation explains the famous finding that the most frequent words in a language tend to be short. A related prediction is that, even among words of the same length, the most frequent word forms should be the ones that are easiest to produce and understand. Using orthography as a proxy for phonetics, we test this hypothesis on corpora of 96 languages from Wikipedia. We find that, across a variety of languages and language families and controlling for length, the most frequent forms in a language tend to be more orthographically well-formed and to have more orthographic neighbors than less frequent forms. We interpret this result as evidence that lexicons are structured by language usage pressures to facilitate efficient communication.

Keywords: Lexicon; Word frequency; Phonology; Communication; Efficiency

National Science Foundation (Grant ES/N0174041/1)
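One quantity the abstract relies on, a word's orthographic neighborhood size, can be sketched as a toy substitution-neighbor count (a common simplification; the paper's exact neighbor definition, lexicon, and frequency controls are not given here, and the small lexicon below is invented for illustration):

```python
import string

def neighbors(word, lexicon):
    """Count orthographic neighbors: lexicon entries that differ from
    `word` by exactly one substituted letter (same length)."""
    count = 0
    for i in range(len(word)):
        for ch in string.ascii_lowercase:
            if ch != word[i]:
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in lexicon:
                    count += 1
    return count

# Toy lexicon: frequent short forms tend to sit in dense neighborhoods.
lexicon = {"cat", "bat", "hat", "cot", "cut", "dog", "dig", "xylyl"}
print(neighbors("cat", lexicon))    # bat, hat, cot, cut -> 4
print(neighbors("xylyl", lexicon))  # isolated form -> 0
```

A fuller test of the abstract's claim would then correlate such counts with corpus frequency while holding word length fixed.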
Disambiguatory Signals are Stronger in Word-initial Positions
Psycholinguistic studies of human word processing and lexical access provide
ample evidence of the preferred nature of word-initial versus word-final
segments, e.g., in terms of attention paid by listeners (greater) or the
likelihood of reduction by speakers (lower). This has led to the conjecture --
as in Wedel et al. (2019b), but common elsewhere -- that languages have evolved
to provide more information earlier in words than later. Information-theoretic
methods to establish such tendencies in lexicons have suffered from several
methodological shortcomings that leave open the question of whether this high
word-initial informativeness is actually a property of the lexicon or simply an
artefact of the incremental nature of recognition. In this paper, we point out
the confounds in existing methods for comparing the informativeness of segments
early in the word versus later in the word, and present several new measures
that avoid these confounds. When controlling for these confounds, we still find
evidence across hundreds of languages that indeed there is a cross-linguistic
tendency to front-load information in words.

Comment: Accepted at EACL 2021. Code is available at
https://github.com/tpimentelms/frontload-disambiguatio
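The idea that word-initial segments are more informative can be illustrated, in a deliberately simplified way, by comparing per-position Shannon entropy over a lexicon (this is one naive measure; the paper argues such measures need careful de-confounding, and the function name and toy data here are mine):

```python
import math
from collections import Counter

def positional_entropy(words, pos):
    """Shannon entropy (bits) of the segment distribution at position
    `pos`, computed over the words long enough to have that position."""
    segs = Counter(w[pos] for w in words if len(w) > pos)
    n = sum(segs.values())
    h = 0.0
    for c in segs.values():
        p = c / n
        h -= p * math.log2(p)
    return h

# Toy lexicon: initial segments vary freely, final segments do not.
words = ["pa", "ta", "ka", "ma", "na", "sa"]
print(positional_entropy(words, 0))  # log2(6) ~ 2.585 bits at position 0
print(positional_entropy(words, 1))  # 0.0 bits: every word ends in 'a'
```

Higher entropy at early positions corresponds to the front-loading tendency; the paper's contribution is showing the tendency survives once confounds of incremental recognition are removed.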
The emergence of word-internal repetition through iterated learning: Explaining the mismatch between learning biases and language design
The idea that natural language is shaped by biases in learning plays a key role in our understanding of how human language is structured, but its corollary that there should be a correspondence between typological generalisations and ease of acquisition is not always supported. For example, natural languages tend to avoid close repetitions of consonants within a word, but developmental evidence suggests that, if anything, words containing sound repetitions are more, not less, likely to be acquired than those without. In this study, we use word-internal repetition as a test case to provide a cultural evolutionary explanation of when and how learning biases impact on language design. Two artificial language experiments showed that adult speakers possess a bias for both consonant and vowel repetitions when learning novel words, but the effects of this bias were observable in language transmission only when there was a relatively high learning pressure on the lexicon. Based on these results, we argue that whether the design of a language reflects biases in learning depends on the relative strength of pressures from learnability and communication efficiency exerted on the linguistic system during cultural transmission.
Linguistic Laws and Compression in a Comparative Perspective: A Conceptual Review and Phylogenetic Test in Mammals
Over the last several decades, the application of 'Linguistic Laws', statistical regularities underlying the structure of language, to the study of human languages has exploded. These ideas, adopted from information theory and quantitative linguistics, have been useful in helping to understand the evolution of the underlying structures of communicative systems. Moreover, since the publication of a seminal article in 2010, the field has taken a comparative approach to assess the degree of similarity and difference underlying the organisation of communication systems across the natural world. In this thesis, I begin by surveying the state of the field as it pertains to the study of linguistic laws and compression in nonhuman animal communication systems. I then identify a number of theoretical and methodological gaps in the current literature and suggest ways in which these might be rectified to strengthen future conclusions and enable the pursuit of novel theoretical questions. In the second chapter, I undertake a phylogenetically controlled analysis that aims to assess the extent of conformity to Zipf's Law of Abbreviation in mammalian vocal repertoires. I test each individual repertoire and then examine the entire collection of repertoires together. I find mixed evidence of conformity to the Law of Abbreviation, and conclude with some implications of this work and future directions in which it might be extended.
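A conformity test of the kind described, where Zipf's Law of Abbreviation predicts a negative association between how often a call type is used and how long it lasts, can be sketched as a rank correlation (Spearman, implemented by hand here for self-containment; the thesis's phylogenetically controlled models are far more involved, and the repertoire numbers below are invented):

```python
def rank(values):
    """Average ranks (ties shared) for a sequence of numbers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical repertoire: usage rate vs. mean call duration (seconds).
freq = [120, 75, 40, 12, 5]
dur = [0.2, 0.3, 0.5, 0.9, 1.4]
print(spearman(freq, dur))  # close to -1: shorter calls are used more
```

A strongly negative coefficient for a repertoire would count as conformity to the Law of Abbreviation; the mixed evidence reported above corresponds to some repertoires showing this pattern and others not.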
Learning homophones in context: Easy cases are favored in the lexicon of natural languages