CORE search results: 132 research outputs found

Testing the robustness of laws of polysemy and brevity versus frequency

Author: A Corral
A Kilgarriff
B MacWhinney
C Fellbaum
EG Altmann
F Font-Clos
G Fenk-Oczlon
GK Zipf
J Baixeries
J Ke
M Razavi
N Ide
R Ferrer-i-Cancho
R Newson
RH Baayen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. Recovering the laws in these new conditions supports the hypothesis that they originate from abstract mechanisms. Peer reviewed. Postprint (author's final draft).
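One common way to test laws of this kind is a rank correlation between frequency and a second word property. The sketch below assumes a small hypothetical lexicon (not data from the paper) and computes Kendall's tau-a between frequency and word length. The law of abbreviation predicts a negative value. Replacing length with the number of meanings gives a test of the meaning-frequency law, which predicts a positive value. Tau-a does not correct for ties, so with real corpus data a tie-corrected variant such as tau-b would be the safer choice.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy lexicon: word -> (corpus frequency, length in characters).
toy = {"the": (5000, 3), "of": (3000, 2), "house": (400, 5),
       "window": (120, 6), "photosynthesis": (3, 14)}
freqs = [f for f, _ in toy.values()]
lengths = [n for _, n in toy.values()]
tau = kendall_tau_a(freqs, lengths)
print(f"tau = {tau:.2f}")  # negative, as the law of abbreviation predicts
```

In this toy lexicon only the pair ("the", "of") runs against the law, so 1 of the 10 pairs is concordant and 9 are discordant.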

Crossref

UPCommons. Portal del coneixement obert de la UPC

The meaning-frequency law in Zipfian optimization models of communication

Author: Ferrer-i-Cancho Ramon
Publication date: 01/01/2016

According to Zipf's meaning-frequency law, more frequent words tend to have more meanings. Here it is shown that a linear dependency between the frequency of a form and its number of meanings is found in a family of models of Zipf's law for word frequencies. This is evidence for a weak version of the meaning-frequency law. Interestingly, that weak law (a) is not an inevitable property of the assumptions of the family and (b) is found at least in the narrow regime where those models exhibit Zipf's law for word frequencies.
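Whether a table of (frequency, number of meanings) pairs shows a linear dependency can be checked with an ordinary least-squares line μ = a + b·f and its coefficient of determination R². The sketch below is a minimal illustration on made-up numbers. It only fits the line and does not reproduce the family of models studied in the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope
    a = mean_y - b * mean_x                # intercept
    r_squared = sxy * sxy / (sxx * syy)    # R^2 of a simple linear regression
    return a, b, r_squared


# Hypothetical (form frequency, number of meanings) pairs; not data from the paper.
freq = [10, 20, 30, 40, 50]
meanings = [2, 3, 4, 5, 6]
a, b, r2 = fit_line(freq, meanings)
print(f"mu = {a:.2f} + {b:.3f} * f, R^2 = {r2:.3f}")
```

An R² close to 1 indicates that a straight line describes the data well. A proportional relation μ ∝ f would additionally require an intercept near zero.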

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Zipf's laws of meaning in Catalan

Author: Baixeries Jaume
Català Neus
Ferrer-Cancho Ramon
Hernández-Fernández Antoni
Padró Lluís
Publication date: 30/06/2021

In his pioneering research, G. K. Zipf formulated two statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Although these laws were formulated more than half a century ago, they have been investigated in only a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, we discuss the implications of these two regimes. Comment: 21 pages, 11 figures.
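A relationship among the exponents follows if each law is treated as an exact power law of the frequency rank i. The rank-frequency law is f ∝ i^(-α) and the law of meaning distribution is μ ∝ i^(-γ). Eliminating i gives i ∝ f^(-1/α), and therefore μ ∝ f^(γ/α). The meaning-frequency exponent is then δ = γ/α. The sketch below assumes ideal, noise-free power laws with illustrative exponent values (not the Catalan estimates) and recovers δ as a log-log least-squares slope.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return sxy / sxx


alpha, gamma = 1.0, 0.5                  # illustrative exponents, chosen for the example
ranks = range(1, 1001)
freq = [r ** -alpha for r in ranks]      # rank-frequency law:   f  ~ i**-alpha
meanings = [r ** -gamma for r in ranks]  # meaning distribution: mu ~ i**-gamma
delta = loglog_slope(freq, meanings)     # meaning-frequency law: mu ~ f**delta
print(f"fitted delta = {delta:.4f}, gamma/alpha = {gamma / alpha:.4f}")
```

With real corpora the three exponents are estimated separately from noisy data, so δ and γ/α will agree only approximately.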

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

PubMed Central

The polysemy of the words that children learn over time

Author: Baixeries i Juvillà Jaume
Casas Fernández Bernardino
Catala Roig Neus
Ferrer Cancho Ramon
Hernández Fernández Antonio
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2018

Here we study polysemy as a potential learning bias in vocabulary learning in children. We employ a massive set of transcriptions of conversations between children and adults in English to analyze the evolution of mean polysemy in the words produced by children aged 10 to 60 months. Our results show that mean polysemy in children increases over time in two phases, i.e. fast growth until the 31st month followed by a slower tendency towards adult speech. In contrast, no dependency on time is found in adults. This may suggest that children prefer non-polysemous words in the early stages of vocabulary acquisition. Our hypothesis is twofold: (a) polysemy is a standalone bias or (b) polysemy is a side-effect of other biases. Interestingly, the bias for low polysemy described above weakens when controlling for syntactic category (noun, verb, adjective or adverb). The pattern of the evolution of polysemy suggests that both hypotheses may apply to some extent, and that (b) would originate from a combination of the well-known preference for nouns and the lower polysemy of nouns with respect to other syntactic categories. Peer reviewed. Postprint (author's final draft).
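The age curve described above comes down to averaging a polysemy score over the words produced at each age. The sketch below makes several assumptions that may differ from the paper's protocol. The sense-count dictionary is hypothetical; in practice such counts would come from a sense inventory. Averaging is done over tokens rather than types, and words with no sense count are skipped.

```python
from collections import defaultdict


def mean_polysemy_by_age(tokens, senses):
    """Average number of senses over the word tokens produced at each age.

    tokens: iterable of (age_in_months, word) pairs.
    senses: dict mapping word -> number of senses; words missing from it are skipped.
    Returns a dict {age: mean number of senses}, ordered by age.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, word in tokens:
        if word in senses:
            totals[age] += senses[word]
            counts[age] += 1
    return {age: totals[age] / counts[age] for age in sorted(counts)}


# Hypothetical sense counts and child tokens, for illustration only.
senses = {"ball": 2, "dog": 3, "go": 10, "run": 12, "mommy": 1}
tokens = [(14, "ball"), (14, "mommy"), (14, "dog"),
          (30, "go"), (30, "ball"), (30, "run")]
result = mean_polysemy_by_age(tokens, senses)
print(result)
```

Controlling for syntactic category, as in the abstract, would amount to keying `senses` and the tokens by (word, category) pairs and computing one curve per category.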

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

General patterns and language variation: word frequencies across English, German, and Chinese

Author: Tjuka A.
Publication date: 01/12/2020

Cross-linguistic studies of concepts provide valuable insights for the investigation of the mental lexicon. Recent developments in cross-linguistic databases facilitate the exploration of a diverse set of languages on the basis of comparative concepts. These databases make use of a well-established reference catalog, the Concepticon, which is built from concept lists published in linguistics. A recently released feature of the Concepticon includes data on norms, ratings, and relations for words and concepts. The present study used data on word frequencies to test two hypotheses. First, I examined the assumption that related languages (i.e., English and German) share concepts with more similar frequencies than non-related languages (i.e., English and Chinese). Second, I explored the variation of frequencies across both language pairs to determine whether related languages share fewer concepts with large frequency differences than non-related languages. The findings indicate that related languages show less variation in their frequencies. Where variation occurs, it seems to be due to cultural and structural differences. The implications of this study are far-reaching in that it exemplifies the use of cross-linguistic data for the study of the mental lexicon.
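The two hypotheses above compare frequencies of the concepts that two languages share. The sketch below is a minimal version under these assumptions:
- frequencies are mapped to concept IDs as in a Concepticon-style list;
- they are compared on a log10 scale;
- a "large difference" means a log10 gap greater than a chosen threshold (here 1.0, i.e. tenfold).

The concept IDs, frequencies and threshold are hypothetical. This is not the study's actual procedure.

```python
import math


def compare_concept_frequencies(freq_a, freq_b, threshold=1.0):
    """Compare word frequencies of concepts shared by two languages.

    freq_a, freq_b: dicts mapping a concept ID to a frequency (e.g. per million words).
    Returns (number of shared concepts,
             Pearson r of log10 frequencies,
             share of shared concepts whose log10 frequencies differ by more than threshold).
    """
    shared = sorted(set(freq_a) & set(freq_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared concepts")
    xs = [math.log10(freq_a[c]) for c in shared]
    ys = [math.log10(freq_b[c]) for c in shared]
    n = len(shared)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("frequencies must vary in both languages")
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
    large = sum(1 for x, y in zip(xs, ys) if abs(x - y) > threshold) / n
    return n, r, large


# Hypothetical frequencies per million words; not values from the study.
english = {"HAND": 300.0, "WATER": 200.0, "DOG": 50.0, "RICE": 10.0}
german = {"HAND": 250.0, "WATER": 180.0, "DOG": 60.0, "RICE": 8.0}
chinese = {"HAND": 100.0, "WATER": 150.0, "DOG": 20.0, "RICE": 400.0}
n_de, r_de, large_de = compare_concept_frequencies(english, german)
n_zh, r_zh, large_zh = compare_concept_frequencies(english, chinese)
print(f"EN-DE: r = {r_de:.2f}, large gaps = {large_de:.0%}")
print(f"EN-ZH: r = {r_zh:.2f}, large gaps = {large_zh:.0%}")
```

The log scale is used because word frequencies span several orders of magnitude. On raw counts, a few very frequent concepts would dominate the correlation.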

MPG.PuRe

Linguistic Laws and Compression in a Comparative Perspective: A Conceptual Review and Phylogenetic Test in Mammals

Author: Kang Tarandeep Singh
Publication date: 01/01/2021

Over the last several decades, the application of “Linguistic Laws” (statistical regularities underlying the structure of language) to the study of human languages has exploded. These ideas, adopted from information theory and quantitative linguistics, have helped us understand the evolution of the underlying structures of communicative systems. Moreover, since the publication of a seminal article in 2010, the field has taken a comparative approach to assess the similarities and differences in the organisation of communication systems across the natural world. In this thesis, I begin by surveying the state of the field as it pertains to the study of linguistic laws and compression in nonhuman animal communication systems. I then identify a number of theoretical and methodological gaps in the current literature and suggest ways in which these might be rectified, both to strengthen future conclusions and to enable the pursuit of novel theoretical questions. In the second chapter, I undertake a phylogenetically controlled analysis that aims to establish the extent of conformity to Zipf’s Law of Abbreviation in mammalian vocal repertoires. I test each repertoire individually and then examine the entire collection of repertoires together. I find mixed evidence of conformity to the Law of Abbreviation, and conclude with some implications of this work and future directions in which it might be extended.
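Testing a single repertoire for the Law of Abbreviation means asking whether more frequent call types are shorter than chance would allow. The sketch below is a minimal per-repertoire check. It uses a one-sided permutation test on the Pearson correlation between log call count and mean call duration, and the repertoire values are hypothetical. It does not implement the phylogenetic control used in the thesis, and the choice of statistic is an assumption made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def abbreviation_permutation_test(counts, durations, n_perm=10_000, seed=0):
    """One-sided permutation test of the Law of Abbreviation in one repertoire.

    Statistic: Pearson r between log(call-type count) and mean call duration.
    The law predicts r < 0. The p-value is the fraction of shuffles whose r is
    at least as negative as the observed one, with +1 smoothing so it is never 0.
    """
    log_counts = [math.log(c) for c in counts]
    observed = pearson(log_counts, durations)
    rng = random.Random(seed)          # seeded for reproducibility
    shuffled = list(durations)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any count-duration association
        if pearson(log_counts, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical repertoire: occurrences of each call type and its mean duration (s).
counts = [520, 310, 150, 90, 40, 12, 5]
durations = [0.12, 0.15, 0.22, 0.30, 0.41, 0.55, 0.80]
r, p = abbreviation_permutation_test(counts, durations)
print(f"r = {r:.2f}, one-sided p = {p:.4f}")
```

Pooling many repertoires, as in the second part of the analysis, would also have to account for shared ancestry between species. A plain permutation test like this one ignores that.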

Durham e-Theses

Words and their secrets

Author: Finatto Maria José Bocorny
Santos Diana
Publication date: 01/01/2010

Repositório Comum

Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

Author: Danforth Christopher M.
Dodds Peter Sheridan
Gray Tyler J.
Publication venue: UVM ScholarWorks
Publication date: 08/07/2019

Stretched words like 'heellllp' or 'heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of 'stretchable words' found in roughly 100 billion tweets authored over an 8-year period. We introduce two central parameters, 'balance' and 'stretch', that capture their main characteristics, and explore their dynamics by creating visual tools we call 'balance plots' and 'spelling trees'. We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings and serve as a basis for other linguistic research involving stretchable words, along with potential applications in augmenting dictionaries, improving language processing, and any area where sequence construction matters, such as genetics.
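Before stretched spellings can be counted, each one has to be linked to a common form. The sketch below does this with run-length encoding: every run of a repeated letter collapses to a single letter, and surface spellings are grouped under the collapsed "skeleton". This is an assumption for illustration, not the paper's method, and it does not compute the paper's 'balance' or 'stretch' parameters. It is also crude, because words with genuine double letters collapse too ('good' becomes 'god'). The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby


def run_length(word):
    """Split a word into (letter, run length) pairs, e.g. 'heyyy' -> h1 e1 y3."""
    return [(ch, len(list(group))) for ch, group in groupby(word.lower())]


def collapse_runs(word):
    """Collapse every run of a repeated character to a single character."""
    return "".join(ch for ch, _ in run_length(word))


def group_by_skeleton(tokens):
    """Map each collapsed skeleton to a Counter of the lowercased spellings observed."""
    groups = defaultdict(Counter)
    for token in tokens:
        groups[collapse_runs(token)][token.lower()] += 1
    return groups


# Hypothetical tokens, for illustration only.
tokens = ["heyyy", "heyyyyy", "hey", "Duuuuude", "dude", "heyyy"]
groups = group_by_skeleton(tokens)
for skeleton, spellings in groups.items():
    print(skeleton, dict(spellings))
```

A fuller treatment would keep the per-letter run lengths from `run_length`, because they show which letters of a word get stretched and by how much.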

arXiv.org e-Print Archive

UVM ScholarWorks

Directory of Open Access Journals

Empirical studies on word representations

Author: Suster Simon
Publication venue: Rijksuniversiteit Groningen
Publication date: 01/01/2016

ARTS repository - University of Groningen