
18 research outputs found

Optimization models of natural communication

Author: Ferrer Cancho Ramon
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018
Field of study

A family of information theoretic models of communication was introduced more than a decade ago to explain the origins of Zipf’s law for word frequencies. The family is based on a combination of two information theoretic principles: maximization of mutual information between forms and meanings and minimization of form entropy. The family also sheds light on the origins of three other patterns: the principle of contrast; a related vocabulary learning bias; and the meaning-frequency law. Here two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics. The minimization of this linear function is linked to the problem of compression of standard information theory and might be tuned by self-organization. Peer reviewed. Postprint (author's final draft).
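The linear combination of the two principles can be sketched in a few lines. This is an illustration only: it assumes the energy function has the form Ω(λ) = −λ·I(S,R) + (1 − λ)·H(S), where S are forms, R are meanings and λ is a weight between 0 and 1. That form, the function names, and the toy joint distributions are assumptions that do not appear in this listing.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def energy(joint, lam):
    """Assumed energy: -lam * I(S,R) + (1 - lam) * H(S).

    joint[i][j] is the (hypothetical) joint probability of form i and meaning j.
    """
    p_forms = [sum(row) for row in joint]
    p_meanings = [sum(col) for col in zip(*joint)]
    h_forms = entropy(p_forms)
    h_meanings = entropy(p_meanings)
    h_joint = entropy(p for row in joint for p in row)
    mutual_info = h_forms + h_meanings - h_joint
    return -lam * mutual_info + (1 - lam) * h_forms


# Two toy lexicons over two forms and two meanings (illustrative values).
one_to_one = [[0.5, 0.0], [0.0, 0.5]]   # each meaning has its own form
single_form = [[0.5, 0.5], [0.0, 0.0]]  # one form covers both meanings

for lam in (0.25, 0.75):
    print(lam, energy(one_to_one, lam), energy(single_form, lam))
```

Under these assumptions, a large λ favours the one-to-one mapping (high mutual information), while a small λ favours the single-form lexicon (zero form entropy).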

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Optimal coding and the origins of Zipfian laws

Author: Bentz Christian
Ferrer-i-Cancho Ramon
Seguin Caio
Publication venue: 'Informa UK Limited'
Publication date: 29/05/2020
Field of study

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding, under an arbitrary coding scheme, and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws. Comment: in press in the Journal of Quantitative Linguistics; definition of concordant pair corrected, proofs polished, references updated.
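The logarithmic growth of length with rank can be illustrated with a simple counting argument. This is a sketch, not the authors' code: if distinct strings over an alphabet of N symbols are handed out shortest-first to words in order of decreasing frequency, the word of rank i gets the smallest length l such that N + N^2 + ... + N^l ≥ i, which works out to ⌈log_N((1 − 1/N)·i + 1)⌉. The function name and the example alphabet sizes are illustrative assumptions.

```python
def nonsingular_code_length(rank, alphabet_size):
    """Length of the code given to the word of a given frequency rank when
    distinct strings are assigned shortest-first (counting argument only)."""
    if rank < 1 or alphabet_size < 2:
        raise ValueError("rank must be >= 1 and alphabet_size >= 2")
    length = 1
    capacity = alphabet_size  # number of distinct strings of length <= `length`
    while rank > capacity:
        length += 1
        capacity += alphabet_size ** length
    return length


# Binary alphabet: 2 strings of length 1, 4 of length 2, 8 of length 3, ...
print([nonsingular_code_length(r, 2) for r in range(1, 8)])
```

Integer counting is used instead of evaluating the logarithm directly, to avoid floating-point rounding at the boundaries between lengths.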

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

The placement of the head that maximizes predictability. An information theoretic approach

Author: Ferrer-i-Cancho Ramon
Publication venue
Publication date: 01/01/2017
Field of study

The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the costs with respect to dependency length minimization. The implications of such a broad theoretical framework to understand the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed. Comment: in press in Glottometrics.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Polysemy and brevity versus frequency in language

Author: Baixeries Jaume
Casas Bernardino
Català Neus
Ferrer-i-Cancho Ramon
Hernández-Fernández Antoni
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Parallels of human language in the behavior of bottlenose dolphins

Author: Ferrer-i-Cancho R.
Lusseau D.
McCowan B.
Publication venue
Publication date: 05/05/2016
Field of study

A short review of similarities between dolphins and humans with the help of quantitative linguistics and information theory.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Online Research Database In Technology

Zipf's Law: Balancing signal usage cost and communication efficiency

Author: A Clauset
B Mandelbrot
CE Shannon
Christoph Salge
Daniel Polani
EV Clark
GA Miller
J Baixeries
JP Crutchfield
M Prokopenko
M Visser
MA Nowak
Mikhail Prokopenko
Neil R. Smalheiser
Nihat Ay
R Dickman
R Ferrer i Cancho
R Suzuki
RK Niven
SK Baek
V Balasubrahmanyan
VA Rokhlin
W Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015
Field of study

Copyright: © 2015 Salge et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. We propose a model that explains the reliable emergence of power laws (e.g., Zipf's law) during the development of different human languages. The model incorporates the principle of least effort in communications, minimizing a combination of the information-theoretic communication inefficiency and direct signal cost. We prove a general relationship, for all optimal languages, between the signal cost distribution and the resulting distribution of signals. Zipf's law then emerges for logarithmic signal cost distributions, which is the cost distribution expected for words constructed from letters or phonemes. Peer reviewed. Final published version.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Macquarie University ResearchOnline

University of Hertfordshire Research Archive

The placement of the head that maximizes predictability: An information theoretic approach

Author: Ferrer Cancho Ramon
Publication venue: RAM-Verlag
Publication date: 01/01/2017
Field of study

UPCommons. Portal del coneixement obert de la UPC

The polysemy of the words that children learn over time

Author: Baixeries i Juvillà Jaume
Casas Fernández Bernardino
Catala Roig Neus
Ferrer Cancho Ramon
Hernández Fernández Antonio
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2018
Field of study

Here we study polysemy as a potential learning bias in vocabulary learning in children. We employ a massive set of transcriptions of conversations between children and adults in English, to analyze the evolution of mean polysemy in the words produced by children whose ages range between 10 and 60 months. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth until the 31st month followed by a slower tendency towards adult speech. In contrast, no dependence on time is found in adults. This may suggest that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Our hypothesis is twofold: (a) polysemy is a standalone bias or (b) polysemy is a side-effect of other biases. Interestingly, the bias for low polysemy described above weakens when controlling for syntactic category (noun, verb, adjective or adverb). The pattern of the evolution of polysemy suggests that both hypotheses may apply to some extent, and that (b) would originate from a combination of the well-known preference for nouns and the lower polysemy of nouns with respect to other syntactic categories. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC