Emergence of Zipf's Law in the Evolution of Communication
Zipf's law seems to be ubiquitous in human languages and appears to be a
universal property of complex communicating systems. Following the early
proposal made by Zipf concerning the presence of a tension between the efforts
of speaker and hearer in a communication system, we introduce evolution by
means of a variational approach to the problem based on Kullback's Minimum
Discrimination of Information Principle. Therefore, using a formalism fully
embedded in the framework of information theory, we demonstrate that Zipf's law
is the only expected outcome of an evolving, communicative system under a
rigorous definition of the communicative tension described by Zipf.
Comment: 7 pages, 2 figures
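For illustration, the rank-frequency form of Zipf's law (frequency inversely proportional to rank) can be sketched with stdlib Python; the helper names are ours, not the paper's:

```python
import math

def zipf_frequencies(n_ranks, exponent=1.0):
    """Ideal Zipf rank-frequency distribution: f(r) proportional to 1/r**exponent."""
    weights = [1.0 / r ** exponent for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def log_log_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

freqs = zipf_frequencies(1000)
slope = log_log_slope(freqs)  # close to -1 for the classic Zipf exponent
```

A measured slope near -1 on a log-log rank-frequency plot is the usual empirical signature of the law discussed in the abstract.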
A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)
In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.
Exploring the law of text geographic information
Textual geographic information is indispensable and heavily relied upon in
practical applications. The absence of a known statistical distribution makes
geographic information hard to harness effectively, motivating this
exploration. We contend that geographic information is influenced by human
exploration. We contend that geographic information is influenced by human
behavior, cognition, expression, and thought processes, and given our intuitive
understanding of natural systems, we hypothesize its conformity to the Gamma
distribution. Through rigorous experiments on a diverse range of 24 datasets
encompassing different languages and types, we have substantiated this
hypothesis, unearthing the underlying regularities governing the dimensions of
quantity, length, and distance in geographic information. Furthermore,
theoretical analyses and comparisons with Gaussian distributions and Zipf's
law have ruled out the possibility that these regularities are coincidental.
Significantly, we have estimated the upper bounds of human utilization of
geographic information, pointing towards the existence of uncharted
territories. We also provide guidance for geographic information extraction,
hoping to lift the veil on geographic information and reveal its true nature.
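The hypothesized Gamma distribution can be fitted by the method of moments (shape = mean²/variance, scale = variance/mean); a minimal stdlib sketch of that estimator, not the paper's actual fitting procedure:

```python
def gamma_moments_fit(values):
    """Method-of-moments estimate of Gamma(shape, scale) parameters.

    For a Gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so shape = mean**2 / variance
    and scale = variance / mean.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale
```

Applied to quantities such as lengths or distances of geographic expressions, the fitted shape and scale summarize how concentrated the values are.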
Optimization models of natural communication
A family of information theoretic models of communication was introduced more than a decade ago to explain the origins of Zipf's law for word frequencies. The family is based on a combination of two information theoretic principles: maximization of mutual information between forms and meanings and minimization of form entropy. The family also sheds light on the origins of three other patterns: the principle of contrast; a related vocabulary learning bias; and the meaning-frequency law. Here two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics. The minimization of this linear function is linked to the problem of compression in standard information theory and might be tuned by self-organization.
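A linear energy of the kind the abstract describes can be sketched as follows; the sign convention (reward mutual information I(S, R), penalize form entropy H(S)) is our reading of the abstract, and the function names are illustrative, not from the reviewed models:

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def energy(joint, lam):
    """One possible linear combination of the two principles:
    Omega(lam) = -lam * I(S, R) + (1 - lam) * H(S),
    where joint[s][r] is the joint probability of form s and meaning r.
    Minimizing Omega trades off informativeness against form entropy.
    """
    p_s = [sum(row) for row in joint]            # marginal over forms
    p_r = [sum(col) for col in zip(*joint)]      # marginal over meanings
    h_s, h_r = entropy(p_s), entropy(p_r)
    h_sr = entropy([p for row in joint for p in row])
    mi = h_s + h_r - h_sr                        # mutual information I(S, R)
    return -lam * mi + (1 - lam) * h_s
```

For a one-to-one form-meaning mapping, I(S, R) and H(S) are equal, so the two pressures exactly cancel at lam = 0.5; the interesting regimes lie at intermediate lam.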
The placement of the head that maximizes predictability. An information theoretic approach
The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the costs with respect to dependency length minimization. The
implications of such a broad theoretical framework for understanding the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed.
Comment: in press in Glottometrics
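The conflict between the two principles can be illustrated with a toy model (ours, not the paper's formalism): in a star tree where all dependents attach to one head, placing the head last maximizes total dependency length, while a central head minimizes it.

```python
def total_dependency_length(head_pos, n):
    """Sum of distances between the head and its n - 1 dependents
    in a star tree laid out over word positions 0..n-1."""
    return sum(abs(head_pos - i) for i in range(n) if i != head_pos)

# For 5 words: a central head gives length 6; a final head gives 10,
# the worst case for dependency length minimization.
central = total_dependency_length(2, 5)
final = total_dependency_length(4, 5)
```

This is the tension the abstract describes: a head placed last is maximally predictable from its dependents, but that placement is maximally costly under dependency length minimization.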
Exploring the Law of Numbers: Evidence from China's Real Estate
The renowned proverb, "Numbers do not lie," underscores the reliability and
insight that lie beneath numbers, a concept of undisputed importance,
especially in economics and finance. Despite the success of Benford's Law in
first-digit analysis, its scope is not comprehensive enough to decipher the
laws of numbers in full. This paper delves into number laws by taking the
financial statements of China's real estate sector as a representative case,
quantitatively studying not only the first digit but also two further
dimensions of numbers: frequency and length. The research outcomes go beyond
mere reservations about data manipulation and open the door to discussions of
number diversity and the delineation of usage insights. This study carries
both economic significance and the capacity to foster a deeper comprehension
of numerical phenomena.
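Benford's Law gives the expected first-digit probability as P(d) = log10(1 + 1/d); a small sketch of the first-digit analysis the abstract mentions (the digit extraction assumes positive numbers, and the names are illustrative):

```python
import math

def benford_expected(digit):
    """Expected probability of a first significant digit under Benford's Law."""
    return math.log10(1 + 1 / digit)

def first_digit_freqs(values):
    """Observed first-significant-digit frequencies for positive numbers."""
    counts = [0] * 9
    for v in values:
        d = int(f"{abs(v):e}"[0])  # leading digit of the scientific notation
        counts[d - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Comparing observed frequencies against `benford_expected` over digits 1-9 is the standard screening step behind "reservations about data manipulation": under Benford's Law roughly 30.1% of values should start with 1.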
The challenges of statistical patterns of language: the case of Menzerath's law in genomes
The importance of statistical patterns of language has been debated over
decades. Although Zipf's law is perhaps the most popular case, Menzerath's
law has recently begun to attract attention. Menzerath's law manifests in
language, music and genomes as a tendency of the mean size of the parts to
decrease as the number of parts increases in many situations. This statistical
regularity emerges also in the context of genomes, for instance, as a tendency
of species with more chromosomes to have a smaller mean chromosome size. It has
been argued that the instantiation of this law in genomes is not indicative of
any parallel between language and genomes because (a) the law is inevitable and
(b) non-coding DNA dominates genomes. Here mathematical, statistical and
conceptual challenges of these criticisms are discussed. Two major conclusions
are drawn: the law is not inevitable and languages also have a correlate of
non-coding DNA. However, the wide range of manifestations of the law in and
outside genomes suggests that the striking similarities between non-coding DNA
and certain linguistic units could be incidental to understanding the
recurrence of that statistical law.
Comment: Title changed, abstract and introduction improved, and small corrections to the statistical argument
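Menzerath's law, as described above, predicts that the mean size of the parts tends to decrease as the number of parts grows (e.g. mean chromosome size versus chromosome number). A toy sketch of how that tendency can be checked, using a crude pairwise sign test rather than the paper's statistics:

```python
def menzerath_points(constructs):
    """Map each construct (a list of part sizes) to
    (number of parts, mean part size)."""
    return [(len(parts), sum(parts) / len(parts)) for parts in constructs]

def is_decreasing_trend(points):
    """Crude Kendall-style sign test over all pairs of points:
    True if pairs where more parts co-occur with a smaller mean size
    outnumber pairs showing the opposite direction."""
    supporting = opposing = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dn = points[j][0] - points[i][0]  # change in number of parts
            dm = points[j][1] - points[i][1]  # change in mean part size
            if dn * dm < 0:
                supporting += 1
            elif dn * dm > 0:
                opposing += 1
    return supporting > opposing
```

For genome data, each `construct` would be one species' list of chromosome sizes; a decreasing trend in these points is the law's signature, though as the abstract stresses, such a trend is not mathematically inevitable.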