Emergence of Zipf's Law in the Evolution of Communication
Zipf's law seems to be ubiquitous in human languages and appears to be a
universal property of complex communicating systems. Following the early
proposal made by Zipf concerning the presence of a tension between the efforts
of speaker and hearer in a communication system, we introduce evolution by
means of a variational approach to the problem based on Kullback's Minimum
Discrimination of Information Principle. Therefore, using a formalism fully
embedded in the framework of information theory, we demonstrate that Zipf's law
is the only expected outcome of an evolving, communicative system under a
rigorous definition of the communicative tension described by Zipf.
Comment: 7 pages, 2 figures
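For illustration, the rank-frequency form of Zipf's law (frequency inversely proportional to rank) can be sketched with stdlib Python; the helper names are ours, not the paper's:

```python
import math

def zipf_frequencies(n_ranks, exponent=1.0):
    """Ideal Zipf rank-frequency distribution: f(r) proportional to 1/r**exponent."""
    weights = [1.0 / r ** exponent for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def log_log_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

freqs = zipf_frequencies(1000)
slope = log_log_slope(freqs)  # close to -1 for the classic Zipf exponent
```

A measured slope near -1 on a log-log rank-frequency plot is the usual empirical signature of the law discussed in the abstract.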
A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)
In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.
Exploring the law of text geographic information
Textual geographic information is indispensable and heavily relied upon in
practical applications. The absence of a known statistical distribution makes
geographic information hard to harness effectively, motivating this
exploration. We contend that geographic information is influenced by human
exploration. We contend that geographic information is influenced by human
behavior, cognition, expression, and thought processes, and given our intuitive
understanding of natural systems, we hypothesize its conformity to the Gamma
distribution. Through rigorous experiments on a diverse range of 24 datasets
encompassing different languages and types, we have substantiated this
hypothesis, unearthing the underlying regularities governing the dimensions of
quantity, length, and distance in geographic information. Furthermore,
theoretical analyses and comparisons with Gaussian distributions and Zipf's
law have ruled out the possibility that these regularities are coincidental.
Significantly, we have estimated the upper bounds of human utilization of
geographic information, pointing towards the existence of uncharted
territories. We also provide guidance for geographic information extraction,
hoping to lift the veil on geographic information and reveal its true nature.
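The hypothesized Gamma distribution can be fitted by the method of moments (shape = mean²/variance, scale = variance/mean); a minimal stdlib sketch of that estimator, not the paper's actual fitting procedure:

```python
def gamma_moments_fit(values):
    """Method-of-moments estimate of Gamma(shape, scale) parameters.

    For a Gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so shape = mean**2 / variance
    and scale = variance / mean.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale
```

Applied to quantities such as lengths or distances of geographic expressions, the fitted shape and scale summarize how concentrated the values are.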
Optimization models of natural communication
A family of information theoretic models of communication was introduced more than a decade ago to explain the origins of Zipf's law for word frequencies. The family is based on a combination of two information theoretic principles: maximization of mutual information between forms and meanings and minimization of form entropy. The family also sheds light on the origins of three other patterns: the principle of contrast; a related vocabulary learning bias; and the meaning-frequency law. Here two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics. The minimization of this linear function is linked to the problem of compression in standard information theory and might be tuned by self-organization.
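A linear energy of the kind the abstract describes can be sketched as follows; the sign convention (reward mutual information I(S, R), penalize form entropy H(S)) is our reading of the abstract, and the function names are illustrative, not from the reviewed models:

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def energy(joint, lam):
    """One possible linear combination of the two principles:
    Omega(lam) = -lam * I(S, R) + (1 - lam) * H(S),
    where joint[s][r] is the joint probability of form s and meaning r.
    Minimizing Omega trades off informativeness against form entropy.
    """
    p_s = [sum(row) for row in joint]            # marginal over forms
    p_r = [sum(col) for col in zip(*joint)]      # marginal over meanings
    h_s, h_r = entropy(p_s), entropy(p_r)
    h_sr = entropy([p for row in joint for p in row])
    mi = h_s + h_r - h_sr                        # mutual information I(S, R)
    return -lam * mi + (1 - lam) * h_s
```

For a one-to-one form-meaning mapping, I(S, R) and H(S) are equal, so the two pressures exactly cancel at lam = 0.5; the interesting regimes lie at intermediate lam.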
The placement of the head that maximizes predictability. An information theoretic approach
The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the costs with respect to dependency length minimization. The
implications of such a broad theoretical framework for understanding the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed.
Comment: in press in Glottometrics
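The conflict between the two principles can be illustrated with a toy model (ours, not the paper's formalism): in a star tree where all dependents attach to one head, placing the head last maximizes total dependency length, while a central head minimizes it.

```python
def total_dependency_length(head_pos, n):
    """Sum of distances between the head and its n - 1 dependents
    in a star tree laid out over word positions 0..n-1."""
    return sum(abs(head_pos - i) for i in range(n) if i != head_pos)

# For 5 words: a central head gives length 6; a final head gives 10,
# the worst case for dependency length minimization.
central = total_dependency_length(2, 5)
final = total_dependency_length(4, 5)
```

This is the tension the abstract describes: a head placed last is maximally predictable from its dependents, but that placement is maximally costly under dependency length minimization.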
Exploring the Law of Numbers: Evidence from China's Real Estate
The renowned proverb, "Numbers do not lie," underscores the reliability and
insight that lie beneath numbers, a concept of undisputed importance,
especially in economics and finance. Despite the success of Benford's Law in
first-digit analysis, its scope is not comprehensive enough to decipher the
laws of numbers in full. This paper delves into number laws by taking the
financial statements of China's real estate sector as a representative case,
quantitatively studying not only the first digit but also two further
dimensions of numbers: frequency and length. The research outcomes go beyond
mere reservations about data manipulation and open the door to discussions of
number diversity and the delineation of usage insights. This study carries
both economic significance and the capacity to foster a deeper comprehension
of numerical phenomena.
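Benford's Law gives the expected first-digit probability as P(d) = log10(1 + 1/d); a small sketch of the first-digit analysis the abstract mentions (the digit extraction assumes positive numbers, and the names are illustrative):

```python
import math

def benford_expected(digit):
    """Expected probability of a first significant digit under Benford's Law."""
    return math.log10(1 + 1 / digit)

def first_digit_freqs(values):
    """Observed first-significant-digit frequencies for positive numbers."""
    counts = [0] * 9
    for v in values:
        d = int(f"{abs(v):e}"[0])  # leading digit of the scientific notation
        counts[d - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Comparing observed frequencies against `benford_expected` over digits 1-9 is the standard screening step behind "reservations about data manipulation": under Benford's Law roughly 30.1% of values should start with 1.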
The challenges of statistical patterns of language: the case of Menzerath's law in genomes
The importance of statistical patterns of language has been debated over
decades. Although Zipf's law is perhaps the most popular case, Menzerath's
law has recently begun to attract attention. Menzerath's law manifests in
language, music and genomes as a tendency of the mean size of the parts to
decrease as the number of parts increases in many situations. This statistical
regularity emerges also in the context of genomes, for instance, as a tendency
of species with more chromosomes to have a smaller mean chromosome size. It has
been argued that the instantiation of this law in genomes is not indicative of
any parallel between language and genomes because (a) the law is inevitable and
(b) non-coding DNA dominates genomes. Here mathematical, statistical and
conceptual challenges of these criticisms are discussed. Two major conclusions
are drawn: the law is not inevitable and languages also have a correlate of
non-coding DNA. However, the wide range of manifestations of the law in and
outside genomes suggests that the striking similarities between non-coding DNA
and certain linguistic units could be incidental to understanding the
recurrence of that statistical law.
Comment: Title changed, abstract and introduction improved, and small corrections to the statistical argument
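Menzerath's law, as described above, predicts that the mean size of the parts tends to decrease as the number of parts grows (e.g. mean chromosome size versus chromosome number). A toy sketch of how that tendency can be checked, using a crude pairwise sign test rather than the paper's statistics:

```python
def menzerath_points(constructs):
    """Map each construct (a list of part sizes) to
    (number of parts, mean part size)."""
    return [(len(parts), sum(parts) / len(parts)) for parts in constructs]

def is_decreasing_trend(points):
    """Crude Kendall-style sign test over all pairs of points:
    True if pairs where more parts co-occur with a smaller mean size
    outnumber pairs showing the opposite direction."""
    supporting = opposing = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dn = points[j][0] - points[i][0]  # change in number of parts
            dm = points[j][1] - points[i][1]  # change in mean part size
            if dn * dm < 0:
                supporting += 1
            elif dn * dm > 0:
                opposing += 1
    return supporting > opposing
```

For genome data, each `construct` would be one species' list of chromosome sizes; a decreasing trend in these points is the law's signature, though as the abstract stresses, such a trend is not mathematically inevitable.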