111 research outputs found

Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

Author: Linyuan Lü
Olaf Sporns
Tao Zhou
Zi-Ke Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 11/05/2010

Background: Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been developed to understand their co-occurrence in real systems, but a clear picture of their relation is still lacking. Methodology/Principal Findings: We show that Heaps' law can be considered a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution for the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is indeed an asymptotic solution for infinite systems, while in a finite-size system the Heaps' exponent is sensitive to the system size. Extensive empirical analysis of tens of disparate systems demonstrates that our refined results better capture the relation between the Zipf's and Heaps' exponents. Conclusions/Significance: The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without the help of any specific stochastic model: Heaps' law is indeed a derivative phenomenon of Zipf's law. The presented numerical method gives a considerably better estimate of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis offers some insights into real complex systems; for example, one can naturally obtain a better explanation of the accelerated growth of scale-free networks. Comment: 15 pages, 6 figures, 1 Table

arXiv.org e-Print Archive

Crossref

PubMed Central
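
The abstract above states that Heaps' law follows from Zipf's law and that the Heaps' exponent depends on system size. A minimal numerical sketch of that idea follows. It is not the authors' method: it assumes tokens are drawn independently from a Zipf distribution p_r proportional to r^(-alpha) over a finite vocabulary of n_ranks words, computes the expected number of distinct words after t tokens, and fits a log-log slope. The function names, the vocabulary size, and the sampled text lengths are illustrative choices.

```python
import math

def expected_vocabulary(alpha, n_ranks, t):
    """Expected number of distinct words after t independent draws
    from a Zipf distribution p_r ~ r**(-alpha), r = 1..n_ranks."""
    weights = [r ** -alpha for r in range(1, n_ranks + 1)]
    norm = math.fsum(weights)
    # P(rank r seen at least once) = 1 - (1 - p_r)**t, computed stably.
    return math.fsum(-math.expm1(t * math.log1p(-w / norm)) for w in weights)

def heaps_exponent(alpha, n_ranks=20000, lengths=(100, 300, 1000, 3000, 10000)):
    """Least-squares slope of log V(t) against log t (assumed fit range)."""
    xs = [math.log(t) for t in lengths]
    ys = [math.log(expected_vocabulary(alpha, n_ranks, t)) for t in lengths]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    for alpha in (1.2, 2.0):
        print(alpha, round(heaps_exponent(alpha), 3))
```

Under these assumptions the fitted slope can be compared with the commonly cited approximation 1/alpha for alpha > 1. Changing n_ranks or the fit range shifts the result, which is one way to see the finite-size sensitivity the abstract describes.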

Punctuation effects in English and Esperanto texts

Author: Ausloos
Boulton
Carroll
Carroll
Chomsky
Crane
Dalkilic
Dzurjuk
Ebeling
Erlich
Gabaix
Ha
Hatzigeorgiu
Ishida
Kanter
Kassel
Kawamura
Kosmidis
Koutsoudas
Köhler
Lambiotte
M. Ausloos
Mandelbrot
Mandelbrot
Mandelbrot
Meadow
Meyer
Mikros
Montemurro
Rottmann
Rousseau
Vilenski
Wang
Weisle
West
Wilson
Yule
Zipf
Publication venue: Elsevier BV
Publication date: 01/01/2010

A statistical physics study of punctuation effects on sentence lengths is presented for two written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered, as a test of the role of punctuation in defining a style and as a contrast between natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style. Comment: 13 pages, 7 figures (3x2+1), 60 references

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège

Mathematical Modelling of Demographic and Migratory Dynamics

Author: James Charlotte R
Publication date: 28/11/2019

Explore Bristol Research

Heaps' Law and Heaps functions in tagged texts: Evidences of their linguistic relevance

Author: Chacoma Andrés Alberto
Zanette Damian Horacio
Publication venue: Royal Society
Publication date: 07/01/2020

We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or 'tags', namely nouns, verbs and others), and analyse the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps' Law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specifically, we find that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags add systematically distinct contributions to this tendency, with verbs and others being respectively more and less retarded than the mean trend, and nouns following instead the overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps' Law, a feature that is still in need of extensive assessment.
Affiliation: Chacoma, Andrés Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Física Enrique Gaviola. Universidad Nacional de Córdoba. Instituto de Física Enrique Gaviola; Argentina
Affiliation: Zanette, Damian Horacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área Investigaciones y Aplicaciones no Nucleares; Argentina

arXiv.org e-Print Archive

CONICET Digital

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY
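
The comparison described in the abstract above, between the appearance of new words in a text and the average over random shufflings of that text, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the token list is a toy example, and the function names, the number of shuffles, and the seed are hypothetical choices.

```python
import random

def vocabulary_curve(tokens):
    """curve[n] = number of distinct tokens among the first n + 1 tokens."""
    seen, curve = set(), []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

def shuffled_mean_curve(tokens, n_shuffles=200, seed=0):
    """Average vocabulary curve over random shufflings of the same tokens."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(tokens)
    work = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(vocabulary_curve(work)):
            totals[i] += v
    return [s / n_shuffles for s in totals]

def deviation(tokens, **kwargs):
    """Observed curve minus shuffled mean. Negative values mean new words
    appear later in the text than in the shuffled average."""
    observed = vocabulary_curve(tokens)
    mean = shuffled_mean_curve(tokens, **kwargs)
    return [o - m for o, m in zip(observed, mean)]

if __name__ == "__main__":
    text = "a a a a b a a c a d".split()
    print(deviation(text))
```

In the toy text the new words are placed late, so the deviation is negative at intermediate positions and returns to zero at the end, where both curves reach the full vocabulary size.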

Corrections of Zipf's and Heaps' Laws Derived from Hapax Rate Models

Author: Dębowski Łukasz
Publication date: 25/07/2023

The article introduces corrections to Zipf's and Heaps' laws based on systematic models of the hapax rate. The derivation rests on two assumptions. The first is the standard urn model, which predicts that marginal frequency distributions for shorter texts look as if word tokens were sampled blindly from a given longer text. The second posits that the rate of hapaxes is a simple function of the text size. Four such functions are discussed: the constant model, the Davis model, the linear model, and the logistic model. It is shown that the logistic model yields the best fit. Comment: 42 pages, 7 figures, 3 tables

arXiv.org e-Print Archive

Measuring and modelling Internet diffusion using second level domains: the case of Italy

Author: Andrea Bonaccorsi
Cristina Rossi
Irma Serrecchia
Maurizio Martinelli

The last 10 years have witnessed exponential growth of the Internet. According to Hobbes' Internet Timeline, there are about 93 million Internet hosts, up from 100,000 in 1989. The same holds for second level domain names: about 3,900 domains were registered in July 1989, against over 2 million in July 2000. This paper reports on the construction of a database containing daily observations of registrations of second level domain names under the .it ccTLD, built to analyse the diffusion of the Internet among families and businesses. The section of the database referring to domains registered by individuals is analysed. The penetration rate over the relevant population of potential adopters is computed at a highly disaggregated geographical level (province). A concentration analysis is carried out to investigate whether the geographical distribution of the Internet is less concentrated than population and income, which would suggest a diffusive effect. Regression analysis is carried out using demographic, social, economic and infrastructure indicators. Finally, we briefly describe further developments of our research. At present we are constructing a database containing domains registered by firms together with data about the registrants; the idea is to use this new database and the previous one to check for the existence of power laws both in the number of domains registered in each province and in the number of domains registered by each firm.
Keywords: Domain names, Internet metrics, Diffusion, Power laws, Zipf's law

Research Papers in Economics