Log-log Convexity of Type-Token Growth in Zipf's Systems
It is traditionally assumed that Zipf's law implies the power-law growth of
the number of different elements with the total number of elements in a system
- the so-called Heaps' law. We show that a careful definition of Zipf's law
leads to the violation of Heaps' law in random systems, and obtain alternative
growth curves. These curves fulfill universal data collapses that only depend
on the value of the Zipf's exponent. We observe that real books behave very
much in the same way as random systems, despite the presence of burstiness in
word occurrence. We advance an explanation for this unexpected correspondence.
Languages cool as they expand: Allometric scaling and the decreasing need for new words
We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which have a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and the Heaps law, is dynamical in nature.
Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems
Background: Zipf's law and Heaps' law are observed in disparate complex
systems. Of particular interest, these two laws often appear together. Many
theoretical models and analyses have been developed to understand their
co-occurrence in real systems, but a clear picture of their relation is still
lacking.
Methodology/Principal Findings: We show that the Heaps' law can be considered
as a derivative phenomenon if the system obeys the Zipf's law. Furthermore, we
refine the known approximate solution of the Heaps' exponent given the
Zipf's exponent. We show that the approximate solution is indeed an asymptotic
solution for infinite systems, while in the finite-size system the Heaps'
exponent is sensitive to the system size. Extensive empirical analysis on tens
of disparate systems demonstrates that our refined results can better capture
the relation between the Zipf's and Heaps' exponents. Conclusions/Significance:
The present analysis provides a clear picture about the relation between the
Zipf's law and Heaps' law without the help of any specific stochastic model,
namely the Heaps' law is indeed a derivative phenomenon from Zipf's law. The
presented numerical method gives considerably better estimation of the Heaps'
exponent given the Zipf's exponent and the system size. Our analysis provides
some insights into and implications for real complex systems; for example, one
can naturally obtain a better explanation of the accelerated growth of
scale-free networks. Comment: 15 pages, 6 figures, 1 Table
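The link this abstract describes between Zipf's law and vocabulary growth can be explored numerically. What follows is only a minimal sketch, not the authors' numerical method: it assumes tokens are drawn independently from a finite Zipf distribution with exponent `alpha` over `num_types` ranks, and it estimates a local Heaps exponent as the log-log slope between two sample sizes. The function names and parameters are hypothetical, chosen for illustration.

```python
import math
import random
from itertools import accumulate


def type_token_curve(num_tokens, num_types, alpha, seed=0):
    """Return V(n), the number of distinct types among the first n tokens.

    Assumption: tokens are i.i.d. draws with P(rank r) proportional to r**(-alpha).
    """
    rng = random.Random(seed)
    # Cumulative (unnormalized) Zipf weights for ranks 1..num_types.
    cum_weights = list(accumulate(r ** -alpha for r in range(1, num_types + 1)))
    tokens = rng.choices(range(num_types), cum_weights=cum_weights, k=num_tokens)

    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


def local_heaps_exponent(curve, n1, n2):
    """Slope of log V(n) against log n between sample sizes n1 < n2."""
    v1, v2 = curve[n1 - 1], curve[n2 - 1]
    return (math.log(v2) - math.log(v1)) / (math.log(n2) - math.log(n1))


if __name__ == "__main__":
    curve = type_token_curve(num_tokens=100_000, num_types=50_000, alpha=1.2)
    for n1, n2 in [(100, 1_000), (1_000, 10_000), (10_000, 100_000)]:
        print(n1, n2, round(local_heaps_exponent(curve, n1, n2), 3))
```

Comparing the slope over successive windows gives a rough view of how the apparent exponent shifts with sample size in a finite system, which is the sensitivity the abstract refers to.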
Innovation and Nested Preferential Growth in Chess Playing Behavior
Complexity develops via the incorporation of innovative properties. Chess is
one of the most complex strategy games, where expert contenders exercise
decision making by imitating old games or introducing innovations. In this
work, we study innovation in chess by analyzing how different move sequences
are played at the population level. It is found that the probability of
exploring a new or innovative move decreases as a power law with the frequency
of the preceding move sequence. Chess players also exploit already known move
sequences according to their frequencies, following a preferential growth
mechanism. Furthermore, innovation in chess exhibits Heaps' law suggesting
similarities with the process of vocabulary growth. We propose a robust
generative mechanism based on nested Yule-Simon preferential growth processes
that reproduces the empirical observations. These results, supporting the
self-similar nature of innovations in chess, are important in the context of
decision making in a competitive scenario, and extend the scope of relevant
findings recently discovered regarding the emergence of Zipf's law in chess.
Comment: 8 pages, 4 figures, accepted for publication in Europhysics Letters
(EPL)
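For orientation, the basic Yule-Simon preferential growth process mentioned above can be sketched as follows. This is the plain, non-nested textbook process, not the nested mechanism the paper proposes, and the constant innovation probability `p_new` is an illustrative assumption (the abstract reports an innovation probability that decays with the frequency of the preceding move sequence). All names are hypothetical.

```python
import random
from collections import Counter


def simon_process(num_tokens, p_new, seed=0):
    """Generate a token sequence with a plain Yule-Simon process.

    At each step, with probability p_new a brand-new type is introduced
    (innovation); otherwise a uniformly chosen earlier token is copied,
    which selects an existing type with probability proportional to its
    current frequency (preferential growth).
    """
    rng = random.Random(seed)
    sequence = [0]      # start with a single token of type 0
    next_type = 1
    for _ in range(num_tokens - 1):
        if rng.random() < p_new:
            sequence.append(next_type)
            next_type += 1
        else:
            sequence.append(rng.choice(sequence))
    return sequence


if __name__ == "__main__":
    seq = simon_process(num_tokens=50_000, p_new=0.05)
    counts = Counter(seq)
    # Rank-frequency list (most frequent first), the quantity Zipf's law describes.
    ranked = sorted(counts.values(), reverse=True)
    print(len(counts), ranked[:10])
```

With a constant `p_new` the number of distinct types grows roughly linearly in the sequence length, so reproducing sublinear, Heaps-like growth requires an innovation rate that decreases over time or with frequency, as the abstract describes.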
COSMICAH 2005: workshop on verification of COncurrent Systems with dynaMIC Allocated Heaps (a Satellite event of ICALP 2005) - Informal Proceedings
Lisboa, Portugal, 10 July 2005
Do Neural Nets Learn Statistical Laws behind Natural Language?
The performance of deep learning in natural language processing has been
spectacular, but the reasons for this success remain unclear because of the
inherent complexity of deep learning. This paper provides empirical evidence of
its effectiveness and of a limitation of neural networks for language
engineering. Precisely, we demonstrate that a neural language model based on
long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law,
two representative statistical properties underlying natural language. We
discuss the quality of reproducibility and the emergence of Zipf's law and
Heaps' law as training progresses. We also point out that the neural language
model has a limitation in reproducing long-range correlation, another
statistical property of natural language. This understanding could provide a
direction for improving the architectures of neural networks. Comment: 21 pages, 11 figures
Footprints in Local Reasoning
Local reasoning about programs exploits the natural local behaviour common in
programs by focussing on the footprint - that part of the resource accessed by
the program. We address the problem of formally characterising and analysing
the footprint notion for abstract local functions introduced by Calcagno,
O'Hearn and Yang. With our definition, we prove that the footprints are the only
essential elements required for a complete specification of a local function.
We formalise the notion of small specifications in local reasoning and show
that for well-founded resource models, a smallest specification always exists
that only includes the footprints, and also present results for the
non-well-founded case. Finally, we use this theory of footprints to investigate
the conditions under which the footprints correspond to the smallest safe
states. We present a new model of RAM in which, unlike the standard model, the
footprints of every program correspond to the smallest safe states, and we also
identify a general condition on the primitive commands of a programming
language which guarantees this property for arbitrary models. Comment: LMCS 2009 (FOSSACS 2008 special issue)