    Log-log Convexity of Type-Token Growth in Zipf's Systems

    It is traditionally assumed that Zipf's law implies the power-law growth of the number of different elements with the total number of elements in a system - the so-called Heaps' law. We show that a careful definition of Zipf's law leads to the violation of Heaps' law in random systems, and obtain alternative growth curves. These curves fulfill universal data collapses that depend only on the value of Zipf's exponent. We observe that real books behave very much in the same way as random systems, despite the presence of burstiness in word occurrence. We advance an explanation for this unexpected correspondence.
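    As a rough illustration of the type-token growth discussed above, here is a minimal Python sketch (not the authors' construction; the vocabulary size, Zipf exponent, and sample length are arbitrary) that draws tokens independently from a finite Zipf-like distribution and records how many distinct types have appeared after each token:

    ```python
    import numpy as np

    # Minimal sketch: sample tokens i.i.d. from a finite Zipf-like distribution
    # p(r) ~ r**(-a) and record the number of distinct types seen so far.
    # Parameters are arbitrary illustration values, not taken from the paper.
    def type_token_curve(n_tokens=100_000, vocab=50_000, a=1.5, seed=0):
        rng = np.random.default_rng(seed)
        ranks = np.arange(1, vocab + 1)
        p = ranks.astype(float) ** -a
        p /= p.sum()
        tokens = rng.choice(ranks, size=n_tokens, p=p)
        seen = set()
        types = np.empty(n_tokens, dtype=int)
        for i, t in enumerate(tokens):
            seen.add(t)
            types[i] = len(seen)
        return types  # types[i] = number of distinct types after i + 1 tokens

    curve = type_token_curve()
    print(curve[::10_000])  # plot against token count on log-log axes to inspect curvature
    ```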

    Languages cool as they expand: Allometric scaling and the decreasing need for new words

    We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.
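    The allometric scaling test mentioned above amounts to fitting a power law V ~ N^beta between corpus size N (tokens) and vocabulary size V (types). A minimal sketch of such a fit in log-log space, using placeholder numbers rather than the paper's data:

    ```python
    import numpy as np

    # Fit V ~ c * N**beta by linear regression in log-log space.
    # The arrays are hypothetical placeholders, not the paper's measurements.
    N = np.array([1e5, 1e6, 1e7, 1e8, 1e9])    # corpus sizes (tokens)
    V = np.array([2e4, 9e4, 4e5, 1.6e6, 6e6])  # vocabulary sizes (types)

    beta, log_c = np.polyfit(np.log(N), np.log(V), 1)
    print(f"allometric exponent beta ~ {beta:.2f}")  # beta < 1: decreasing marginal need for new words
    ```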

    Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

    Background: Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been put forward to understand their co-occurrence in real systems, but a clear picture of their relation is still lacking. Methodology/Principal Findings: We show that Heaps' law can be considered a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution for the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is in fact an asymptotic solution for infinite systems, while in finite-size systems the Heaps' exponent is sensitive to the system size. Extensive empirical analysis of tens of disparate systems demonstrates that our refined results better capture the relation between the Zipf's and Heaps' exponents. Conclusions/Significance: The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without the help of any specific stochastic model: Heaps' law is indeed a phenomenon derived from Zipf's law. The presented numerical method gives a considerably better estimate of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis also offers insights into real complex systems; for example, one naturally obtains a better explanation of the accelerated growth of scale-free networks. Comment: 15 pages, 6 figures, 1 table
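    For context, the commonly cited asymptotic relation between the two exponents, which the paper refines for finite-size systems, can be stated as follows (standard infinite-size form, not the paper's refined finite-size result):

    ```latex
    % With Zipf's law f(r) \propto r^{-\alpha}, the vocabulary grows as
    % V(N) \propto N^{\lambda} (Heaps' law), where asymptotically
    \lambda =
    \begin{cases}
      1/\alpha, & \alpha > 1,\\
      1,        & \alpha \le 1.
    \end{cases}
    ```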

    Innovation and Nested Preferential Growth in Chess Playing Behavior

    Complexity develops via the incorporation of innovative properties. Chess is one of the most complex strategy games, in which expert contenders exercise decision making by imitating old games or introducing innovations. In this work, we study innovation in chess by analyzing how different move sequences are played at the population level. We find that the probability of exploring a new or innovative move decreases as a power law with the frequency of the preceding move sequence. Chess players also exploit already known move sequences according to their frequencies, following a preferential growth mechanism. Furthermore, innovation in chess exhibits Heaps' law, suggesting similarities with the process of vocabulary growth. We propose a robust generative mechanism based on nested Yule-Simon preferential growth processes that reproduces the empirical observations. These results, which support the self-similar nature of innovations in chess, are important in the context of decision making in a competitive scenario, and extend the scope of recent findings on the emergence of Zipf's law in chess. Comment: 8 pages, 4 figures, accepted for publication in Europhysics Letters (EPL)
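    The basic ingredient of the proposed mechanism is a Yule-Simon preferential growth process; the sketch below shows a single, non-nested such process (a simplified illustration rather than the paper's nested generative model; the innovation probability and length are arbitrary):

    ```python
    import random

    # Single-level Yule-Simon process: with probability p introduce a new element;
    # otherwise repeat a past element with probability proportional to its frequency
    # (choosing a uniformly random position in the history achieves exactly that).
    def yule_simon(n_steps=10_000, p=0.1, seed=0):
        random.seed(seed)
        history = [0]   # element ids in order of play
        next_id = 1
        for _ in range(n_steps - 1):
            if random.random() < p:
                history.append(next_id)                 # innovation: brand-new element
                next_id += 1
            else:
                history.append(random.choice(history))  # preferential repetition
        return history

    seq = yule_simon()
    print("distinct elements:", len(set(seq)), "out of", len(seq), "steps")
    ```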

    Do Neural Nets Learn Statistical Laws behind Natural Language?

    The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Specifically, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of the reproduction and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks. Comment: 21 pages, 11 figures
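    Checking these two statistics on model-generated text is straightforward; the sketch below (a rough diagnostic, not the paper's exact evaluation pipeline) computes the rank-frequency list used for Zipf's law and the type-token curve used for Heaps' law:

    ```python
    from collections import Counter

    # Given a token sequence (e.g. text sampled from a language model), return
    # the Zipf rank-frequency list and the Heaps type-token growth curve.
    def zipf_and_heaps(tokens):
        freqs = sorted(Counter(tokens).values(), reverse=True)  # frequency vs. rank
        seen, heaps = set(), []
        for t in tokens:
            seen.add(t)
            heaps.append(len(seen))  # distinct types after each token
        return freqs, heaps

    tokens = "the cat sat on the mat and the dog sat on the cat".split()  # toy input
    freqs, heaps = zipf_and_heaps(tokens)
    print(freqs[:5], heaps[-1])
    ```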

    Footprints in Local Reasoning

    Local reasoning about programs exploits the natural local behaviour common in programs by focusing on the footprint - that part of the resource accessed by the program. We address the problem of formally characterising and analysing the footprint notion for the abstract local functions introduced by Calcagno, O'Hearn and Yang. With our definition, we prove that the footprints are the only essential elements required for a complete specification of a local function. We formalise the notion of small specifications in local reasoning and show that, for well-founded resource models, a smallest specification always exists that includes only the footprints; we also present results for the non-well-founded case. Finally, we use this theory of footprints to investigate the conditions under which the footprints correspond to the smallest safe states. We present a new model of RAM in which, unlike the standard model, the footprints of every program correspond to the smallest safe states, and we also identify a general condition on the primitive commands of a programming language which guarantees this property for arbitrary models. Comment: LMCS 2009 (FOSSACS 2008 special issue)