
    Stochastic model for the vocabulary growth in natural languages

    We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core-words, which have higher frequency and do not affect the probability that a new word is used; and (ii) the remaining, virtually infinite number of noncore-words, which have lower frequency and, once used, reduce the probability that a new word is used in the future. Our model relies on a careful analysis of the Google Ngram database of books published in the last centuries, and its main consequence is the generalization of Zipf's and Heaps' laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core-words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English.
    Comment: corrected typos and errors in reference list; 10 pages text, 15 pages supplemental material; to appear in Physical Review
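The quantity this abstract models, the number of different words as a function of database size, can be measured directly from a token stream. The following is a minimal sketch of that measurement only, not the authors' stochastic model; the function name `vocabulary_growth` and the whitespace-split toy text are illustrative assumptions.

```python
def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token.

    Hypothetical helper for illustration: entry i is the vocabulary
    size after reading tokens[0..i], i.e. the empirical curve that
    Heaps' law describes.
    """
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)           # no effect if the word was already seen
        growth.append(len(seen))  # vocabulary size at this database size
    return growth


if __name__ == "__main__":
    # Toy text, assumed for illustration only.
    text = "the cat saw the dog and the dog saw the cat"
    print(vocabulary_growth(text.split()))
```

Plotting this curve on log-log axes for a large corpus is one way to look for the scaling regimes the abstract refers to.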

    Power laws, Pareto distributions and Zipf's law

    When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
    Comment: 28 pages, 16 figures, minor corrections and additions in this version
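The definition in the first sentence of this abstract can be written out as follows. The symbols are notation assumed here, not taken from the listing: $p(x)$ is the probability of measuring the value $x$, $\alpha > 0$ is the exponent, and $C$ is a normalization constant.

```latex
p(x) = C\,x^{-\alpha}
\qquad\Longrightarrow\qquad
\ln p(x) = \ln C - \alpha \ln x
```

The second form shows why a power law appears as a straight line of slope $-\alpha$ on doubly logarithmic axes.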

    Power-law distributions in empirical data

    Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
    Comment: 43 pages, 11 figures, 7 tables, 4 appendices; code available at http://www.santafe.edu/~aaronc/powerlaws
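The combination of maximum-likelihood fitting and a Kolmogorov-Smirnov statistic mentioned in this abstract can be sketched for the continuous case. This is a minimal illustration under stated assumptions, not the authors' released code: the lower cutoff `xmin` is assumed to be known in advance, the function names are hypothetical, and the likelihood-ratio comparison and any p-value computation are left out. It uses the standard continuous power-law estimator, alpha = 1 + n / sum(ln(x_i / xmin)).

```python
import math
import random


def sample_power_law(n, alpha, xmin, seed=None):
    """Draw n values from a continuous power law with exponent alpha > 1
    on [xmin, inf) by inverse-transform sampling (synthetic test data)."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def fit_power_law_mle(data, xmin):
    """Fit the tail x >= xmin of `data` to a continuous power law.

    Returns (alpha, ks_distance, n_tail). `xmin` is assumed given;
    choosing it from the data is outside this sketch.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail values equal xmin; exponent is undefined")

    # Maximum-likelihood estimate of the exponent.
    alpha = 1.0 + n / log_sum

    # Kolmogorov-Smirnov distance between the empirical CDF of the tail
    # and the fitted CDF, F(x) = 1 - (x / xmin)^(1 - alpha).
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks, n


if __name__ == "__main__":
    synthetic = sample_power_law(20000, alpha=2.5, xmin=1.0, seed=0)
    alpha_hat, ks_distance, n_tail = fit_power_law_mle(synthetic, xmin=1.0)
    print(f"alpha = {alpha_hat:.3f}, KS = {ks_distance:.4f}, n = {n_tail}")
```

On synthetic data drawn from a true power law, the estimated exponent should land close to the one used for sampling and the KS distance should be small. A large KS distance on real data is a signal that the power-law hypothesis deserves a formal test.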

    The position profiles of order cancellations in an emerging stock market

    Order submission and cancellation are two constituent actions of stock trading behaviors in order-driven markets. Order submission dynamics has been extensively studied for different markets, while order cancellation dynamics is less understood. There are two positions associated with a cancellation, that is, the price level in the limit-order book (LOB) and the position in the queue at each price level. We study the profiles of these two order cancellation positions by rebuilding the limit-order book from the order flow data of 23 liquid stocks traded on the Shenzhen Stock Exchange in the year 2003. We find that the profiles of relative price levels where cancellations occur obey a log-normal distribution. After normalizing the relative price level by removing the factor of order numbers stored at the price level, we find that the profiles exhibit a power-law scaling behavior on the right tails for both buy and sell orders. When focusing on the order cancellation positions in the queue at each price level, we find that the profiles increase rapidly at the front of the queue, and then fluctuate around a constant value until the end of the queue. These profiles are similar for different stocks. In addition, the profiles of cancellation positions can be fitted by an exponent function for both buy and sell orders. These two kinds of cancellation profiles seem universal for the different stocks investigated and exhibit minor asymmetry between buy and sell orders. Our empirical findings shed new light on the order cancellation dynamics and pose constraints on the construction of order-driven stock market models.
    Comment: 17 pages, 6 figures and 6 tables