Stochastic model for the vocabulary growth in natural languages
We propose a stochastic model for the number of different words in a given
database which incorporates the dependence on the database size and historical
changes. The main feature of our model is the existence of two different
classes of words: (i) a finite number of core-words, which have higher frequency
and do not affect the probability that a new word is used; and (ii) the
remaining, virtually infinite number of noncore-words, which have lower frequency
and, once used, reduce the probability that a new word is used in the future.
Our model relies on a careful analysis of the Google n-gram database of books
published in the last centuries and its main consequence is the generalization
of Zipf's and Heaps' laws to two scaling regimes. We confirm that these
generalizations yield the best simple description of the data among generic
descriptive models and that the two free parameters depend only on the language
but not on the database. From the point of view of our model the main change on
historical time scales is the composition of the specific words included in the
finite list of core-words, which we observe to decay exponentially in time with
a rate of approximately 30 words per year for English.
Comment: corrected typos and errors in reference list; 10 pages text, 15 pages
supplemental material; to appear in Physical Review
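
The vocabulary-growth curve that Heaps' law describes (number of distinct words versus database size) can be measured with a few lines of code. The sketch below is not the two-class core/noncore model from the abstract. It only assumes tokens drawn independently from a plain Zipf-like rank distribution, and the function names `vocabulary_growth` and `zipf_tokens`, the exponent, and the vocabulary size are all illustrative choices.

```python
import random


def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def zipf_tokens(n_tokens, vocab_size, exponent=1.0, seed=0):
    """Draw word ranks with probability proportional to rank**(-exponent).

    Assumption: independent draws from a single Zipf-like distribution,
    which is a simplification and not the model proposed in the paper.
    """
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / (r ** exponent) for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)


if __name__ == "__main__":
    tokens = zipf_tokens(n_tokens=20000, vocab_size=50000)
    growth = vocabulary_growth(tokens)
    # Print the vocabulary size at a few database sizes.
    for size in (100, 1000, 10000, 20000):
        print(size, growth[size - 1])
```

Plotting `growth` against the token index on log-log axes is the usual way to inspect whether one or two scaling regimes are present.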
Power laws, Pareto distributions and Zipf's law
When the probability of measuring a particular value of some quantity varies
inversely as a power of that value, the quantity is said to follow a power law,
also known variously as Zipf's law or the Pareto distribution. Power laws
appear widely in physics, biology, earth and planetary sciences, economics and
finance, computer science, demography and the social sciences. For instance,
the distributions of the sizes of cities, earthquakes, solar flares, moon
craters, wars and people's personal fortunes all appear to follow power laws.
The origin of power-law behaviour has been a topic of debate in the scientific
community for more than a century. Here we review some of the empirical
evidence for the existence of power-law forms and the theories proposed to
explain them.
Comment: 28 pages, 16 figures, minor corrections and additions in this version
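
To make the definition concrete, here is a minimal sketch of a continuous power-law density, assuming the common normalized form p(x) = ((alpha - 1) / xmin) * (x / xmin)^(-alpha) for x >= xmin. The lower cutoff `xmin` and the function names are assumptions introduced for illustration; the abstract itself states only that the probability varies inversely as a power of the value.

```python
def power_law_pdf(x, alpha, xmin):
    """Density of a continuous power law on x >= xmin (requires alpha > 1)."""
    if alpha <= 1 or xmin <= 0:
        raise ValueError("need alpha > 1 and xmin > 0 for a normalizable density")
    if x < xmin:
        return 0.0
    return (alpha - 1.0) / xmin * (x / xmin) ** (-alpha)


def power_law_ccdf(x, alpha, xmin):
    """Probability P(X >= x) under the same power law."""
    if alpha <= 1 or xmin <= 0:
        raise ValueError("need alpha > 1 and xmin > 0 for a normalizable density")
    if x < xmin:
        return 1.0
    return (x / xmin) ** (-(alpha - 1.0))


if __name__ == "__main__":
    # Doubling x always rescales the density by the same factor 2**(-alpha),
    # regardless of where on the axis x sits.
    for x in (1.0, 10.0, 100.0):
        ratio = power_law_pdf(2 * x, 2.5, 1.0) / power_law_pdf(x, 2.5, 1.0)
        print(x, ratio)
```

The constant ratio printed by the loop is the scale-free property that makes power laws appear as straight lines on log-log axes.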
Power-law distributions in empirical data
Power-law distributions occur in many situations of scientific interest and
have significant consequences for our understanding of natural and man-made
phenomena. Unfortunately, the detection and characterization of power laws is
complicated by the large fluctuations that occur in the tail of the
distribution -- the part of the distribution representing large but rare events
-- and by the difficulty of identifying the range over which power-law behavior
holds. Commonly used methods for analyzing power-law data, such as
least-squares fitting, can produce substantially inaccurate estimates of
parameters for power-law distributions, and even in cases where such methods
return accurate answers they are still unsatisfactory because they give no
indication of whether the data obey a power law at all. Here we present a
principled statistical framework for discerning and quantifying power-law
behavior in empirical data. Our approach combines maximum-likelihood fitting
methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic
and likelihood ratios. We evaluate the effectiveness of the approach with tests
on synthetic data and give critical comparisons to previous approaches. We also
apply the proposed methods to twenty-four real-world data sets from a range of
different disciplines, each of which has been conjectured to follow a power-law
distribution. In some cases we find these conjectures to be consistent with the
data while in others the power law is ruled out.
Comment: 43 pages, 11 figures, 7 tables, 4 appendices; code available at
http://www.santafe.edu/~aaronc/powerlaws
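
The two ingredients named in the abstract, maximum-likelihood fitting and the Kolmogorov-Smirnov statistic, can be sketched for the continuous case as follows. This is a simplified illustration, not the authors' released code: it assumes the lower cutoff `xmin` is already known, uses the standard continuous estimator alpha = 1 + n / sum(ln(x_i / xmin)), and omits the goodness-of-fit p-value and the likelihood-ratio comparisons. All function names are hypothetical.

```python
import math
import random


def sample_power_law(n, alpha, xmin, seed=0):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - u lies in (0, 1], so the power below is always finite.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha_mle(data, xmin):
    """Maximum-likelihood estimate of alpha using only the points x >= xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one data point strictly above xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, alpha, xmin):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (-(alpha - 1.0))
        # Compare against the empirical CDF just before and just after x.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


if __name__ == "__main__":
    data = sample_power_law(10000, alpha=2.5, xmin=1.0, seed=1)
    alpha_hat = fit_alpha_mle(data, xmin=1.0)
    print(alpha_hat, ks_distance(data, alpha_hat, xmin=1.0))
```

A small KS distance on its own does not establish that a power law is the right model; it only measures how far the fitted curve sits from the data.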
The position profiles of order cancellations in an emerging stock market
Order submission and cancellation are two constituent actions of stock
trading behaviors in order-driven markets. Order submission dynamics has been
extensively studied for different markets, while order cancellation dynamics is
less understood. There are two positions associated with a cancellation, that
is, the price level in the limit-order book (LOB) and the position in the queue
at each price level. We study the profiles of these two order cancellation
positions through rebuilding the limit-order book using the order flow data of
23 liquid stocks traded on the Shenzhen Stock Exchange in the year 2003. We
find that the profiles of relative price levels where cancellations occur obey
a log-normal distribution. After normalizing the relative price level by
removing the factor of order numbers stored at the price level, we find that
the profiles exhibit a power-law scaling behavior on the right tails for both
buy and sell orders. When focusing on the order cancellation positions in the
queue at each price level, we find that the profiles increase rapidly in the
front of the queue, and then fluctuate around a constant value till the end of
the queue. These profiles are similar for different stocks. In addition, the
profiles of cancellation positions can be fitted by an exponent function for
both buy and sell orders. These two kinds of cancellation profiles seem
universal for different stocks investigated and exhibit minor asymmetry between
buy and sell orders. Our empirical findings shed new light on the order
cancellation dynamics and pose constraints on the construction of order-driven
stock market models.
Comment: 17 pages, 6 figures and 6 tables
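
The two positions attached to a cancellation can be illustrated on a toy buy-side book. The sketch below is an assumption-laden example and not the paper's procedure: the book layout (a dict from price to a list of order ids in time priority), the function name `cancellation_positions`, and the choice to count price levels from the best bid are all hypothetical, and the paper's own definition of the relative price level may differ.

```python
def cancellation_positions(buy_book, order_id):
    """Locate a cancelled buy order in a toy limit-order book.

    buy_book maps price -> list of order ids, front of the queue first.
    Returns (price_level, queue_position, relative_queue_position), where
    price_level counts occupied levels starting at 1 for the best bid.
    """
    levels = sorted(buy_book, reverse=True)  # highest bid is the best level
    for level_index, price in enumerate(levels, start=1):
        queue = buy_book[price]
        if order_id in queue:
            queue_position = queue.index(order_id) + 1
            return level_index, queue_position, queue_position / len(queue)
    raise KeyError(order_id)


if __name__ == "__main__":
    book = {10.00: ["a", "b", "c", "d"], 9.99: ["e", "f"]}
    print(cancellation_positions(book, "c"))
    print(cancellation_positions(book, "f"))
```

Collecting these tuples over every cancellation in an order-flow record would give the kind of position profiles the abstract describes.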