Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words
Background: Zipf's discovery that word frequency distributions obey a power
law established parallels between biological and physical processes, and
language, laying the groundwork for a complex systems perspective on human
communication. More recent research has also identified scaling regularities in
the dynamics underlying the successive occurrences of events, suggesting the
possibility of similar findings for language as well.
Methodology/Principal Findings: By considering frequent words in USENET
discussion groups and in disparate databases where the language has different
levels of formality, here we show that the distributions of distances between
successive occurrences of the same word display bursty deviations from a
Poisson process and are well characterized by a stretched exponential (Weibull)
scaling. The extent of this deviation depends strongly on semantic type (a
measure of the logicality of each word) and less strongly on frequency. We
develop a generative model of this behavior that fully determines the dynamics
of word usage.
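The abstract does not spell out the fitting procedure, so the following is only
a minimal sketch of how the deviation from a Poisson process can be quantified.
It assumes the standard Weibull survival function P(tau > x) = exp(-(x/lambda)^beta),
where beta = 1 recovers the exponential recurrence times of a Poisson process
and beta < 1 indicates burstiness. The helper names `recurrence_distances` and
`weibull_shape` are hypothetical, distances are counted in words, and the
estimator (a least-squares slope on a Weibull plot) is a simple swapped-in
technique, not necessarily the authors' own method. The demo uses synthetic
data only.

```python
import math
import random

# Hypothetical helpers for illustration; the Weibull-plot fit below is a
# simple stand-in estimator, not the paper's own fitting procedure.


def recurrence_distances(tokens, word):
    """Gaps, in tokens, between successive occurrences of `word`."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def weibull_shape(distances):
    """Estimate the Weibull shape beta from a Weibull plot.

    Distances are rescaled by their mean, the empirical survival function S
    is estimated with plotting positions (n - i - 0.5) / n, and beta is the
    least-squares slope of log(-log S) against log(tau / <tau>).
    beta near 1 is consistent with a Poisson process; beta < 1 is bursty.
    """
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two recurrence distances")
    mean = sum(distances) / n
    xs = [math.log(d / mean) for d in sorted(distances)]
    # Smallest distance gets the largest survival probability.
    ys = [math.log(-math.log((n - i - 0.5) / n)) for i in range(n)]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all distances are equal; shape is undefined")
    return sxy / sxx


if __name__ == "__main__":
    print(recurrence_distances("a b a c c a".split(), "a"))  # [2, 3]
    rng = random.Random(0)
    # Synthetic data: exponential gaps mimic a Poisson process (beta = 1),
    # Weibull gaps with shape 0.5 mimic bursty usage (beta < 1).
    poisson_like = [rng.expovariate(1.0) for _ in range(5000)]
    bursty = [rng.weibullvariate(1.0, 0.5) for _ in range(5000)]
    print(round(weibull_shape(poisson_like), 2))
    print(round(weibull_shape(bursty), 2))
```

Rescaling by the mean distance makes the slope independent of how frequent
the word is, so words of very different frequency can be compared on the same
beta axis.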
Conclusions/Significance: Recurrence patterns of words are well described by
a stretched exponential distribution of recurrence times, an empirical scaling
that cannot be anticipated from Zipf's law. Because the use of words provides a
uniquely precise and powerful lens on human thought and activity, our findings
also have implications for other overt manifestations of collective human
dynamics.