2,296 research outputs found
Estimating the historical and future probabilities of large terrorist events
Quantities with right-skewed distributions are ubiquitous in complex social
systems, including political conflict, economics and social networks, and these
systems sometimes produce extremely large events. For instance, the 9/11
terrorist events produced nearly 3000 fatalities, nearly six times more than
the next largest event. But, was this enormous loss of life statistically
unlikely given modern terrorism's historical record? Accurately estimating the
probability of such an event is complicated by the large fluctuations in the
empirical distribution's upper tail. We present a generic statistical algorithm
for making such estimates, which combines semi-parametric models of tail
behavior and a nonparametric bootstrap. Applied to a global database of
terrorist events, we estimate the worldwide historical probability of observing
at least one 9/11-sized or larger event since 1968 to be 11-35%. These results
are robust to conditioning on global variations in economic development,
domestic versus international events, the type of weapon used and a truncated
history that stops at 1998. We then use this procedure to make a data-driven
statistical forecast of at least one similar event over the next decade.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS614 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Scoring dynamics across professional team sports: tempo, balance and predictability
Despite growing interest in quantifying and modeling the scoring dynamics
within professional sports games, relative little is known about what patterns
or principles, if any, cut across different sports. Using a comprehensive data
set of scoring events in nearly a dozen consecutive seasons of college and
professional (American) football, professional hockey, and professional
basketball, we identify several common patterns in scoring dynamics. Across
these sports, scoring tempo---when scoring events occur---closely follows a
common Poisson process, with a sport-specific rate. Similarly, scoring
balance---how often a team wins an event---follows a common Bernoulli process,
with a parameter that effectively varies with the size of the lead. Combining
these processes within a generative model of gameplay, we find they both
reproduce the observed dynamics in all four sports and accurately predict game
outcomes. These results demonstrate common dynamical patterns underlying
within-game scoring dynamics across professional team sports, and suggest
specific mechanisms for driving them. We close with a brief discussion of the
implications of our results for several popular hypotheses about sports
dynamics.Comment: 18 pages, 8 figures, 4 tables, 2 appendice
Power-law distributions in binned empirical data
Many man-made and natural phenomena, including the intensity of earthquakes,
population of cities and size of international wars, are believed to follow
power-law distributions. The accurate identification of power-law patterns has
significant consequences for correctly understanding and modeling complex
systems. However, statistical evidence for or against the power-law hypothesis
is complicated by large fluctuations in the empirical distribution's tail, and
these are worsened when information is lost from binning the data. We adapt the
statistically principled framework for testing the power-law hypothesis,
developed by Clauset, Shalizi and Newman, to the case of binned data. This
approach includes maximum-likelihood fitting, a hypothesis test based on the
Kolmogorov--Smirnov goodness-of-fit statistic and likelihood ratio tests for
comparing against alternative explanations. We evaluate the effectiveness of
these methods on synthetic binned data with known structure, quantify the loss
of statistical power due to binning, and apply the methods to twelve real-world
binned data sets with heavy-tailed patterns.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS710 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Detecting change points in the large-scale structure of evolving networks
Interactions among people or objects are often dynamic in nature and can be
represented as a sequence of networks, each providing a snapshot of the
interactions over a brief period of time. An important task in analyzing such
evolving networks is change-point detection, in which we both identify the
times at which the large-scale pattern of interactions changes fundamentally
and quantify how large and what kind of change occurred. Here, we formalize for
the first time the network change-point detection problem within an online
probabilistic learning framework and introduce a method that can reliably solve
it. This method combines a generalized hierarchical random graph model with a
Bayesian hypothesis test to quantitatively determine if, when, and precisely
how a change point has occurred. We analyze the detectability of our method
using synthetic data with known change points of different types and
magnitudes, and show that this method is more accurate than several previously
used alternatives. Applied to two high-resolution evolving social networks,
this method identifies a sequence of change points that align with known
external "shocks" to these networks
- …