Understanding the Heavy Tailed Dynamics in Human Behavior
The recent availability of electronic datasets containing large volumes of
communication data has made it possible to study human behavior on a larger
scale than ever before. From this, it has been discovered that across a diverse
range of data sets, the inter-event times between consecutive communication
events obey heavy tailed power law dynamics. Explaining this has proved
controversial, and two distinct hypotheses have emerged. The first holds that
these power laws are fundamental, and arise from mechanisms such as the
priority queuing that humans use to schedule tasks. The second holds that they
are a statistical artifact which occurs only in aggregated data when features
such as circadian rhythms and burstiness are ignored. We use a large social
media data set to test these hypotheses, and find that although models that
incorporate circadian rhythms and burstiness do explain part of the observed
heavy tails, there is residual unexplained heavy tail behavior which suggests a
more fundamental cause. Based on this, we develop a new quantitative model of
human behavior which improves on existing approaches, and gives insight into
the mechanisms underlying human interactions.
Comment: 9 pages, in Physical Review E, 201
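To make the priority-queuing hypothesis concrete, the following is a minimal sketch (not the quantitative model developed in this paper) of a Barabási-style fixed-length task queue, in which the highest-priority task is usually executed first and the resulting waiting times are heavy tailed. The queue length, execution probability, and function names are illustrative assumptions.

```python
import random

def barabasi_queue(steps=100_000, queue_len=5, p_highest=0.9, seed=0):
    """Simulate a fixed-length priority queue (Barabasi-style) and
    return the waiting time, in steps, of every executed task."""
    rng = random.Random(seed)
    # each task is (priority, arrival_step)
    tasks = [(rng.random(), 0) for _ in range(queue_len)]
    waits = []
    for t in range(1, steps + 1):
        if rng.random() < p_highest:
            # usually execute the highest-priority task
            idx = max(range(queue_len), key=lambda i: tasks[i][0])
        else:
            # occasionally execute a task at random
            idx = rng.randrange(queue_len)
        _, arrived = tasks[idx]
        waits.append(t - arrived)
        tasks[idx] = (rng.random(), t)   # replace with a fresh task
    return waits

waits = barabasi_queue()
# crude look at the tail: fraction of tasks that waited longer than w steps
for w in (1, 10, 100, 1000):
    frac = sum(1 for x in waits if x > w) / len(waits)
    print(f"P(wait > {w:>4}) = {frac:.4f}")
```

Raising p_highest pushes most waiting times down to a single step while stretching the tail of long waits, which is the qualitative signature of the priority-queuing mechanism the abstract refers to.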
Detecting changes in high frequency data streams, with applications
In recent years, problems relating to the analysis of data streams have become
widespread. A data stream is a collection of time ordered observations x1, x2, ...
generated from the random variables X1, X2, .... It is assumed that the observations
are univariate and independent, and that they arrive in discrete time.
Unlike traditional sequential analysis problems considered by statisticians, the
size of a data stream is not assumed to be fixed, and new observations may be
received over time. The rate at which these observations are received can be very
high, perhaps several thousand every second. Therefore computational efficiency is
very important, and methods used for analysis must be able to cope with potentially
huge data sets.
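As background on the constant-cost requirement described above (this is a textbook illustration, not a method from the paper), Welford's online update maintains summary statistics of an unbounded stream using O(1) time and memory per observation.

```python
class RunningStats:
    """Constant-memory running mean and variance (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(x)
print(stats.mean, stats.variance)   # 5.0 and roughly 4.57
```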
This paper is concerned with the task of detecting whether a data stream contains
a change point, and extends traditional methods for sequential change detection
to the streaming context. We focus on two different settings of the change
point problem. The first is nonparametric change detection where, in contrast to
most of the existing literature, we assume that nothing is known about either the
pre- or post-change stream distribution. The task is then to detect a change from
an unknown base distribution F0 to an unknown distribution F1. Further, we impose
the constraint that change detection methods must have a bounded rate of false
positives, which is important when it comes to assessing the significance of discovered
change points. It is this constraint which makes the nonparametric problem
difficult. We present several novel methods for this problem, and compare their
performance via extensive experimental analysis.
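The nonparametric detectors themselves are not given in the abstract, so the sketch below only illustrates the problem setting: a frozen reference window stands in for the unknown pre-change distribution F0, a sliding recent window for F1, and a two-sample Kolmogorov-Smirnov test flags a change. The window sizes, the choice of test, and the per-test alpha are assumptions; repeated testing means the overall false positive rate would need separate calibration, which is exactly the difficulty the abstract highlights.

```python
from collections import deque
import random
from scipy.stats import ks_2samp

def detect_change(stream, ref_size=200, win_size=100, alpha=1e-4):
    """Flag a change when a recent window differs from a frozen reference
    window under a two-sample Kolmogorov-Smirnov test.

    alpha controls the per-test false positive rate; because the test is
    repeated at every step, the overall false alarm rate is higher and
    would need to be calibrated (e.g. by simulation) in a real detector.
    """
    reference, recent = [], deque(maxlen=win_size)
    for t, x in enumerate(stream):
        if len(reference) < ref_size:
            reference.append(x)        # still learning the pre-change F0
            continue
        recent.append(x)
        if len(recent) == win_size:
            stat, p = ks_2samp(reference, list(recent))
            if p < alpha:
                return t               # index at which the change is flagged
    return None

# example: a N(0,1) stream that shifts to N(1,1) at t = 500
rng = random.Random(1)
stream = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(1, 1) for _ in range(500)]
print(detect_change(stream))
```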
The second strand of our research is Bernoulli change detection, with application to streaming classification. In this setting, we assume a parametric form for
the stream distribution, but one where both the pre- and post-change parameters
are unknown. The task is again to detect changes, while having a control on the
rate of false positives. After developing two different methods for tackling the pure
Bernoulli change detection task, we then show how our approach can be deployed
in streaming classification applications. Here, the goal is to classify objects into
one of several categories. In the streaming case, the optimal classification rule can
change over time, and classification techniques which are not able to adapt to these
changes will suffer performance degradation. We show that by focusing only on
the frequency of errors produced by the classifier, we can treat this as a Bernoulli
change detection problem, and again perform extensive experimental analysis to
show the value of our methods.
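To illustrate how classifier monitoring reduces to a Bernoulli problem, here is a standard one-sided Bernoulli CUSUM on the stream of 0/1 error indicators. It assumes a known in-control error rate p0 and a target out-of-control rate p1, so it is a simplification of the setting above, where both the pre- and post-change parameters are unknown; the rates, threshold, and function name are illustrative.

```python
import math
import random

def bernoulli_cusum(errors, p0=0.05, p1=0.15, threshold=5.0):
    """One-sided Bernoulli CUSUM: signal when the classifier's error rate
    appears to have drifted upward from p0 towards (at least) p1.

    errors    -- iterable of 0/1 error indicators, one per classified item
    threshold -- larger values give fewer false alarms; in practice it is
                 chosen to achieve a target in-control average run length.
    """
    # per-observation log-likelihood ratio increments
    llr1 = math.log(p1 / p0)                 # increment when an error occurs
    llr0 = math.log((1 - p1) / (1 - p0))     # increment when no error occurs
    s = 0.0
    for t, e in enumerate(errors):
        s = max(0.0, s + (llr1 if e else llr0))
        if s > threshold:
            return t        # change flagged: time to adapt or retrain
    return None

# example: the error rate jumps from 5% to 20% after observation 1000
rng = random.Random(7)
errs = [int(rng.random() < 0.05) for _ in range(1000)] + \
       [int(rng.random() < 0.20) for _ in range(1000)]
print(bernoulli_cusum(errs))
```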