Optimal and fast detection of spatial clusters with scan statistics
We consider the detection of multivariate spatial clusters in the Bernoulli
model with N locations, where the design distribution has weakly dependent
marginals. The locations are scanned with a rectangular window with sides
parallel to the axes and with varying sizes and aspect ratios. Multivariate
scan statistics pose a statistical problem due to the multiple testing over
many scan windows, as well as a computational problem because statistics have
to be evaluated on many windows. This paper introduces methodology that leads
to both statistically optimal inference and computationally efficient
algorithms. The main difference to the traditional calibration of scan
statistics is the concept of grouping scan windows according to their sizes,
and then applying different critical values to different groups. It is shown
that this calibration of the scan statistic results in optimal inference for
spatial clusters on both small scales and on large scales, as well as in the
case where the cluster lives on one of the marginals. Methodology is introduced
that allows for an efficient approximation of the set of all rectangles while
still guaranteeing the statistical optimality results described above. It is
shown that the resulting scan statistic has a computational complexity that is
almost linear in the number of locations.
Comment: Published at http://dx.doi.org/10.1214/09-AOS732 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
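The idea of grouping scan windows by size and giving each group its own critical value can be illustrated with a minimal sketch. It is a simplification and not the paper's procedure: it scans intervals of a one-dimensional sequence rather than rectangles, uses standardized sums rather than a Bernoulli likelihood ratio, and takes the per-group thresholds as a hypothetical input (`critical_values`) instead of deriving them.

```python
import math
from itertools import accumulate


def grouped_scan(x, critical_values):
    """Scan every contiguous interval of x, grouped by dyadic length.

    Group g holds intervals whose length lies in [2**g, 2**(g + 1)).
    critical_values is an assumed, user-supplied mapping from group index
    to threshold; it must contain every group that occurs.
    Returns {group: (largest standardized sum, exceeds threshold?)}.
    """
    prefix = [0.0] + list(accumulate(x))  # prefix[k] = x[0] + ... + x[k-1]
    n = len(x)
    group_max = {}
    for start in range(n):
        for end in range(start + 1, n + 1):
            length = end - start
            group = length.bit_length() - 1  # floor(log2(length))
            z = (prefix[end] - prefix[start]) / math.sqrt(length)
            if z > group_max.get(group, -math.inf):
                group_max[group] = z
    return {g: (m, m > critical_values[g]) for g, m in group_max.items()}


# Illustrative thresholds only: one value per size group.
result = grouped_scan([0, 0, 3, 3, 3, 0, 0], {0: 3.5, 1: 3.0, 2: 2.5})
```

This brute-force loop visits all O(n^2) intervals; the efficient approximation of the window set described in the abstract is not attempted here.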
Continuous testing for Poisson process intensities: A new perspective on scanning statistics
We propose a novel continuous testing framework to test the intensities of
Poisson Processes. This framework allows a rigorous definition of the complete
testing procedure, from an infinite number of hypotheses to joint error rates.
Our work extends traditional procedures based on scanning windows, by
controlling the family-wise error rate and the false discovery rate in a
non-asymptotic manner and in a continuous way. The decision rule is based on a
p-value process that can be estimated by a Monte-Carlo procedure. We also
propose new test statistics based on kernels. Our method is applied in
Neurosciences and Genomics through the standard test of homogeneity, and the
two-sample test
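A Monte-Carlo p-value for a scanning-window test of homogeneity can be sketched as follows. This is a much reduced illustration, not the continuous p-value process of the abstract: it assumes points on the unit interval, a single fixed window width, and the maximum window count as the statistic. The function names and the `n_sim` and `seed` parameters are hypothetical. Conditional on the number of points, a homogeneous Poisson process is simulated as independent uniform draws.

```python
import random


def max_window_count(points, width):
    """Largest number of points covered by any closed window of the given width."""
    pts = sorted(points)
    best = 0
    right = 0
    for left in range(len(pts)):
        # Advance right past every point inside [pts[left], pts[left] + width].
        while right < len(pts) and pts[right] <= pts[left] + width:
            right += 1
        best = max(best, right - left)
    return best


def monte_carlo_p_value(points, width, n_sim=199, seed=0):
    """Estimate the p-value of the observed maximum count under homogeneity."""
    rng = random.Random(seed)
    observed = max_window_count(points, width)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.random() for _ in points]  # uniform on [0, 1)
        if max_window_count(simulated, width) >= observed:
            exceed += 1
    # The "+1" terms count the observed sample among the simulations.
    return (1 + exceed) / (n_sim + 1)


clustered = [0.5 + 0.001 * i for i in range(20)]
p_value = monte_carlo_p_value(clustered, width=0.1)
```

Counting the observed sample in both numerator and denominator keeps the estimate strictly positive, which is the usual convention for Monte-Carlo tests.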
Detection of an anomalous cluster in a network
We consider the problem of detecting whether or not, in a given sensor
network, there is a cluster of sensors which exhibit an "unusual behavior."
Formally, suppose we are given a set of nodes and attach a random variable to
each node. We observe a realization of this process and want to decide between
the following two hypotheses: under the null, the variables are i.i.d. standard
normal; under the alternative, there is a cluster of variables that are i.i.d.
normal with positive mean and unit variance, while the rest are i.i.d. standard
normal. We also address surveillance settings where each sensor in the network
collects information over time. The resulting model is similar, now with a time
series attached to each node. We again observe the process over time and want
to decide between the null, where all the variables are i.i.d. standard normal,
and the alternative, where there is an emerging cluster of i.i.d. normal
variables with positive mean and unit variance. The growth models used to
represent the emerging cluster are quite general and, in particular, include
cellular automata used in modeling epidemics. In both settings, we consider
classes of clusters that are quite general, for which we obtain a lower bound
on their respective minimax detection rate and show that some form of scan
statistic, by far the most popular method in practice, achieves that same rate
to within a logarithmic factor. Our results are not limited to the normal
location model, but generalize to any one-parameter exponential family when the
anomalous clusters are large enough.
Comment: Published at http://dx.doi.org/10.1214/10-AOS839 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
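A scan statistic for the normal location model on a network can be sketched as below. The choice of cluster class is an assumption made for illustration: candidate clusters are hop-distance balls around each node, which is only one of many classes the abstract allows. The names `ball`, `network_scan`, and `max_radius` are hypothetical.

```python
import math
from collections import deque


def ball(adjacency, center, radius):
    """Set of nodes within `radius` hops of `center`, found by breadth-first search."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def network_scan(adjacency, values, max_radius):
    """Maximize the standardized sum over all balls of radius 0..max_radius.

    Under the null (i.i.d. standard normal values) each standardized sum
    is itself standard normal, so large values point to an elevated mean.
    """
    best, best_cluster = -math.inf, None
    for center in adjacency:
        for radius in range(max_radius + 1):
            cluster = ball(adjacency, center, radius)
            z = sum(values[v] for v in cluster) / math.sqrt(len(cluster))
            if z > best:
                best, best_cluster = z, cluster
    return best, best_cluster


# Path graph 0-1-2-3-4 with an elevated block on nodes 1, 2, 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
stat, cluster = network_scan(path, {0: 0, 1: 3, 2: 3, 3: 3, 4: 0}, max_radius=2)
```

Deciding whether `stat` is significant would still require a critical value, for instance from simulation under the null; that step is omitted here.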
High-dimensional change-point detection with sparse alternatives
We consider the problem of detecting a change in mean in a sequence of
Gaussian vectors. Under the alternative hypothesis, the change occurs only in
some subset of the components of the vector. We propose a test of the presence
of a change-point that is adaptive to the number of changing components. Under
the assumption that the vector dimension tends to infinity and the length of
the sequence grows slower than the dimension of the signal, we obtain the
detection boundary for this problem and prove its rate-optimality
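The tension between dense and sparse changes can be illustrated with a CUSUM-type sketch. It is not the adaptive test proposed in the paper: it assumes unit-variance Gaussian components and simply reports two classical aggregates of the per-component CUSUM values, a sum of squares (sensitive to changes in many components) and a maximum absolute value (sensitive to changes in few components). All function names are hypothetical.

```python
import math


def cusum_vectors(rows):
    """Per-component standardized mean differences for every split point t.

    rows is a list of equal-length vectors. For a split after the first t
    rows, each component is (left mean - right mean) * sqrt(t * (n - t) / n),
    which is standard normal under the null with unit variance.
    """
    n, d = len(rows), len(rows[0])
    totals = [sum(row[j] for row in rows) for j in range(d)]
    left = [0.0] * d
    out = {}
    for t in range(1, n):
        for j in range(d):
            left[j] += rows[t - 1][j]
        scale = math.sqrt(t * (n - t) / n)
        out[t] = [
            (left[j] / t - (totals[j] - left[j]) / (n - t)) * scale
            for j in range(d)
        ]
    return out


def dense_and_sparse_statistics(rows):
    """Return (max over t of sum of squares, max over t and j of |value|)."""
    cusums = cusum_vectors(rows)
    dense = max(sum(c * c for c in vec) for vec in cusums.values())
    sparse = max(abs(c) for vec in cusums.values() for c in vec)
    return dense, sparse


# The mean of the first component shifts from 0 to 5 after two observations.
dense, sparse = dense_and_sparse_statistics([[0, 0], [0, 0], [5, 0], [5, 0]])
```

An adaptive procedure would combine information across such aggregates with suitable calibration; the sketch only computes the two extremes.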