The Hunting of the Bump: On Maximizing Statistical Discrepancy
Anomaly detection has important applications in biosurveillance and
environmental monitoring. When comparing measured data to data drawn from a
baseline distribution, merely finding clusters in the measured data may not
reveal true anomalies, since these clusters may simply be clusters of
the baseline distribution. Hence, a discrepancy function is often used to
examine how different the measured data is from the baseline data within a
region. An anomalous region is thus defined to be one with high discrepancy.
In this paper, we present algorithms for maximizing statistical discrepancy
functions over the space of axis-parallel rectangles. We give provable
approximation guarantees, both additive and relative, and our methods apply to
any convex discrepancy function. Our algorithms work by connecting statistical
discrepancy to combinatorial discrepancy; roughly speaking, we show that in
order to maximize a convex discrepancy function over a class of shapes, one
needs only maximize a linear discrepancy function over the same set of shapes.
We derive general discrepancy functions for data generated from a one-
parameter exponential family. This generalizes the widely-used Kulldorff scan
statistic for data from a Poisson distribution. We present an algorithm running
in that computes the maximum
discrepancy rectangle to within additive error , for the Kulldorff
scan statistic. Similar results hold for relative error and for discrepancy
functions for data coming from Gaussian, Bernoulli, and gamma distributions.
Prior to our work, the best known algorithms were exact and ran in time
.
Comment: 11 pages. A short version of this paper will appear in SODA06. This
full version contains an additional short appendix.
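To make the notion of a discrepancy function concrete, here is a minimal Python sketch. It assumes the usual one-sided form of the Kulldorff statistic, d(m, b) = m log(m/b) + (1 - m) log((1 - m)/(1 - b)) when m > b and 0 otherwise, where m and b are the fractions of measured and baseline mass inside a region. It also assumes the data has been binned onto a small grid, and it enumerates every axis-parallel grid rectangle by brute force. This is not the approximation algorithm the abstract describes; the names `kulldorff`, `prefix_sums`, and `max_discrepancy_rectangle` are hypothetical.

```python
import math


def kulldorff(m, b):
    """One-sided Kulldorff discrepancy for measured fraction m, baseline fraction b.

    Simplification (assumption): boundary cases m or b in {0, 1} return 0.
    """
    if not (0.0 < m < 1.0 and 0.0 < b < 1.0) or m <= b:
        return 0.0
    return m * math.log(m / b) + (1 - m) * math.log((1 - m) / (1 - b))


def prefix_sums(grid):
    """2D prefix sums: P[i][j] is the sum of grid[0:i][0:j]."""
    rows, cols = len(grid), len(grid[0])
    P = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def rect_sum(P, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1."""
    return P[r1][c1] - P[r0][c1] - P[r1][c0] + P[r0][c0]


def max_discrepancy_rectangle(measured, baseline):
    """Brute-force search over all axis-parallel grid rectangles.

    Returns (best discrepancy, (r0, c0, r1, c1)) with half-open index ranges.
    """
    rows, cols = len(measured), len(measured[0])
    PM, PB = prefix_sums(measured), prefix_sums(baseline)
    total_m, total_b = PM[rows][cols], PB[rows][cols]
    best_value, best_rect = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0 + 1, rows + 1):
            for c0 in range(cols):
                for c1 in range(c0 + 1, cols + 1):
                    m = rect_sum(PM, r0, c0, r1, c1) / total_m
                    b = rect_sum(PB, r0, c0, r1, c1) / total_b
                    value = kulldorff(m, b)
                    if value > best_value:
                        best_value, best_rect = value, (r0, c0, r1, c1)
    return best_value, best_rect


# Toy data: uniform baseline, one cell with an excess of measured events.
measured = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
baseline = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
value, rect = max_discrepancy_rectangle(measured, baseline)
print(rect)  # (1, 1, 2, 2): the single central cell
```

The search visits every pair of row boundaries and every pair of column boundaries, so it is only practical for small grids. Its purpose is to show what is being maximized, not how to maximize it quickly.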
Intercalates and Discrepancy in Random Latin Squares
An intercalate in a Latin square is a 2 × 2 Latin subsquare. Let be
the number of intercalates in a uniformly random Latin square. We
prove that asymptotically almost surely
, and that
(therefore
asymptotically almost surely for any ). This
significantly improves the previous best lower and upper bounds. We also give
an upper tail bound for the number of intercalates in two fixed rows of a
random Latin square. In addition, we discuss a problem of Linial and Luria on
low-discrepancy Latin squares.
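As a small illustration of the quantity being counted, the following Python sketch counts intercalates (2 × 2 Latin subsquares) in a given Latin square by checking every pair of rows against every pair of columns. The function name `count_intercalates` is hypothetical. The examples use two fixed squares rather than a uniformly random one, since sampling uniformly random Latin squares is a separate problem not covered here.

```python
from itertools import combinations


def count_intercalates(square):
    """Count 2 x 2 Latin subsquares of a Latin square given as a list of rows.

    Rows r1 < r2 and columns c1 < c2 form an intercalate when the two
    diagonals each carry a single symbol. The two symbols are automatically
    distinct because each row of a Latin square has no repeats.
    """
    n = len(square)
    count = 0
    for r1, r2 in combinations(range(n), 2):
        for c1, c2 in combinations(range(n), 2):
            if (square[r1][c1] == square[r2][c2]
                    and square[r1][c2] == square[r2][c1]):
                count += 1
    return count


# Cayley table of Z_3: cyclic squares of odd order contain no intercalates.
cyclic3 = [[(i + j) % 3 for j in range(3)] for i in range(3)]
# Cayley table of Z_2 x Z_2 (entries i XOR j).
klein4 = [[i ^ j for j in range(4)] for i in range(4)]

print(count_intercalates(cyclic3))  # 0
print(count_intercalates(klein4))   # 12
```

The check takes time proportional to n^4 for an n × n square, which is fine for experimentation on small orders.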
Lower Bounds for Discrepancy
We find the best asymptotic lower bounds for the coefficient of the leading
term of the norm of the two-dimensional (axis-parallel) discrepancy that
can be obtained by K. Roth's orthogonal function method among a large class of
test functions. We use methods of combinatorics, probability, and complex and
harmonic analysis.
Comment: a slightly different version of the article is accepted to
"Mathematika".
The Supremum Norm of the Discrepancy Function: Recent Results and Connections
A great challenge in the analysis of the discrepancy function $D_N$ is to
obtain universal lower bounds on the $L^\infty$ norm of $D_N$ in dimensions
$d \geq 3$. It follows from the average case bound of Klaus Roth that the
$L^\infty$ norm of $D_N$ is at least $(\log N)^{(d-1)/2}$. It is conjectured
that the $L^\infty$ bound is significantly larger, but the only definitive
result is that of Wolfgang Schmidt in dimension $d = 2$. Partial improvements
of the Roth exponent $(d-1)/2$ in higher dimensions have been established by
the authors and Armen Vagharshakyan.
We survey these results, the underlying methods, and some of their connections
to other subjects in probability, approximation theory, and analysis.
Comment: 15 pages, 3 Figures. Reports on talks presented by the authors at the
10th international conference on Monte Carlo and Quasi-Monte Carlo Methods in
Scientific Computing, Sydney Australia, February 2011. v2: Comments of the
referee are incorporated.
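For readers unfamiliar with the notation, the bounds mentioned in this abstract can be written out as follows. This is a sketch using the standard definition of the discrepancy function. The symbol $P_N$ for the $N$-point set and the anchored-box notation $[0, x)$ are assumptions introduced here, and $\gtrsim$ hides a constant depending only on the dimension $d$.

```latex
% P_N: an N-point set in [0,1]^d (notation assumed for this sketch)
D_N(x) = \#\bigl(P_N \cap [0, x)\bigr) - N \, x_1 x_2 \cdots x_d,
\qquad [0, x) = \prod_{j=1}^{d} [0, x_j), \quad x \in [0,1]^d .

% Roth's average-case (L^2) bound, and what it implies for the sup norm
\| D_N \|_\infty \;\ge\; \| D_N \|_2 \;\gtrsim\; (\log N)^{(d-1)/2} .

% Schmidt's bound in dimension d = 2
\| D_N \|_\infty \;\gtrsim\; \log N .
```

In dimension $d = 2$ the Roth exponent is $1/2$, so Schmidt's bound improves $(\log N)^{1/2}$ to $\log N$. The open question the abstract refers to is how much the exponent $(d-1)/2$ can be raised for $d \geq 3$.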