The Hunting of the Bump: On Maximizing Statistical Discrepancy

Author: Agarwal Deepak
Phillips Jeff M.
Venkatasubramanian Suresh
Publication date: 02/10/2005

Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not actually represent true anomalies. These clusters may likely be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ time that computes the maximum discrepancy rectangle to within additive error $\epsilon$ for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$. Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendix.
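To make the objective in the abstract above concrete, here is a minimal sketch of a naive exhaustive search for the maximum-discrepancy rectangle. It is not the paper's approximation algorithm. It assumes the commonly stated Poisson form of the Kulldorff discrepancy, $d(m, b) = m \log\frac{m}{b} + (1-m)\log\frac{1-m}{1-b}$ for $m > b$ and $0$ otherwise, where $m$ and $b$ are the fractions of measured and baseline points inside the rectangle. The function names and the point-list input format are illustrative assumptions.

```python
import math
from itertools import combinations_with_replacement


def _x_log_ratio(x, y):
    """Return x * log(x / y), with the convention 0 * log(0 / y) = 0."""
    if x == 0.0:
        return 0.0
    if y == 0.0:
        return math.inf  # measured mass where the baseline has none
    return x * math.log(x / y)


def kulldorff(m, b):
    """Assumed Poisson Kulldorff discrepancy for fractions m (measured), b (baseline)."""
    if m <= b:
        return 0.0  # only an excess of measured mass counts as anomalous
    return _x_log_ratio(m, b) + _x_log_ratio(1.0 - m, 1.0 - b)


def max_discrepancy_rectangle(measured, baseline):
    """Naive exact search over all axis-parallel rectangles spanned by point coordinates.

    Returns (best_score, (x1, y1, x2, y2)). Every rectangle is rescanned from
    scratch, so this is only meant for tiny inputs.
    """
    points = measured + baseline
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best_score, best_rect = 0.0, None
    for x1, x2 in combinations_with_replacement(xs, 2):
        for y1, y2 in combinations_with_replacement(ys, 2):
            def inside(p):
                return x1 <= p[0] <= x2 and y1 <= p[1] <= y2
            m = sum(inside(p) for p in measured) / len(measured)
            b = sum(inside(p) for p in baseline) / len(baseline)
            score = kulldorff(m, b)
            if score > best_score:
                best_score, best_rect = score, (x1, y1, x2, y2)
    return best_score, best_rect


measured = [(1, 1), (2, 2), (9, 9)]
baseline = [(1, 1), (2, 2), (9, 9), (5, 5), (6, 6), (7, 7)]
score, rect = max_discrepancy_rectangle(measured, baseline)
```

In this toy input the rectangle around the two lower-left points holds two thirds of the measured points but only one third of the baseline points, so it is the region the search reports.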

arXiv.org e-Print Archive

CiteSeerX

Intercalates and Discrepancy in Random Latin Squares

Author: Bartlett
Browning
Brégman
Cameron
Cavenagh
Egorychev
Erdős
Falikman
Heinrich
Häggkvist
Jacobson
Janson
Keedwell
Kotzig
Krivelevich
Kwan
Liebenau
Linial
McKay
McLeish
Ordentlich
Pittenger
Publication date: 17/01/2017

An intercalate in a Latin square is a $2\times2$ Latin subsquare. Let $N$ be the number of intercalates in a uniformly random $n\times n$ Latin square. We prove that asymptotically almost surely $N\ge(1-o(1))\,n^{2}/4$, and that $\mathbb{E}N\le(1+o(1))\,n^{2}/2$ (therefore asymptotically almost surely $N\le fn^{2}$ for any $f\to\infty$). This significantly improves the previous best lower and upper bounds. We also give an upper tail bound for the number of intercalates in two fixed rows of a random Latin square. In addition, we discuss a problem of Linial and Luria on low-discrepancy Latin squares.
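The quantity $N$ in the abstract above can be illustrated with a short sketch that counts intercalates in a given Latin square by brute force. It does not sample uniformly random Latin squares, which is a harder problem; the two example squares (the cyclic square and the XOR square) and all function names are illustrative assumptions.

```python
from itertools import combinations


def count_intercalates(square):
    """Count 2x2 Latin subsquares: rows r1 < r2 and columns c1 < c2 whose
    four cells use exactly two symbols in a crossed pattern."""
    n = len(square)
    count = 0
    for r1, r2 in combinations(range(n), 2):
        for c1, c2 in combinations(range(n), 2):
            if (square[r1][c1] == square[r2][c2]
                    and square[r1][c2] == square[r2][c1]):
                count += 1
    return count


def cyclic_square(n):
    """Addition table of Z_n, a standard example of a Latin square."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def xor_square(k):
    """Addition table of (Z_2)^k, a Latin square of order 2**k."""
    n = 2 ** k
    return [[i ^ j for j in range(n)] for i in range(n)]


counts = {
    "cyclic order 4": count_intercalates(cyclic_square(4)),
    "cyclic order 5": count_intercalates(cyclic_square(5)),
    "xor order 4": count_intercalates(xor_square(2)),
}
```

The cyclic square of odd order has no intercalates at all, while the XOR square of order 4 has one for every pair of rows and every column, which shows how widely $N$ can vary between individual Latin squares.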

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Lower Bounds for $L_1$ Discrepancy

Author: Armen Vagharshakyan
Halasz
Katznelson
Macdonald
Matoušek
Publication venue: 'Wiley'
Publication date: 05/11/2012

We find the best asymptotic lower bounds for the coefficient of the leading term of the $L_1$ norm of the two-dimensional (axis-parallel) discrepancy that can be obtained by K. Roth's orthogonal function method among a large class of test functions. We use methods of combinatorics, probability, complex and harmonic analysis. Comment: a slightly different version of the article is accepted to "Mathematika"

arXiv.org e-Print Archive

Crossref

The Supremum Norm of the Discrepancy Function: Recent Results and Connections

Author: A. Zygmund
D. Bilyk
F. Riesz
J. Beck
J. Dick
J. Kuelbs
J. Matoušek
K.F. Roth
M. Drmota
M. Lacey
M. Talagrand
R. Fefferman
S. Sidon
S.-Y.A. Chang
T. Dunker
V.N. Temlyakov
W.M. Schmidt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/09/2012

A great challenge in the analysis of the discrepancy function $D_N$ is to obtain universal lower bounds on the $L^\infty$ norm of $D_N$ in dimensions $d \geq 3$. It follows from the average case bound of Klaus Roth that the $L^\infty$ norm of $D_N$ is at least $(\log N)^{(d-1)/2}$. It is conjectured that the $L^\infty$ bound is significantly larger, but the only definitive result is that of Wolfgang Schmidt in dimension $d=2$. Partial improvements of the Roth exponent $(d-1)/2$ in higher dimensions have been established by the authors and Armen Vagharshakyan. We survey these results, the underlying methods, and some of their connections to other subjects in probability, approximation theory, and analysis. Comment: 15 pages, 3 figures. Reports on talks presented by the authors at the 10th international conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Sydney, Australia, February 2011. v2: Comments of the referee are incorporated.
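The abstract above does not define $D_N$. The sketch below assumes the standard definition for an $N$-point set $P$ in the unit cube: $D_N(x) = \#\{p \in P : p \in [0, x)\} - N x_1 \cdots x_d$. It evaluates that function at a point and takes a maximum over a finite grid, which only gives a lower estimate of the true $L^\infty$ norm. The function names and the grid-based estimate are illustrative assumptions, not a method from the survey.

```python
from itertools import product
from math import prod


def local_discrepancy(points, x):
    """Assumed D_N(x): points in the anchored box [0, x) minus N * volume of the box."""
    n = len(points)
    inside = sum(all(p_i < x_i for p_i, x_i in zip(p, x)) for p in points)
    return inside - n * prod(x)


def sup_norm_on_grid(points, steps):
    """Maximum of |D_N| over the grid {1/steps, ..., 1}^d.

    The grid is finite, so this is a lower estimate of the supremum norm.
    """
    d = len(points[0])
    grid = [k / steps for k in range(1, steps + 1)]
    return max(abs(local_discrepancy(points, x)) for x in product(grid, repeat=d))


points = [(0.25, 0.25), (0.75, 0.75)]
value_at_center = local_discrepancy(points, (0.5, 0.5))
grid_estimate = sup_norm_on_grid(points, 4)
```

A finer grid tightens the estimate at the cost of `steps ** d` evaluations, which is one reason exact computation of the supremum norm becomes expensive in higher dimensions.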

arXiv.org e-Print Archive

Crossref