
    The Hunting of the Bump: On Maximizing Statistical Discrepancy

    Anomaly detection has important applications in biosurveillance and environmental monitoring. When measured data is compared to data drawn from a baseline distribution, clusters found in the measured data may not represent true anomalies; they may simply be clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ time that computes the maximum discrepancy rectangle to within additive error $\epsilon$ for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$. Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendix.
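
    To make the abstract's setup concrete, here is a minimal sketch of an exact brute-force search for the maximum-discrepancy axis-parallel rectangle. It is not the paper's approximation algorithm. It assumes the usual two-argument form of the Kulldorff statistic, where `m` and `b` are the fractions of measured and baseline points inside the rectangle. The function names and the choice to score zero-baseline rectangles as 0 are assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def kulldorff(m, b):
    """Kulldorff discrepancy for measured fraction m and baseline fraction b.

    Assumed form: m*ln(m/b) + (1-m)*ln((1-m)/(1-b)) when m > b, else 0.
    Assumption: regions with b == 0 score 0, because the statistic is
    unbounded there and such regions are normally excluded.
    """
    if m <= b or b <= 0.0:
        return 0.0
    value = m * math.log(m / b)
    if m < 1.0:  # the second term vanishes when m == 1
        value += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return value


def max_discrepancy_rectangle(measured, baseline):
    """Exhaustively score every rectangle whose sides pass through data points.

    Returns (best_score, (x_lo, y_lo, x_hi, y_hi)); the rectangle is None
    if no rectangle has positive discrepancy.
    """
    points = list(measured) + list(baseline)
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    best_score, best_rect = 0.0, None
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            def inside(p):
                return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
            m = sum(1 for p in measured if inside(p)) / len(measured)
            b = sum(1 for p in baseline if inside(p)) / len(baseline)
            score = kulldorff(m, b)
            if score > best_score:
                best_score, best_rect = score, (x_lo, y_lo, x_hi, y_hi)
    return best_score, best_rect


# Hypothetical toy data: measured points cluster near the origin.
measured = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.1), (0.9, 0.9)]
baseline = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9), (0.7, 0.3)]
print(max_discrepancy_rectangle(measured, baseline))
```

    This sketch enumerates on the order of n^4 candidate rectangles and recounts the points for each one, so it is slower than the exact O(n^4) algorithms the abstract mentions; it is meant only to show what is being maximized.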

    Intercalates and Discrepancy in Random Latin Squares

    An intercalate in a Latin square is a $2\times2$ Latin subsquare. Let $N$ be the number of intercalates in a uniformly random $n\times n$ Latin square. We prove that asymptotically almost surely $N\ge(1-o(1))\,n^{2}/4$, and that $\mathbb{E}N\le(1+o(1))\,n^{2}/2$ (therefore asymptotically almost surely $N\le fn^{2}$ for any $f\to\infty$). This significantly improves the previous best lower and upper bounds. We also give an upper tail bound for the number of intercalates in two fixed rows of a random Latin square. In addition, we discuss a problem of Linial and Luria on low-discrepancy Latin squares.
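
    As an illustration of the definition above, the following sketch counts intercalates in a given Latin square by checking every pair of rows and every pair of columns. The function name and the two example squares (the addition table of Z_4 and the XOR table on {0, 1, 2, 3}) are choices made here, not taken from the paper, and the code does not sample uniformly random Latin squares.

```python
from itertools import combinations


def count_intercalates(square):
    """Count 2x2 Latin subsquares in a Latin square given as a list of rows.

    Rows r1 < r2 and columns c1 < c2 form an intercalate when the two
    diagonal entries agree and the two anti-diagonal entries agree.
    """
    n = len(square)
    count = 0
    for r1, r2 in combinations(range(n), 2):
        for c1, c2 in combinations(range(n), 2):
            if (square[r1][c1] == square[r2][c2]
                    and square[r1][c2] == square[r2][c1]):
                count += 1
    return count


# Two Latin squares of order 4 with different intercalate counts.
cyclic4 = [[(i + j) % 4 for j in range(4)] for i in range(4)]
xor4 = [[i ^ j for j in range(4)] for i in range(4)]
print(count_intercalates(cyclic4), count_intercalates(xor4))
```

    The direct check takes O(n^4) comparisons, which is adequate for small experiments; cyclic squares of odd order contain no intercalates at all, which makes them a convenient sanity check.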

    Lower Bounds for $L_1$ Discrepancy

    We find the best asymptotic lower bounds for the coefficient of the leading term of the $L_1$ norm of the two-dimensional (axis-parallel) discrepancy that can be obtained by K. Roth's orthogonal function method among a large class of test functions. We use methods of combinatorics, probability, complex and harmonic analysis. Comment: a slightly different version of the article is accepted to "Mathematika".

    The Supremum Norm of the Discrepancy Function: Recent Results and Connections

    A great challenge in the analysis of the discrepancy function $D_N$ is to obtain universal lower bounds on the $L^\infty$ norm of $D_N$ in dimensions $d \geq 3$. It follows from the average case bound of Klaus Roth that the $L^\infty$ norm of $D_N$ is at least $(\log N)^{(d-1)/2}$. It is conjectured that the $L^\infty$ bound is significantly larger, but the only definitive result is that of Wolfgang Schmidt in dimension $d=2$. Partial improvements of the Roth exponent $(d-1)/2$ in higher dimensions have been established by the authors and Armen Vagharshakyan. We survey these results, the underlying methods, and some of their connections to other subjects in probability, approximation theory, and analysis. Comment: 15 pages, 3 figures. Reports on talks presented by the authors at the 10th International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Sydney, Australia, February 2011. v2: Comments of the referee are incorporated.
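
    The abstract does not define $D_N$. Assuming the standard definition for an $N$-point set $\mathcal{P}_N \subset [0,1]^d$ (the symbols $\mathcal{P}_N$, $x$, and the constants $c_d$, $c$ are notation introduced here), the objects and bounds it refers to can be written as follows.

```latex
% Discrepancy function of an N-point set P_N in the unit cube [0,1]^d:
% number of points in the anchored box [0,x) minus its expected count.
\[
  D_N(x) \;=\; \#\bigl(\mathcal{P}_N \cap [0,x)\bigr) \;-\; N\,x_1 x_2 \cdots x_d,
  \qquad x = (x_1,\dots,x_d) \in [0,1]^d .
\]
% Roth's average-case (L^2) bound, which carries over to the supremum norm
% because \|D_N\|_\infty \ge \|D_N\|_2 on the unit cube:
\[
  \|D_N\|_\infty \;\ge\; \|D_N\|_2 \;\ge\; c_d\,(\log N)^{(d-1)/2}.
\]
% Schmidt's bound in dimension d = 2:
\[
  \|D_N\|_\infty \;\ge\; c\,\log N .
\]
```

    Here $c_d$ and $c$ denote positive constants independent of $N$ and of the point set; in $d=2$ Schmidt's bound exceeds the Roth exponent $1/2$, which is the gap the conjecture concerns in higher dimensions.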