    The Hunting of the Bump: On Maximizing Statistical Discrepancy

    Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not actually identify true anomalies. These clusters may simply be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely-used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ that computes the maximum discrepancy rectangle to within additive error $\epsilon$, for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$. Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendix.

    On the Catalyzing Effect of Randomness on the Per-Flow Throughput in Wireless Networks

    This paper investigates the throughput capacity of a flow crossing a multi-hop wireless network, whose geometry is characterized by general randomness laws including Uniform, Poisson, Heavy-Tailed distributions for both the nodes' densities and the number of hops. The key contribution is to demonstrate \textit{how} the \textit{per-flow throughput} depends on the distribution of 1) the number of nodes $N_j$ inside hops' interference sets, 2) the number of hops $K$, and 3) the degree of spatial correlations. The randomness in both $N_j$'s and $K$ is advantageous, i.e., it can yield larger scalings (as large as $\Theta(n)$) than in non-random settings. An interesting consequence is that the per-flow capacity can exhibit the opposite behavior to the network capacity, which was shown to suffer from a logarithmic decrease in the presence of randomness. In turn, spatial correlations along the end-to-end path are detrimental by a logarithmic term

    An algorithm for constrained one-step inversion of spectral CT data

    We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data. Comment: Submitted to Physics in Medicine and Biology

    Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis

    Standard multivariate analysis methods aim to identify and summarize the main structures in large data sets containing the description of a number of observations by several variables. In many cases, spatial information is also available for each observation, so that a map can be associated to the multivariate data set. Two main objectives are relevant in the analysis of spatial multivariate data: summarizing covariation structures and identifying spatial patterns. In practice, achieving both goals simultaneously is a statistical challenge, and a range of methods have been developed that offer trade-offs between these two objectives. In an applied context, this methodological question has been and remains a major issue in community ecology, where species assemblages (i.e., covariation between species abundances) are often driven by spatial processes (and thus exhibit spatial patterns). In this paper we review a variety of methods developed in community ecology to investigate multivariate spatial patterns. We present different ways of incorporating spatial constraints in multivariate analysis and illustrate these different approaches using the famous data set on moral statistics in France published by André-Michel Guerry in 1833. We discuss and compare the properties of these different approaches both from a practical and theoretical viewpoint. Comment: Published in at http://dx.doi.org/10.1214/10-AOAS356 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org