
    The Hunting of the Bump: On Maximizing Statistical Discrepancy

    Anomaly detection has important applications in biosurveillance and environmental monitoring. When measured data are compared with data drawn from a baseline distribution, merely finding clusters in the measured data may not reveal true anomalies, because these clusters may simply be clusters of the baseline distribution. Hence, a discrepancy function is often used to measure how much the measured data differ from the baseline data within a region. An anomalous region is thus defined to be one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only to maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. For the Kulldorff scan statistic, we present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ time that computes the maximum discrepancy rectangle to within additive error $\epsilon$. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$.
    Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendix.
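    To make the two kinds of discrepancy concrete, here is a minimal brute-force sketch in Python. It assumes point data and the Kulldorff form $d_K(m, b) = m \ln(m/b) + (1 - m) \ln((1 - m)/(1 - b))$ for the measured fraction $m$ and baseline fraction $b$ inside a rectangle, restricted (as an assumption) to over-represented regions with $m > b$. It enumerates every axis-parallel rectangle whose sides pass through data coordinates and scores it with either that function or the linear discrepancy $m - b$. This is not the paper's approximation algorithm: at roughly $O(n^5)$ it is slower even than the $O(n^4)$ exact methods mentioned above, and all function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def kulldorff(m, b):
    """Kulldorff-style discrepancy for the measured fraction m and baseline
    fraction b inside a region (assumed form; see lead-in).

    Assumes 0 < b < 1. Only over-represented regions (m > b) score above 0;
    the (1 - m) term is dropped when m == 1, i.e. 0 * log(0) is taken as 0.
    """
    if m <= b:
        return 0.0
    d = m * math.log(m / b)
    if m < 1:
        d += (1 - m) * math.log((1 - m) / (1 - b))
    return d


def linear(m, b):
    """Linear discrepancy: measured fraction minus baseline fraction."""
    return m - b


def fraction_inside(points, x_lo, x_hi, y_lo, y_hi):
    """Fraction of points lying in the closed rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    inside = sum(1 for x, y in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi)
    return inside / len(points)


def max_discrepancy_rectangle(measured, baseline, disc):
    """Brute force over all rectangles whose sides lie on data coordinates.

    Returns (best value, (x_lo, x_hi, y_lo, y_hi)). Rectangles with baseline
    fraction 0 or 1 are skipped so that kulldorff() stays finite.
    """
    xs = sorted({x for x, _ in measured + baseline})
    ys = sorted({y for _, y in measured + baseline})
    best_val, best_rect = float("-inf"), None
    # Pairs from a sorted list with replacement give every x_lo <= x_hi.
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            b = fraction_inside(baseline, x_lo, x_hi, y_lo, y_hi)
            if not 0 < b < 1:
                continue
            m = fraction_inside(measured, x_lo, x_hi, y_lo, y_hi)
            val = disc(m, b)
            if val > best_val:
                best_val, best_rect = val, (x_lo, x_hi, y_lo, y_hi)
    return best_val, best_rect


if __name__ == "__main__":
    # Baseline: a uniform 5 x 5 grid. Measured: three points stacked at (2, 2)
    # plus one at (4, 0).
    baseline = [(i, j) for i in range(5) for j in range(5)]
    measured = [(2, 2), (2, 2), (2, 2), (4, 0)]
    print(max_discrepancy_rectangle(measured, baseline, linear))
    print(max_discrepancy_rectangle(measured, baseline, kulldorff))
```

    For this toy input both functions select the single-cell rectangle (2, 2, 2, 2), where $m = 0.75$ and $b = 0.04$; they agree here, but in general a convex function and the linear one can rank rectangles differently.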

    Diffusion-limited aggregation on the hyperbolic plane

    We consider an analogue of the diffusion-limited aggregation model defined on the hyperbolic plane. We prove that, almost surely, the aggregate viewed at time infinity will have positive density.
    Comment: Published at http://dx.doi.org/10.1214/14-AOP928 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Eden growth model for aggregation of charged particles

    The stochastic Eden model of charged-particle aggregation in two-dimensional systems is presented. The model is governed by two parameters: the screening length of the electrostatic interaction, $\lambda$, and the short-range attraction energy, $E$. Different patterns of finite and infinite aggregates are observed, with the following types of morphology: linear or linear with bending, worm-like, DBM (dense-branching morphology), DBM with nucleus, and compact Eden-like. The transitions between the different modes of growth are studied, and a phase diagram of the growth structures is obtained in $(\lambda, E)$ coordinates. A detailed analysis of the aggregate structure, including its fractal properties, is presented, and a scheme of the internal inhomogeneous structure of the aggregates is proposed.
    Comment: Revtex, 9 pages with 12 postscript figures.
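    The basic Eden growth rule can be sketched in a few lines. This is a minimal Python sketch of the classical, uncharged Eden model on a square lattice, assuming growth by occupying one empty perimeter site chosen uniformly at random at each step. The screening length $\lambda$ and attraction energy $E$ of the charged model are deliberately not modeled, and the function names are hypothetical. The radius of gyration is included as one common starting point for a fractal analysis, e.g. by fitting how it scales with cluster size.

```python
import random

# 4-neighbourhood on the square lattice.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def eden_cluster(n_particles, seed=None):
    """Grow a classical (uncharged) Eden cluster of n_particles >= 1 sites.

    Assumption: at each step one empty perimeter site is chosen uniformly at
    random and occupied. The screening length lambda and attraction energy E
    of the charged model are NOT included.
    """
    rng = random.Random(seed)
    cluster = {(0, 0)}
    perimeter = set(NEIGHBOURS)  # empty sites adjacent to the seed at the origin
    while len(cluster) < n_particles:
        # Sorting makes the choice reproducible for a given seed.
        site = rng.choice(sorted(perimeter))
        perimeter.remove(site)
        cluster.add(site)
        x, y = site
        for dx, dy in NEIGHBOURS:
            nb = (x + dx, y + dy)
            if nb not in cluster:
                perimeter.add(nb)
    return cluster


def radius_of_gyration(cluster):
    """Root-mean-square distance of the sites from the cluster's centre of mass."""
    n = len(cluster)
    cx = sum(x for x, _ in cluster) / n
    cy = sum(y for _, y in cluster) / n
    return (sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in cluster) / n) ** 0.5


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, radius_of_gyration(eden_cluster(n, seed=0)))
```

    For compact, uncharged Eden clusters the radius of gyration should grow roughly like the square root of the number of sites; in a charged variant, departures from that scaling would be one signal of the branched morphologies described above.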

    Influence, originality and similarity in directed acyclic graphs

    We introduce a framework for network analysis based on random walks on directed acyclic graphs where the probability of passing through a given node is the key ingredient. We illustrate its use in evaluating the mutual influence of nodes and discovering seminal papers in a citation network. We further introduce a new similarity metric and test it in a simple personalized recommendation process. This metric's performance is comparable to that of classical similarity metrics, thus further supporting the validity of our framework.
    Comment: 6 pages, 4 figures.
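    The key quantity in this framework, the probability that a random walk passes through a node, can be computed on a DAG in a single pass over a topological order. The sketch below assumes a walk that starts at a chosen node, follows one outgoing edge chosen uniformly at random at each step, and stops at a node with no outgoing edges; the paper's exact walk may differ, and `passage_probabilities` is a hypothetical name. Because such a walk visits each node at most once, the probability of reaching $v$ is $P(v) = \sum_{u \to v} P(u) / d^{+}(u)$, where $d^{+}(u)$ is the out-degree of $u$. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def passage_probabilities(edges, start):
    """Probability that a random walk from `start` passes through each node.

    Assumed walk: at every node follow one outgoing edge chosen uniformly at
    random, and stop at a node with no outgoing edges. Requires a DAG;
    graphlib raises CycleError otherwise. Unreachable nodes get 0.0.
    """
    succ, preds, nodes = {}, {}, {start}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        preds.setdefault(v, set()).add(u)
        nodes.update((u, v))
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    order = TopologicalSorter({n: preds.get(n, set()) for n in nodes}).static_order()
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for u in order:
        out = succ.get(u, [])
        if prob[u] == 0.0 or not out:
            continue
        share = prob[u] / len(out)  # each outgoing edge is equally likely
        for v in out:
            prob[v] += share  # a DAG walk visits v at most once, so masses add
    return prob


if __name__ == "__main__":
    # Hypothetical toy citation graph: an edge u -> v means "u cites v".
    toy = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")]
    print(passage_probabilities(toy, "A"))
```

    In the toy graph, D is reached with probability 0.75 because both branches from A lead to it, while E (reachable only through C) gets 0.25; a node that many walks pass through is a natural candidate for the kind of influential or seminal node the abstract describes.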