248 research outputs found

    Online Nonparametric Anomaly Detection based on Geometric Entropy Minimization

    We consider the online and nonparametric detection of abrupt and persistent anomalies, such as a change in the regular system dynamics at a time instance due to an anomalous event (e.g., a failure or malicious activity). Combining the simplicity of the nonparametric Geometric Entropy Minimization (GEM) method with the timely detection capability of the Cumulative Sum (CUSUM) algorithm, we propose a computationally efficient online anomaly detection method that is applicable to high-dimensional datasets and at the same time achieves near-optimum average detection delay for a given false alarm constraint. We provide new insights into both GEM and CUSUM, including a new asymptotic analysis of GEM, which enables soft decisions for outlier detection, and a novel interpretation of CUSUM in terms of discrepancy theory, which helps us generalize it to the nonparametric GEM statistic. We show numerically, using both simulated and real datasets, that the proposed nonparametric algorithm performs close to the clairvoyant parametric CUSUM test. Comment: to appear in IEEE International Symposium on Information Theory (ISIT) 201
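    For readers unfamiliar with the parametric baseline named in the abstract, the following is a minimal sketch of a classical CUSUM test for a known Gaussian mean shift. It is not the paper's GEM-based statistic. The function name `cusum_alarm_time`, the means `mu0` and `mu1`, the noise level `sigma`, the threshold, and the simulated data are all illustrative assumptions.

```python
import random


def cusum_alarm_time(samples, mu0, mu1, sigma, threshold):
    """Return the 1-based index of the first CUSUM alarm, or None if no alarm.

    Assumes (hypothetically) that the pre-change data are N(mu0, sigma^2)
    and the post-change data are N(mu1, sigma^2), both fully known.
    """
    stat = 0.0
    for t, x in enumerate(samples, start=1):
        # Log-likelihood ratio of the post-change against the pre-change model.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        # CUSUM recursion: accumulate evidence, never letting it drop below 0.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None


# Illustrative simulated stream: the mean shifts from 0 to 1 after 200 samples.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(1.0, 1.0) for _ in range(300)]
alarm = cusum_alarm_time(data, mu0=0.0, mu1=1.0, sigma=1.0, threshold=8.0)
print("first alarm at sample:", alarm)
```

    Raising the threshold lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to.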

    Contamination Estimation via Convex Relaxations

    Identifying anomalies and contamination in datasets is important in a wide variety of settings. In this paper, we describe a new technique for estimating contamination in large, discrete-valued datasets. Our approach considers the normal condition of the data to be specified by a model consisting of a set of distributions. Our key contribution is in our approach to contamination estimation. Specifically, we develop a technique that identifies the minimum number of data points that must be discarded (i.e., the level of contamination) from an empirical data set in order to match the model to within a specified goodness-of-fit, controlled by a p-value. Appealing to results from large deviations theory, we show that a lower bound on the level of contamination is obtained by solving a series of convex programs. Theoretical results guarantee that the bound converges at a rate of $O(\sqrt{\log(p)/p})$, where p is the size of the empirical data set. Comment: To appear, ISIT 201

    Learning to classify with possible sensor failures

    In this paper, we propose an efficient algorithm to train a robust large-margin classifier when corrupt measurements caused by sensor failure might be present in the training set. By incorporating a non-parametric prior based on the empirical distribution of the training data, we propose a Geometric-Entropy-Minimization regularized Maximum Entropy Discrimination (GEM-MED) method to perform classification and anomaly detection in a joint manner. We demonstrate that our proposed method can yield improved performance over previous robust classification methods in terms of both classification accuracy and anomaly detection rate using simulated data and real footstep data. Index Terms: corrupt measurements, robust large-margin training, anomaly detection, maximum entropy discrimination