
    DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

    Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which have been used to enlarge training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples - normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the `edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets.
    Comment: Published as a conference paper at ICDM 2018 (IEEE International Conference on Data Mining)
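    The latent-tail oversampling step described above can be sketched numerically. This is a minimal sketch, not the paper's implementation: the linear `decode` function is a hypothetical stand-in for the trained AAE decoder, and the 90th-99th percentile band is an assumed definition of the latent `edge'.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def decode(z):
        # hypothetical decoder: lifts 2-D latent codes into a 5-D data space
        W = np.arange(10).reshape(2, 5) / 10.0
        return z @ W

    # 1. Latent codes of the training set, assumed ~ N(0, I) after AAE training.
    z_train = rng.standard_normal((1000, 2))

    # 2. Pick codes at the 'edge' of the latent distribution: norms in a
    #    high-quantile band, i.e. the infrequent-but-normal region.
    norms = np.linalg.norm(z_train, axis=1)
    lo, hi = np.quantile(norms, [0.90, 0.99])
    edge = z_train[(norms >= lo) & (norms <= hi)]

    # 3. Jitter the edge codes slightly and decode them to obtain synthetic
    #    rare-normal samples that densify the tail of the training set.
    z_new = edge + 0.05 * rng.standard_normal(edge.shape)
    x_aug = decode(z_new)
    print(x_aug.shape)
    ```

    The augmented samples `x_aug` would then be appended to the training set before fitting the anomaly detector.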

    Adversarial Feature Learning of Online Monitoring Data for Operational Risk Assessment in Distribution Networks

    With the deployment of online monitoring systems in distribution networks, the massive amounts of data they collect contain rich information on the operating states of the networks. Leveraging these data, this paper proposes an unsupervised approach based on bidirectional generative adversarial networks (BiGANs) for operational risk assessment in distribution networks. The approach includes two stages: (1) adversarial feature learning, in which the most representative features are extracted from the online monitoring data and a statistical index $\mathcal{N}_{\phi}$ is calculated for the features, making no assumptions or simplifications about the real data; (2) operational risk assessment, in which the confidence level $1-\alpha$ for the population mean of the standardized $\mathcal{N}_{\phi}$ is combined with the operational risk levels, which are divided into emergency, high risk, preventive, and normal, and the p value for each data point is calculated and compared with $\alpha/2$ to determine the risk level. The proposed approach is capable of discovering the latent structure of the real data and providing more accurate assessment results. Synthetic data are employed to illustrate the selection of parameters involved in the proposed approach. Case studies on real-world online monitoring data validate the effectiveness and advantages of the proposed approach in risk assessment.
    Comment: 10 pages, submitted to IEEE Transactions on Power Systems
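    The assessment stage described above - comparing a p value against $\alpha/2$ to pick a risk level - can be sketched with standard-library code. This is a hedged illustration only: the standard-normal reference and the tier cut-offs beyond $\alpha/2$ are assumptions, not the paper's exact thresholds.

    ```python
    import math

    def p_value(n_std):
        # two-sided p value of a standardized statistic under a
        # standard-normal reference distribution
        return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(n_std) / math.sqrt(2.0))))

    def risk_level(n_std, alpha=0.05):
        # map a standardized index (the paper's standardized N_phi) to one of
        # the four levels; only the alpha/2 cut follows the abstract, the
        # intermediate cuts are illustrative assumptions
        p = p_value(n_std)
        if p < alpha / 2:
            return "emergency"
        elif p < alpha:
            return "high risk"
        elif p < 2 * alpha:
            return "preventive"
        return "normal"

    print(risk_level(0.1), risk_level(3.5))
    ```

    A data point sitting deep in the tail of the reference distribution (large standardized index, tiny p value) lands in the emergency tier, while typical points remain normal.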

    Anomaly scores for generative models

    Reconstruction error is a prevalent score used to identify anomalous samples when data are modeled by generative models, such as (variational) auto-encoders or generative adversarial networks. This score relies on the assumption that normal samples lie on a manifold and all anomalous samples lie outside it. Since the manifold can be learned only where the training data lie, there are no guarantees on how the reconstruction error behaves elsewhere, and the score therefore seems ill-defined. This work defines an anomaly score that is theoretically compatible with generative models and particularly natural for (variational) auto-encoders, which are the most prevalent such models. The new score can also be used to select hyper-parameters and models. Finally, we explain why reconstruction error delivers good experimental results despite its weak theoretical justification.
    Comment: 9 pages, 3 figures, submitted to NeurIPS 201
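    The reconstruction-error score the abstract critiques is easy to state concretely. A minimal sketch, using a linear "autoencoder" (PCA) in place of a trained network and synthetic planar data as the manifold:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Training data lying on a 2-D plane embedded in 3-D: the "manifold".
    X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0, 0.0],
                                                  [0.0, 1.0, 0.0]])
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:2].T                      # top-2 principal directions = learned manifold

    def reconstruction_error(x):
        z = (x - mu) @ V              # encode onto the manifold
        x_hat = mu + z @ V.T          # decode back to data space
        return np.linalg.norm(x - x_hat)

    normal_score = reconstruction_error(np.array([2.0, 0.5, 0.0]))   # on-manifold
    anomaly_score = reconstruction_error(np.array([0.0, 0.0, 5.0]))  # off-manifold
    print(normal_score < anomaly_score)
    ```

    The score behaves as intended here precisely because the anomaly sits off the learned subspace; the abstract's point is that nothing guarantees this behavior far from the training data.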

    SAIFE: Unsupervised Wireless Spectrum Anomaly Detection with Interpretable Features

    Detecting anomalous behavior in the wireless spectrum is a demanding task due to the sheer complexity of electromagnetic spectrum use. Wireless spectrum anomalies can take a wide range of forms, from the presence of an unwanted signal in a licensed band to the absence of an expected signal, which makes manual labeling of anomalies difficult and suboptimal. We present the Spectrum Anomaly Detector with Interpretable FEatures (SAIFE), an Adversarial Autoencoder (AAE) based anomaly detector for wireless spectrum anomaly detection using Power Spectral Density (PSD) data, which achieves good anomaly detection and localization in an unsupervised setting. In addition, we investigate the model's capability to learn interpretable features such as signal bandwidth, class, and center frequency in a semi-supervised fashion. Along with anomaly detection, the model exhibits promising results for lossy PSD data compression up to 120X and semi-supervised signal classification accuracy close to 100% on three datasets using just 20% labeled samples. Finally, the model is tested on data from one of the distributed Electrosense sensors over a long term of 500 hours, showing its anomaly detection capabilities.
    Comment: Copyright IEEE, accepted for DySPAN 201
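    The PSD inputs such a detector consumes can be computed directly from sampled I/Q or real-valued data. A minimal sketch via an FFT periodogram, with an illustrative sample rate and a synthetic 100 Hz tone standing in for a real capture:

    ```python
    import numpy as np

    fs, n = 1024.0, 1024                     # illustrative sample rate / window
    t = np.arange(n) / fs
    rng = np.random.default_rng(4)
    x = np.sin(2 * np.pi * 100.0 * t) + 0.01 * rng.standard_normal(n)

    # One-sided periodogram estimate of the power spectral density.
    spec = np.fft.rfft(x)
    psd = (np.abs(spec) ** 2) / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    peak = freqs[np.argmax(psd)]             # strongest occupied frequency
    print(peak)
    ```

    Stacking such PSD vectors over time yields the spectrogram-like input on which an AAE can be trained to flag unexpected occupancy or absence.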

    Adversarially Learned Anomaly Detection on CMS Open Data: re-discovering the top quark

    We apply an Adversarially Learned Anomaly Detection (ALAD) algorithm to the problem of detecting new physics processes in proton-proton collisions at the Large Hadron Collider. Anomaly detection based on ALAD matches the performance reached by Variational Autoencoders, with a substantial improvement in some cases. Training the ALAD algorithm on 4.4 fb-1 of 8 TeV CMS Open Data, we show how data-driven anomaly detection and characterization would work in real life, re-discovering the top quark by identifying the main features of the t-tbar experimental signature at the LHC.
    Comment: 16 pages, 9 figures

    WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

    Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated by a different stochastic generator than the one used to generate the training data. One proposal to scale OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution and use it to reject unlikely inputs. However, likelihood models on natural data are themselves susceptible to OoD errors, and can even assign large likelihoods to samples from other datasets. To mitigate this problem, we propose Generative Ensembles, which robustify density-based OoD detection by estimating the epistemic uncertainty of the likelihood model. We present a puzzling observation in need of an explanation: although likelihood measures cannot account for the typical set of a distribution, and therefore should not be suitable on their own for OoD detection, WAIC performs surprisingly well in practice.
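    The ensemble score at the heart of this idea is compact: penalize the mean log-likelihood by its variance across ensemble members, so inputs the models disagree on (high epistemic uncertainty) score low. A hedged sketch, with three toy Gaussian "models" standing in for trained generative models:

    ```python
    import numpy as np

    def waic_score(logps):
        # WAIC-style score: mean log-likelihood across the ensemble minus the
        # variance across members (the epistemic-uncertainty penalty)
        logps = np.asarray(logps, dtype=float)
        return logps.mean() - logps.var()

    def gauss_logp(x, mu, sigma):
        return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

    # An ensemble of slightly different fits to the same training distribution.
    models = [(0.0, 1.0), (0.1, 1.1), (-0.1, 0.9)]

    def score(x):
        return waic_score([gauss_logp(x, m, s) for m, s in models])

    print(score(0.0), score(6.0))
    ```

    An in-distribution point (x = 0) gets a high, stable score; a far-out point (x = 6) is penalized both by low likelihood and by the ensemble's disagreement.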

    Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery

    Obtaining models that capture imaging markers relevant for disease progression and treatment monitoring is challenging. Models are typically based on large amounts of data with annotated examples of known markers, aiming at automated detection. The high annotation effort and the limitation to a vocabulary of known markers limit the power of such approaches. Here, we perform unsupervised learning to identify anomalies in imaging data as candidates for markers. We propose AnoGAN, a deep convolutional generative adversarial network that learns a manifold of normal anatomical variability, accompanied by a novel anomaly scoring scheme based on the mapping from image space to a latent space. Applied to new data, the model labels anomalies and scores image patches, indicating their fit into the learned distribution. Results on optical coherence tomography images of the retina demonstrate that the approach correctly identifies anomalous images, such as images containing retinal fluid or hyperreflective foci.
    Comment: To be published in the proceedings of the International Conference on Information Processing in Medical Imaging (IPMI), 201
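    The image-to-latent mapping underlying the AnoGAN score can be sketched in a few lines: given a trained generator G, find the latent code whose generation best matches the query, and use the residual as the anomaly score. This sketch replaces the paper's gradient-based latent optimization with crude random search, and uses a hypothetical linear G:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    W = rng.standard_normal((2, 4))   # toy generator weights (stand-in)

    def G(z):
        # stand-in for the trained generator: 2-D latent -> 4-D "image" space
        return z @ W

    def anomaly_score(x, n_candidates=2000):
        # map from image space back to latent space by random search;
        # AnoGAN does this with gradient-based optimization instead
        zs = rng.standard_normal((n_candidates, 2))
        residuals = np.linalg.norm(G(zs) - x, axis=1)
        return residuals.min()

    x_normal = G(np.array([0.5, -0.3]))                    # on the learned manifold
    x_anomal = x_normal + np.array([0.0, 0.0, 3.0, 0.0])   # off-manifold perturbation

    s_on = anomaly_score(x_normal)
    s_off = anomaly_score(x_anomal)
    print(s_on < s_off)
    ```

    A query the generator can reproduce gets a small residual; one it cannot reach from any latent code gets a large residual and is flagged.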

    Fence GAN: Towards Better Anomaly Detection

    Anomaly detection is a classical problem in which the aim is to detect anomalous data that do not belong to the normal data distribution. Current state-of-the-art methods for anomaly detection on complex high-dimensional data are based on the generative adversarial network (GAN). However, the traditional GAN loss is not directly aligned with the anomaly detection objective: it encourages the distribution of the generated samples to overlap with the real data, and the resulting discriminator has consequently been found to be ineffective as an anomaly detector. In this paper, we propose simple modifications to the GAN loss such that the generated samples lie at the boundary of the real data distribution. With our modified GAN loss, our anomaly detection method, called Fence GAN (FGAN), directly uses the discriminator score as an anomaly threshold. Our experimental results on the MNIST, CIFAR10 and KDD99 datasets show that Fence GAN yields the best anomaly classification accuracy compared to state-of-the-art methods.
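    The loss modification described above can be illustrated in miniature. This is a hedged sketch of the idea, not the paper's exact formulation: generated points are pushed toward a fixed discriminator score alpha (the "fence") rather than toward high scores, plus a dispersion term so they spread along the boundary instead of collapsing; the squared-deviation form and the weighting beta are assumptions.

    ```python
    import numpy as np

    def fence_generator_loss(d_scores, g_samples, alpha=0.5, beta=1.0):
        # encirclement term: discriminator scores of generated samples should
        # sit at alpha, i.e. on the boundary of the real data distribution
        encirclement = np.mean((np.asarray(d_scores) - alpha) ** 2)
        # dispersion term: penalize generated samples collapsing to one spot
        center = g_samples.mean(axis=0)
        dispersion = beta / np.mean(np.linalg.norm(g_samples - center, axis=1))
        return encirclement + dispersion

    g = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    loss_on_fence = fence_generator_loss(np.full(4, 0.5), g)    # scores at alpha
    loss_off_fence = fence_generator_loss(np.full(4, 0.95), g)  # scores too "real"
    print(loss_on_fence < loss_off_fence)
    ```

    With the generator held at the fence, the discriminator score of a test point can then be thresholded directly as the anomaly decision.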

    Adversarially Learned One-Class Classifier for Novelty Detection

    Novelty detection is the process of identifying observations that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled, or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each trained by competing with the other while collaborating to understand the underlying concept of the target class, and then classify the testing samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that separating the enhanced inliers from the distorted outliers is much easier than deciding on the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. Results on the MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection, illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods.
    Comment: CVPR 2018 Paper
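    The two-network scoring rule described above amounts to evaluating the discriminator on the reconstruction, D(R(x)). A toy sketch of that pipeline, where both the reconstructor R (a pull toward the target-class mean) and the discriminator D (a Gaussian confidence) are hypothetical stand-ins for the trained networks:

    ```python
    import numpy as np

    TARGET_MEAN = np.zeros(3)

    def R(x):
        # hypothetical reconstructor: pulls inputs toward the target-class
        # mean, so inliers change little while outliers are distorted strongly
        return 0.5 * (x + TARGET_MEAN)

    def D(x):
        # hypothetical discriminator: confidence that x is from the target class
        return float(np.exp(-0.5 * np.sum(x ** 2)))

    def novelty_score(x):
        # score the reconstruction, not the raw input: the D(R(x)) idea
        return 1.0 - D(R(x))

    inlier = np.array([0.1, -0.2, 0.0])
    outlier = np.array([3.0, 3.0, 3.0])
    print(novelty_score(inlier), novelty_score(outlier))
    ```

    Because R leaves inliers nearly intact but distorts outliers, the discriminator sees an easier separation problem than it would on the original samples.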

    Classification-Based Anomaly Detection for General Data

    Anomaly detection, finding patterns that substantially deviate from those seen previously, is one of the fundamental problems of artificial intelligence. Recently, classification-based methods were shown to achieve superior results on this task. In this work, we present a unifying view and propose an open-set method, GOAD, that relaxes current generalization assumptions. Furthermore, we extend the applicability of transformation-based methods to non-image data using random affine transformations. Our method obtains state-of-the-art accuracy and is applicable to broad data types. Its strong performance is extensively validated on multiple datasets from different domains.
    Comment: ICLR'2
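    The random-affine-transformation step that extends transformation-based detection to tabular data is simple to sketch. The per-transformation classifier training is omitted; the number of transformations M and the dimensions are illustrative, not the paper's settings:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    M, d = 8, 4   # illustrative: number of random affine maps, feature dim
    transforms = [(rng.standard_normal((d, d)), rng.standard_normal(d))
                  for _ in range(M)]

    def apply_transforms(X):
        # returns an (M, n, d) array: each sample under each affine map x -> xW + b;
        # a classifier would then be trained to predict which map was applied
        return np.stack([X @ W + b for W, b in transforms])

    X = rng.standard_normal((10, d))
    views = apply_transforms(X)
    print(views.shape)
    ```

    At test time, a sample whose transformed views the classifier cannot attribute to the correct transformation is scored as anomalous.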