DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Recently, the introduction of the generative adversarial network (GAN) and
its variants has enabled the generation of realistic synthetic samples, which
have been used to enlarge training sets. Previous work primarily focused on
data augmentation for semi-supervised and supervised tasks. In this paper, we
instead focus on unsupervised anomaly detection and propose a novel generative
data augmentation framework optimized for this task. In particular, we propose
to oversample infrequent normal samples - normal samples that occur with small
probability, e.g., rare normal events. We show that these samples are
responsible for false positives in anomaly detection. However, oversampling of
infrequent normal samples is challenging for real-world high-dimensional data
with multimodal distributions. To address this challenge, we propose to use a
GAN variant known as the adversarial autoencoder (AAE) to transform the
high-dimensional multimodal data distributions into low-dimensional unimodal
latent distributions with well-defined tail probability. Then, we
systematically oversample at the 'edge' of the latent distributions to increase
the density of infrequent normal samples. We show that our oversampling
pipeline is a unified one: it is generally applicable to datasets with
different complex data distributions. To the best of our knowledge, our method
is the first data augmentation technique focused on improving performance in
unsupervised anomaly detection. We validate our method by demonstrating
consistent improvements across several real-world datasets.
Comment: Published as a conference paper at ICDM 2018 (IEEE International Conference on Data Mining).
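The "oversample at the edge" step can be sketched as follows. The encoder/decoder here are toy linear stand-ins for a trained AAE, and the 95th-percentile edge is an illustrative choice, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained AAE encoder/decoder (hypothetical; the
# paper's AAE is trained adversarially to impose a unimodal latent prior).
def encode(x):                       # data -> 1-D latent code
    return x.mean(axis=1)

def decode(z):                       # latent code -> 2-D data
    return np.stack([z, z], axis=1)

x_train = rng.normal(0.0, 1.0, size=(1000, 2))
z = encode(x_train)

# The "edge" of the unimodal latent distribution: codes beyond the 95th
# percentile of the distance from the latent mean.
mu, sigma = z.mean(), z.std()
edge = np.quantile(np.abs(z - mu), 0.95)

# Oversample: draw latent codes just outside the edge (in either tail)
# and decode them to synthesize infrequent-but-normal training samples.
offset = edge + sigma * (0.05 + 0.1 * rng.random(200))
z_new = mu + np.sign(rng.standard_normal(200)) * offset
x_aug = decode(z_new)
print(x_aug.shape)                   # (200, 2)
```

The key design point is that the tail is well defined only in the latent space; sampling "rare normal" points directly in a multimodal data space would have no comparable notion of an edge.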
Adversarial Feature Learning of Online Monitoring Data for Operational Risk Assessment in Distribution Networks
With the deployment of online monitoring systems in distribution networks,
the massive amounts of data they collect contain rich information on the
operating states of the networks. By leveraging these data, an unsupervised
approach based on bidirectional generative adversarial networks (BiGANs) is
proposed for operational risk assessment in distribution networks in this
paper. The approach includes two stages: (1) adversarial feature learning. The
most representative features are extracted from the online monitoring data and
a statistical index is calculated for the features, during which we make no
assumptions or simplifications about the real data. (2) operational risk
assessment. The confidence level for the population mean of the standardized
index is combined with the operational risk levels, which are divided into
emergency, high risk, preventive and normal, and the p value for each data
point is calculated and compared with the corresponding significance threshold
to determine the risk level. The proposed approach is capable of discovering
the latent structure of the real data and providing more accurate assessment
results. The synthetic data is employed to illustrate the
selection of parameters involved in the proposed approach. Case studies on the
real-world online monitoring data validate the effectiveness and advantages of
the proposed approach in risk assessment.
Comment: 10 pages, submitted to IEEE Transactions on Power Systems.
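The second stage above can be sketched in a few lines. The normality assumption on the index population, the thresholds, and the four-way bucketing below are illustrative, not the paper's calibrated values:

```python
import math

# Hypothetical sketch of the risk-assessment stage: map a data point's
# statistical index to a p value under an assumed normal population of
# standardized indices, then bucket it into the four risk levels.
def p_value(index, mu, sigma):
    """Two-sided p value of `index` under N(mu, sigma^2)."""
    z = abs(index - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def risk_level(p, alpha=0.05):
    # Threshold split is illustrative, not the paper's calibration.
    if p < alpha / 10:
        return "emergency"
    if p < alpha / 2:
        return "high risk"
    if p < alpha:
        return "preventive"
    return "normal"

print(risk_level(p_value(0.1, 0.0, 1.0)))   # well within the population
print(risk_level(p_value(5.0, 0.0, 1.0)))   # far in the tail
```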
Anomaly scores for generative models
Reconstruction error is a prevalent score used to identify anomalous samples
when data are modeled by generative models, such as (variational) auto-encoders
or generative adversarial networks. This score relies on the assumption that
normal samples are located on a manifold and all anomalous samples are located
outside. Since the manifold can be learned only where the training data lie,
there are no guarantees on how the reconstruction error behaves elsewhere and the
score, therefore, seems to be ill-defined. This work defines an anomaly score
that is theoretically compatible with generative models, and very natural for
(variational) auto-encoders, which are the most prevalent such models. The new score can be
also used to select hyper-parameters and models. Finally, we explain why
reconstruction error delivers good experimental results despite weak
theoretical justification.
Comment: 9 pages, 3 figures, submitted to NeurIPS 201
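A minimal numeric illustration of why reconstruction error behaves as described on and off the learned manifold; the linear encoder/decoder is a toy stand-in for a trained auto-encoder:

```python
import numpy as np

# Reconstruction-error anomaly score with a toy linear auto-encoder:
# a unit direction W defines the learned 1-D "manifold" in 2-D data.
W = np.array([[0.8, 0.6]])                 # 2-D data -> 1-D code

def score(x):
    z = x @ W.T                            # encode
    x_hat = z @ W                          # decode
    return np.linalg.norm(x - x_hat, axis=1) ** 2   # reconstruction error

on_manifold = np.array([[0.8, 0.6]])       # lies along the learned direction
off_manifold = np.array([[-0.6, 0.8]])     # orthogonal to it

print(score(on_manifold))                  # ~0: reconstructed perfectly
print(score(off_manifold))                 # large: flagged as anomalous
```

Nothing here constrains the error far from the training data, which is exactly the ill-definedness the abstract points out.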
SAIFE: Unsupervised Wireless Spectrum Anomaly Detection with Interpretable Features
Detecting anomalous behavior in wireless spectrum is a demanding task due to
the sheer complexity of the electromagnetic spectrum use. Wireless spectrum
anomalies can take a wide range of forms from the presence of an unwanted
signal in a licensed band to the absence of an expected signal, which makes
manual labeling of anomalies difficult and suboptimal. We present the Spectrum
Anomaly Detector with Interpretable FEatures (SAIFE), an Adversarial
Autoencoder (AAE) based anomaly detector for wireless spectrum anomaly
detection using Power Spectral Density (PSD) data which achieves good anomaly
detection and localization in an unsupervised setting. In addition, we
investigate the model's capabilities to learn interpretable features such as
signal bandwidth, class and center frequency in a semi-supervised fashion.
Along with anomaly detection, the model exhibits promising results for lossy
PSD data compression up to 120X and semi-supervised signal classification
accuracy close to 100% on three datasets using just 20% labeled samples.
Finally, the model is tested on data from one of the distributed Electrosense
sensors over a long term of 500 hours, showing its anomaly detection
capabilities.
Comment: Copyright IEEE, accepted for DySPAN 201
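A hedged sketch of per-bin anomaly localization on PSD data. The "model" here is just a mean-PSD baseline rather than SAIFE's trained AAE, and all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Localize a spectrum anomaly by per-frequency-bin error against a
# baseline learned from normal sweeps (a stand-in for AAE reconstruction).
train_psd = rng.normal(-100.0, 1.0, size=(500, 64))   # dB per bin, normal data
baseline = train_psd.mean(axis=0)

test_psd = rng.normal(-100.0, 1.0, size=64)
test_psd[30:34] += 20.0            # unwanted signal appears in a licensed band

err = (test_psd - baseline) ** 2
anomalous_bins = np.where(err > 25.0)[0]   # illustrative threshold
print(anomalous_bins)              # bins around 30-33 stand out
```

An absent expected signal would show up the same way, as bins whose error exceeds the threshold in the other direction, which is why per-bin scoring gives localization for free.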
Adversarially Learned Anomaly Detection on CMS Open Data: re-discovering the top quark
We apply an Adversarially Learned Anomaly Detection (ALAD) algorithm to the
problem of detecting new physics processes in proton-proton collisions at the
Large Hadron Collider. Anomaly detection based on ALAD matches the performance
of Variational Autoencoders, with a substantial improvement in some
cases. Training the ALAD algorithm on 4.4 fb-1 of 8 TeV CMS Open Data, we show
how a data-driven anomaly detection and characterization would work in real
life, re-discovering the top quark by identifying the main features of the
t-tbar experimental signature at the LHC.
Comment: 16 pages, 9 figures
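The ALAD-style anomaly score, the distance between discriminator features of a sample and of its reconstruction G(E(x)), can be illustrated with toy stand-ins for the trained networks; E, G, and f below are hypothetical maps, not the paper's models:

```python
import numpy as np

# Toy stand-ins for ALAD's encoder E, generator G, and an intermediate
# discriminator feature map f (all hypothetical, for illustration only).
E = lambda x: x[:1]                    # encoder: keep the first coordinate
G = lambda z: np.array([z[0], 0.0])    # generator: lift back to data space
f = lambda x: np.array([x.sum(), (x ** 2).sum()])   # discriminator features

def alad_score(x):
    # L1 feature-matching distance between x and its reconstruction G(E(x)).
    return np.abs(f(x) - f(G(E(x)))).sum()

normal = np.array([1.0, 0.0])          # on the learned manifold (2nd coord 0)
anomaly = np.array([1.0, 3.0])         # off the manifold
print(alad_score(normal), alad_score(anomaly))
```

Events that the generator cannot reconstruct, such as an unmodeled physics process, receive large scores, which is the data-driven characterization the abstract describes.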
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Machine learning models encounter Out-of-Distribution (OoD) errors when the
data seen at test time are generated from a different stochastic generator than
the one used to generate the training data. One proposal to scale OoD detection
to high-dimensional data is to learn a tractable likelihood approximation of
the training distribution, and use it to reject unlikely inputs. However,
likelihood models on natural data are themselves susceptible to OoD errors, and
even assign large likelihoods to samples from other datasets. To mitigate this
problem, we propose Generative Ensembles, which robustify density-based OoD
detection by way of estimating epistemic uncertainty of the likelihood model.
We present a puzzling observation in need of an explanation -- although
likelihood measures cannot account for the typical set of a distribution, and
therefore should not be suitable on their own for OoD detection, WAIC performs
surprisingly well in practice.
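The WAIC-style score described above, the mean ensemble log-likelihood penalized by its variance across ensemble members, is a one-liner; the log-likelihood values below are made up for illustration:

```python
import numpy as np

# WAIC-style OoD score from a generative ensemble: mean log-likelihood
# minus its variance, the variance acting as an epistemic-uncertainty
# penalty when ensemble members disagree.
def waic_score(loglikes):
    loglikes = np.asarray(loglikes)
    return loglikes.mean(axis=0) - loglikes.var(axis=0)

in_dist = waic_score([-100.0, -101.0, -99.0])   # members roughly agree
ood = waic_score([-90.0, -140.0, -60.0])        # members disagree wildly
print(in_dist, ood)                             # disagreement drives it down
```

This makes concrete why a single likelihood model can assign a large likelihood to an OoD input while the ensemble still rejects it: the members' disagreement dominates the score.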
Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery
Obtaining models that capture imaging markers relevant for disease
progression and treatment monitoring is challenging. Models are typically based
on large amounts of data with annotated examples of known markers aiming at
automating detection. High annotation effort and the limitation to a vocabulary
of known markers limit the power of such approaches. Here, we perform
unsupervised learning to identify anomalies in imaging data as candidates for
markers. We propose AnoGAN, a deep convolutional generative adversarial network
to learn a manifold of normal anatomical variability, accompanied by a novel
anomaly scoring scheme based on the mapping from image space to a latent space.
Applied to new data, the model labels anomalies, and scores image patches
indicating their fit into the learned distribution. Results on optical
coherence tomography images of the retina demonstrate that the approach
correctly identifies anomalous images, such as images containing retinal fluid
or hyperreflective foci.
Comment: To be published in the proceedings of the International Conference on Information Processing in Medical Imaging (IPMI), 2017.
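The mapping-based scoring can be sketched as a latent search: gradient descent on z to minimize the residual ||G(z) - x||^2, with the final residual as the anomaly score. The toy 1-D generator below stands in for a trained DCGAN, and the discriminator term of AnoGAN's full score is omitted:

```python
import numpy as np

# Toy generator mapping a 1-D latent code to a 2-D "normal anatomy"
# manifold (a stand-in for the trained DCGAN generator).
def G(z):
    return np.array([z, z ** 2])

def anogan_score(x, steps=200, lr=0.05):
    z = 0.5                              # latent search by gradient descent
    for _ in range(steps):
        r = G(z) - x
        grad = 2 * (r[0] + r[1] * 2 * z)   # d/dz ||G(z) - x||^2
        z -= lr * grad
    return np.sum((G(z) - x) ** 2)       # residual loss as anomaly score

on_manifold = np.array([1.0, 1.0])       # equals G(1): reconstructable
off_manifold = np.array([0.0, 3.0])      # far from every G(z)
print(anogan_score(on_manifold), anogan_score(off_manifold))
```

The same per-pixel residual that yields the score also localizes the anomaly, which is what lets the model score individual image patches.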
Fence GAN: Towards Better Anomaly Detection
Anomaly detection is a classical problem where the aim is to detect anomalous
data that do not belong to the normal data distribution. Current
state-of-the-art methods for anomaly detection on complex high-dimensional data
are based on the generative adversarial network (GAN). However, the traditional
GAN loss is not directly aligned with the anomaly detection objective: it
encourages the distribution of the generated samples to overlap with the real
data and so the resulting discriminator has been found to be ineffective as an
anomaly detector. In this paper, we propose simple modifications to the GAN
loss such that the generated samples lie at the boundary of the real data
distribution. With our modified GAN loss, our anomaly detection method, called
Fence GAN (FGAN), directly uses the discriminator score as an anomaly
threshold. Our experimental results using the MNIST, CIFAR10 and KDD99 datasets
show that Fence GAN yields the best anomaly classification accuracy compared to
state-of-the-art methods.
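The modified generator objective can be sketched as an encirclement term pulling discriminator scores toward a fence value alpha in (0, 1) plus a dispersion term that keeps generated samples spread along the boundary; the exact form and the constants below are illustrative, not the paper's tuned loss:

```python
import numpy as np

# Sketch of a Fence-GAN-style generator loss. Minimizing the cross-entropy
# against the target alpha pushes D(G(z)) toward alpha (the "fence"), and
# the dispersion term penalizes samples collapsing onto their centroid.
def fence_generator_loss(d_scores, samples, alpha=0.5, beta=0.1):
    enc = -np.mean(alpha * np.log(d_scores)
                   + (1 - alpha) * np.log(1 - d_scores))
    center = samples.mean(axis=0)
    disp = beta / np.mean(np.linalg.norm(samples - center, axis=1))
    return enc + disp

rng = np.random.default_rng(0)
spread = rng.normal(size=(8, 2))                 # samples along the boundary
tight = 1.0 + 1e-3 * rng.normal(size=(8, 2))     # mode-collapsed samples

on_fence = fence_generator_loss(np.full(8, 0.5), spread)
collapsed = fence_generator_loss(np.full(8, 0.99), tight)
print(on_fence, collapsed)   # boundary samples achieve much lower loss
```

At test time the discriminator score itself is the anomaly threshold, so the generator's only job is to pin down where the boundary of the normal data lies.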
Adversarially Learned One-Class Classifier for Novelty Detection
Novelty detection is the process of identifying the observation(s) that
differ in some respect from the training observations (the target class). In
reality, the novelty class is often absent during training, poorly sampled or
not well defined. Therefore, one-class classifiers can efficiently model such
problems. However, due to the unavailability of data from the novelty class,
training an end-to-end deep network is a cumbersome task. In this paper,
inspired by the success of generative adversarial networks for training deep
models in unsupervised and semi-supervised settings, we propose an end-to-end
architecture for one-class classification. Our architecture is composed of two
deep networks, each trained by competing with the other while collaborating to
understand the underlying concept of the target class and then classify the
test samples. One network works as the novelty detector,
while the other supports it by enhancing the inlier samples and distorting the
outliers. The intuition is that the enhanced inliers and distorted outliers
are far easier to separate than the original samples. The
proposed framework applies to different related applications of anomaly and
outlier detection in images and videos. The results on MNIST and Caltech-256
image datasets, along with the challenging UCSD Ped2 dataset for video anomaly
detection illustrate that our proposed method learns the target class
effectively and is superior to the baseline and state-of-the-art methods.
Comment: CVPR 2018 paper.
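The intuition, scoring D(R(x)) after R enhances inliers and distorts outliers, can be illustrated with toy stand-ins for the two trained networks; R and D below are hand-written, hypothetical functions:

```python
import numpy as np

# Hypothetical stand-ins for the two networks: R pulls points near a
# learned prototype closer (enhancing inliers) and pushes distant points
# further away (distorting outliers); D scores proximity to the prototype.
prototype = np.array([1.0, 1.0])

def R(x):
    d = x - prototype
    factor = 2.0 if np.linalg.norm(d) > 1.0 else 0.5
    return prototype + d * factor

def D(x):
    return float(np.exp(-np.linalg.norm(x - prototype) ** 2))

inlier = np.array([1.2, 0.9])
outlier = np.array([4.0, -2.0])
print(D(R(inlier)), D(R(outlier)))   # separability improves after R
```

Without R, a borderline outlier can sit close to the decision surface; after R it is pushed well away, which is the separability argument the abstract makes.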
Classification-Based Anomaly Detection for General Data
Anomaly detection, finding patterns that substantially deviate from those
seen previously, is one of the fundamental problems of artificial intelligence.
Recently, classification-based methods were shown to achieve superior results
on this task. In this work, we present a unifying view and propose an open-set
method, GOAD, to relax current generalization assumptions. Furthermore, we
extend the applicability of transformation-based methods to non-image data
using random affine transformations. Our method is shown to obtain
state-of-the-art accuracy and is applicable to broad data types. The strong
performance of our method is extensively validated on multiple datasets from
different domains.
Comment: ICLR'20.
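The random-affine-transformation idea for non-image data can be sketched as follows; this builds only the self-labeled transformed training set and omits the classifier that GOAD then trains to predict each sample's transform:

```python
import numpy as np

rng = np.random.default_rng(0)

# M random affine maps W_m x + b_m extend transformation-based anomaly
# detection to tabular data, where image rotations/flips do not apply.
M, d = 4, 3
Ws = rng.normal(size=(M, d, d))          # random linear parts
bs = rng.normal(size=(M, d))             # random offsets

x = rng.normal(size=(100, d))            # stand-in tabular training data
views = np.stack([x @ Ws[m].T + bs[m] for m in range(M)])   # (M, 100, d)
labels = np.repeat(np.arange(M), 100)    # which transform produced each row
print(views.shape, labels.shape)
```

At test time, a sample whose transformed views the classifier cannot attribute to the correct transforms receives a high anomaly score; normal samples, resembling the training data, are attributed confidently.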