DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Recently, the introduction of the generative adversarial network (GAN) and
its variants has enabled the generation of realistic synthetic samples, which
has been used for enlarging training sets. Previous work primarily focused on
data augmentation for semi-supervised and supervised tasks. In this paper, we
instead focus on unsupervised anomaly detection and propose a novel generative
data augmentation framework optimized for this task. In particular, we propose
to oversample infrequent normal samples - normal samples that occur with small
probability, e.g., rare normal events. We show that these samples are
responsible for false positives in anomaly detection. However, oversampling of
infrequent normal samples is challenging for real-world high-dimensional data
with multimodal distributions. To address this challenge, we propose to use a
GAN variant known as the adversarial autoencoder (AAE) to transform the
high-dimensional multimodal data distributions into low-dimensional unimodal
latent distributions with well-defined tail probability. Then, we
systematically oversample at the 'edge' of the latent distributions to increase
the density of infrequent normal samples. We show that our oversampling
pipeline is a unified one: it is generally applicable to datasets with
different complex data distributions. To the best of our knowledge, our method
is the first data augmentation technique focused on improving performance in
unsupervised anomaly detection. We validate our method by demonstrating
consistent improvements across several real-world datasets.
Comment: Published as a conference paper at ICDM 2018 (IEEE International
Conference on Data Mining)
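The 'edge' oversampling step described in this abstract can be illustrated with a minimal sketch. It assumes a standard Gaussian latent prior and uses simple rejection sampling on the latent norm; the function name `sample_latent_edge`, the `min_radius` threshold, and the rejection scheme are illustrative assumptions, not the authors' procedure. In a full pipeline the returned codes would be passed through a trained AAE decoder, which is not shown here.

```python
import math
import random


def sample_latent_edge(n_samples, dim, min_radius, seed=0, max_tries=100000):
    """Draw latent codes from N(0, I) and keep only those in the tail.

    A code is kept when its Euclidean norm is at least `min_radius`
    (a hypothetical threshold chosen for illustration).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_samples:
            break
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if math.sqrt(sum(c * c for c in z)) >= min_radius:
            kept.append(z)
    return kept


# Five 2-D latent codes lying at least two standard deviations from the origin.
edge_codes = sample_latent_edge(n_samples=5, dim=2, min_radius=2.0)
```

Decoding such codes would yield synthetic samples from the low-density region of the learned distribution, which is the role the abstract assigns to infrequent normal samples.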
Anomaly Detection Approaches for Semiconductor Manufacturing
Smart production monitoring is a crucial activity in advanced manufacturing for quality, control and maintenance purposes. Advanced Monitoring Systems aim to detect anomalies and trends; anomalies are data patterns that have different data characteristics from normal instances, while trends are tendencies of production to move in a particular direction over time. In this work, we compare state-of-the-art ML approaches (ABOD, LOF, onlinePCA and osPCA) to detect outliers and events in high-dimensional monitoring problems. The compared anomaly detection strategies have been tested on a real industrial dataset related to a Semiconductor Manufacturing Etching process.
Isolation Mondrian Forest for Batch and Online Anomaly Detection
We propose a new method, named isolation Mondrian forest (iMondrian forest),
for batch and online anomaly detection. The proposed method is a novel hybrid
of isolation forest and Mondrian forest which are existing methods for batch
anomaly detection and online random forest, respectively. iMondrian forest
takes the idea of isolation, using the depth of a node in a tree, and
implements it in the Mondrian forest structure. The result is a new data
structure which can accept streaming data in an online manner while being used
for anomaly detection. Our experiments show that iMondrian forest mostly
performs better than isolation forest in batch settings and has better or
comparable performance against other batch and online anomaly detection
methods.
Comment: Accepted for presentation at the IEEE International Conference on
Systems, Man, and Cybernetics (SMC) 2020. The first three authors contributed
equally to this work
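The idea of isolation, scoring a point by the depth at which random splits separate it from the rest, can be sketched as follows. This is a plain isolation-style illustration under simplifying assumptions (axis-aligned uniform random splits, batch data only); it is not the iMondrian forest structure, and the names `isolation_depth` and `mean_depth` are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=20):
    """Depth at which `point` is separated from `data` by random splits.

    `point` must be an element of `data`. Shallow depths suggest anomalies.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains `point`.
    if point[dim] < split:
        same_side = [row for row in data if row[dim] < split]
    else:
        same_side = [row for row in data if row[dim] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over several independent random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Ten clustered points plus one distant point.
cluster = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)]
outlier = (10.0, 10.0)
data = cluster + [outlier]

outlier_depth = mean_depth(outlier, data)
inlier_depth = mean_depth(cluster[4], data)
```

In this toy setting the distant point is typically cut off by the very first split, so its average depth is much smaller than that of a point inside the cluster.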
Effective And Efficient Approach for Detecting Outliers
Anomaly detection, the process of identifying unusual behavior, is now a major topic in machine learning research. It is widely used in data mining applications such as medical informatics, computer vision, computer security, and sensor networks. Statistical approaches assume that the data follow a given distribution and aim to find the outliers that deviate from it. Most such distribution models are assumed to be univariate and therefore lack robustness for multidimensional data. We propose an online and conditional anomaly detection method based on oversampling PCA (osPCA) with a leave-one-out (LOO) strategy, in which oversampling a target instance amplifies the effect of outliers. The variation of the dominant principal direction can then be used to identify the presence of rare but abnormal data. For conditional anomaly detection, expectation-maximization algorithms are used to learn the model. Our approach reduces computational costs and memory requirements.
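The use of the dominant principal direction can be illustrated with a small sketch: each instance is duplicated in turn, the dominant direction is recomputed, and the score is one minus the absolute cosine between the original and the new direction. This is a batch illustration under stated assumptions (power iteration on the full covariance matrix, a hypothetical `copies` parameter for the oversampling amount); it does not reproduce the online update or the conditional model mentioned in the abstract.

```python
import math


def dominant_direction(points, iters=200):
    """Unit dominant eigenvector of the covariance matrix (power iteration)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Slightly asymmetric start so it is unlikely to be orthogonal
    # to the dominant eigenvector.
    v = [1.0 + 0.1 * j for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def oversampling_scores(points, copies=1):
    """Score each point by how much duplicating it rotates the dominant direction."""
    base = dominant_direction(points)
    scores = []
    for p in points:
        shifted = dominant_direction(points + [p] * copies)
        cosine = abs(sum(a * b for a, b in zip(base, shifted)))
        scores.append(max(0.0, 1.0 - cosine))
    return scores


# Eleven points on the line y = x plus one point off the line (index 11).
data = [(float(x), float(x)) for x in range(-5, 6)] + [(0.0, 8.0)]
scores = oversampling_scores(data)
suspect = max(range(len(data)), key=scores.__getitem__)
```

Duplicating the off-line point rotates the dominant direction more than duplicating any point on the line, so it receives the largest score in this example.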
Incremental Principal Component Analysis Based Outliers Detection Methods for Spatiotemporal Data Streams
In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
Undersampling GA-SVM for network intrusion detection
Network intrusion detection is one of the hottest issues in the world. An increasing number of researchers and engineers deal with this problem by using machine learning methods. However, how to improve the identification accuracy of all the attack classes remains unsolved, since the dataset is highly imbalanced. This thesis work intends to build a classifier that achieves high classification accuracy. It proposes an undersampling Genetic Algorithm-Support Vector Machine (GA-SVM) method to handle this problem, applying an undersampling method within GA-SVM. To solve the multiclass classification problem with a binary classifier, this work proposes to utilize the undersampling GA-SVM with several classic structures. After adjusting the parameter in the genetic algorithm and the undersampling ratio in each support vector machine, this work concludes that the proposed undersampling GA-SVM improves the performance of an intrusion detection system. Among its variants, the decision tree-based undersampling GA-SVM offers the best performance.
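The undersampling step on its own can be sketched as below. This is a generic random undersampler written for illustration: the function name `undersample`, the interpretation of `ratio` as a cap relative to the smallest class, and the toy labels are assumptions, and neither the genetic algorithm nor the SVM from the abstract is shown.

```python
import random
from collections import defaultdict


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly cap every class at `ratio` times the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    smallest = min(len(xs) for xs in by_class.values())
    cap = max(1, int(ratio * smallest))
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        kept = xs if len(xs) <= cap else rng.sample(xs, cap)
        out_samples.extend(kept)
        out_labels.extend([y] * len(kept))
    return out_samples, out_labels


# 90 "normal" records and 10 "attack" records, capped at a 2:1 ratio.
samples = list(range(100))
labels = ["normal"] * 90 + ["attack"] * 10
balanced_samples, balanced_labels = undersample(samples, labels, ratio=2.0)
```

With `ratio=2.0`, the majority class is reduced to 20 records while all 10 minority records are kept.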