
    DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

    Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which have been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples, that is, normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the 'edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets. Comment: Published as a conference paper at ICDM 2018 (IEEE International Conference on Data Mining).
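The 'edge' oversampling step described in this abstract can be illustrated with a minimal sketch. It assumes the latent prior is a standard Gaussian and defines the edge as the outer `tail_fraction` of that prior by radius; both choices, and every name below, are illustrative assumptions rather than details taken from the paper. The decoder that would map the sampled codes back to data space is omitted.

```python
import math
import random


def sample_latent_edge(n_samples, dim=2, tail_fraction=0.1,
                       n_reference=5000, seed=0):
    """Rejection-sample latent codes from the low-density edge of N(0, I).

    Assumption: the latent prior is a standard Gaussian; `tail_fraction`
    is a hypothetical knob for how much prior mass counts as the edge.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def norm(z):
        return math.sqrt(sum(v * v for v in z))

    # Estimate the radius beyond which roughly `tail_fraction` of the
    # prior mass lies, using an empirical quantile of sampled norms.
    reference_norms = sorted(norm(draw()) for _ in range(n_reference))
    cutoff = min(int((1.0 - tail_fraction) * n_reference), n_reference - 1)
    edge_radius = reference_norms[cutoff]

    # Keep only draws that fall at or beyond the estimated edge radius.
    edge_codes = []
    while len(edge_codes) < n_samples:
        z = draw()
        if norm(z) >= edge_radius:
            edge_codes.append(z)
    return edge_codes, edge_radius
```

In a full pipeline, the returned codes would be passed through a trained decoder to produce synthetic infrequent normal samples; that step depends on the model and is not shown here.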

    Anomaly Detection Approaches for Semiconductor Manufacturing

    Smart production monitoring is a crucial activity in advanced manufacturing for quality, control and maintenance purposes. Advanced Monitoring Systems aim to detect anomalies and trends; anomalies are data patterns that have different data characteristics from normal instances, while trends are tendencies of production to move in a particular direction over time. In this work, we compare state-of-the-art ML approaches (ABOD, LOF, onlinePCA and osPCA) to detect outliers and events in high-dimensional monitoring problems. The compared anomaly detection strategies have been tested on a real industrial dataset related to a Semiconductor Manufacturing Etching process.

    Isolation Mondrian Forest for Batch and Online Anomaly Detection

    We propose a new method, named isolation Mondrian forest (iMondrian forest), for batch and online anomaly detection. The proposed method is a novel hybrid of isolation forest and Mondrian forest, which are existing methods for batch anomaly detection and online random forest, respectively. iMondrian forest takes the idea of isolation, using the depth of a node in a tree, and implements it in the Mondrian forest structure. The result is a new data structure which can accept streaming data in an online manner while being used for anomaly detection. Our experiments show that iMondrian forest mostly performs better than isolation forest in batch settings and has better or comparable performance against other batch and online anomaly detection methods. Comment: Accepted for presentation at the IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2020. The first three authors contributed equally to this work.
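The isolation idea mentioned in this abstract, scoring a point by the depth at which it is separated from the rest of the data, can be sketched as follows. This is a simplified illustration of plain isolation by random axis-aligned splits, not the Mondrian forest structure or the iMondrian method itself; it follows only the query point's side of each split and skips subsampling. The function names and the `max_depth` and `n_trees` parameters are assumptions made for the example.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` is separated from `data` by random splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    axis = rng.randrange(len(point))
    lo = min(row[axis] for row in data)
    hi = max(row[axis] for row in data)
    if lo == hi:
        # Simplification: stop if the chosen axis cannot be split.
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the query point.
    if point[axis] < split:
        same_side = [row for row in data if row[axis] < split]
    else:
        same_side = [row for row in data if row[axis] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_isolation_depth(point, data, n_trees=100, seed=0):
    """Average isolation depth over several random split sequences."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees
```

A point far from the bulk of the data tends to be isolated after few splits, so a smaller mean depth suggests a more anomalous point.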

    Effective And Efficient Approach for Detecting Outliers

    Anomaly detection, the process of identifying unusual behavior, is currently a major topic in machine learning research. It is widely used in data mining applications such as medical informatics, computer vision, computer security, and sensor networks. Statistical approaches assume that the data follow a given distribution and aim to find the outliers that deviate from it. Most distribution models are assumed to be univariate, however, and therefore lack robustness for multidimensional data. We propose an online and conditional anomaly detection method based on oversampling PCA (osPCA) with a leave-one-out (LOO) strategy, in which oversampling a target instance amplifies the effect of outliers. The resulting variation of the dominant principal direction can be used to identify the presence of rare but abnormal data. For conditional anomaly detection, expectation-maximization algorithms are used to learn the model. Our approach reduces computational costs and memory requirements.
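The core osPCA idea in this abstract, that duplicating a single instance and measuring how much the dominant principal direction rotates reveals outliers, can be sketched in plain Python. This is a brute-force illustration that recomputes the direction from scratch for every instance, so it does not reflect the online update or the cost savings the abstract refers to. The power-iteration helper, the `oversample_ratio` parameter, and the score definition (one minus the absolute cosine) are assumptions made for the example.

```python
import math


def dominant_direction(rows, n_iter=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Asymmetric start vector; a start exactly orthogonal to the leading
    # eigenvector would not converge, which this sketch does not handle.
    v = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        length = math.sqrt(sum(x * x for x in w))
        if length == 0.0:
            break
        v = [x / length for x in w]
    return v


def ospca_scores(rows, oversample_ratio=0.1):
    """Score each row by how far duplicating it rotates the main direction."""
    base = dominant_direction(rows)
    n_copies = max(1, int(oversample_ratio * len(rows)))
    scores = []
    for target in rows:
        # Oversample the target instance, then recompute the direction.
        augmented = rows + [target] * n_copies
        direction = dominant_direction(augmented)
        cosine = abs(sum(a * b for a, b in zip(base, direction)))
        scores.append(1.0 - cosine)
    return scores
```

Rows whose score is much larger than the rest are candidates for outliers, since oversampling a typical instance barely changes the dominant direction.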

    Incremental Principal Component Analysis Based Outliers Detection Methods for Spatiotemporal Data Streams

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.

    Undersampling GA-SVM for network intrusion detection

    Network intrusion detection is one of the hottest issues in the world. An increasing number of researchers and engineers deal with this problem by using machine learning methods. However, how to improve the identification accuracy of all the attack classes remains unsolved, since the dataset is imbalanced with a high imbalance ratio. This thesis work intends to build a classifier that achieves high classification accuracy. It proposes an undersampling Genetic Algorithm-Support Vector Machine (GA-SVM) method to handle this problem, applying an undersampling method within GA-SVM. To solve the multiclassification problem with a binary classifier, this work proposes to utilize the undersampling GA-SVM with several classic structures. After adjusting the parameter in the genetic algorithm and the undersampling ratio in each support vector machine, this work concludes that the proposed undersampling GA-SVM improves the performance of an intrusion detection system. Among its variants, the decision tree-based undersampling GA-SVM offers the best performance.
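The undersampling step mentioned in this abstract can be illustrated with a minimal random undersampler. The abstract does not say which undersampling scheme the thesis uses, so plain random undersampling is assumed here, and the genetic algorithm and SVM components are not shown. The function name and the `ratio` parameter (the cap on any class size, as a multiple of the smallest class) are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop samples so no class exceeds ratio * smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    smallest = min(len(items) for items in by_class.values())
    cap = max(1, int(ratio * smallest))
    kept_samples, kept_labels = [], []
    for label, items in by_class.items():
        # Classes at or below the cap are kept whole.
        chosen = items if len(items) <= cap else rng.sample(items, cap)
        kept_samples.extend(chosen)
        kept_labels.extend([label] * len(chosen))
    return kept_samples, kept_labels
```

For instance, with 100 normal records, 10 attack records, and `ratio=2.0`, the normal class is cut to 20 records while the attack class is kept whole. In a setup like the one described, a ratio of this kind would be one of the values tuned per classifier.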