7 research outputs found

    On Equivalence of Anomaly Detection Algorithms

    In most domains, anomaly detection is typically cast as an unsupervised learning problem because labelling large datasets is infeasible. In this setup, evaluating and comparing different anomaly detection algorithms is difficult. Although some work has been published in this field, it fails to account for the fact that different algorithms can detect different kinds of anomalies. More precisely, the literature on this topic has focused on defining criteria to determine which algorithm is better, while ignoring the fact that such criteria are meaningful only if the algorithms being compared detect the same kind of anomalies. Therefore, in this paper we propose an equivalence criterion for anomaly detection algorithms that measures the degree to which two anomaly detection algorithms detect the same kind of anomalies. First, we lay out a set of desirable properties that such an equivalence criterion should have and explain why; second, we propose the Gaussian Equivalence Criterion (GEC) as an equivalence criterion and show mathematically that it has these desirable properties. Finally, we empirically validate these properties using a simulated and a real-world dataset. For the real-world dataset, we show how GEC can provide insight into the anomaly detection algorithms as well as the dataset.

    Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study

    Detecting anomalous situations in complex mission-critical systems is of paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from operational data is the imbalanced class distribution, since anomalies are supposed to be rare events. This paper evaluates a diverse array of machine learning-based anomaly detection algorithms through a comprehensive benchmark study. The paper contributes an unbiased comparison of various anomaly detection algorithms, spanning classical machine learning (including various tree-based approaches), deep learning, and outlier detection methods. The inclusion of 104 publicly available datasets and a few proprietary industrial systems datasets enhances the diversity of the study, allowing for a more realistic evaluation of algorithm performance and emphasizing the importance of adaptability to real-world scenarios. The paper dispels the deep learning myth, demonstrating that, though powerful, deep learning is not a universal solution in this case. We observed that recently proposed tree-based evolutionary algorithms outperform other approaches in many scenarios. We noticed that tree-based approaches catch a singleton anomaly in a dataset where deep learning methods fail. On the other hand, classical SVM performs best on datasets with more than 10% anomalies, implying that such scenarios are better modeled as a classification problem than as anomaly detection. To our knowledge, such a study of a large number of state-of-the-art algorithms on diverse datasets, with the objective of guiding researchers and practitioners in making informed algorithmic choices, has not been attempted before.

    Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape

    Anomaly detection aims at identifying unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks, to such an extent that several ML algorithms suited to binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been carried out. To fill this gap, we exercise seventeen unsupervised anomaly detection algorithms on eleven attack datasets. The results allow us to elaborate on a wide range of arguments, from the behavior of individual algorithms to the suitability of the datasets for anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their low computational complexity. Further, we detail how attacks with unstable, distributed or non-repeatable behavior, such as Fuzzing, Worms and Botnets, are more difficult to detect. Finally, we discuss the capabilities of algorithms in detecting anomalies generated by a wide pool of unknown attacks, showing that achieved metric scores do not vary with respect to identifying single attacks. Comment: Will be published in ACM Transactions Data Science