On Equivalence of Anomaly Detection Algorithms
In most domains, anomaly detection is typically cast as an unsupervised learning problem because of the infeasibility of labelling large datasets. In this setup, evaluating and comparing different anomaly detection algorithms is difficult. Although some work has been published in this field, it fails to account for the fact that different algorithms can detect different kinds of anomalies. More precisely, the literature on this topic has focused on defining criteria to determine which algorithm is better, while ignoring the fact that such criteria are meaningful only if the algorithms being compared detect the same kind of anomalies. Therefore, in this paper we propose an equivalence criterion for anomaly detection algorithms that measures the degree to which two anomaly detection algorithms detect the same kind of anomalies. First, we lay out a set of desirable properties that such an equivalence criterion should have, and why; second, we propose the Gaussian Equivalence Criterion (GEC) as such a criterion and show mathematically that it has the desirable properties previously mentioned. Finally, we empirically validate these properties using a simulated and a real-world dataset. For the real-world dataset, we show how GEC can provide insight into the anomaly detection algorithms as well as the dataset.
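The abstract does not reproduce GEC's formula, so the sketch below is only a generic illustration of the underlying idea of measuring detector agreement, not the paper's criterion: two simple detectors (a z-score rule and a Tukey/IQR rule, both assumptions here) flag points in the same series, and a Jaccard overlap of their flag sets quantifies how far they detect the same anomalies.

```python
import statistics

def zscore_flags(xs, k=3.0):
    """Flag points more than k standard deviations from the mean."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return {i for i, x in enumerate(xs) if abs(x - mu) > k * sd}

def iqr_flags(xs, k=1.5):
    """Flag points outside the Tukey fences (k * IQR beyond the quartiles)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {i for i, x in enumerate(xs) if x < lo or x > hi}

def jaccard_agreement(a, b):
    """Jaccard overlap of two flag sets: 1.0 means identical detections."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

data = [0.1, 0.2, -0.1, 0.0, 0.15, -0.2, 0.05, 9.5]  # one obvious anomaly
# k=2 for the z-score rule: the outlier itself inflates the std deviation
za, iq = zscore_flags(data, k=2.0), iqr_flags(data)
print(jaccard_agreement(za, iq))
```

An agreement of 1.0 means the two rules flagged exactly the same points; values near 0 would indicate they detect different kinds of anomalies, which is the situation the paper argues makes "which algorithm is better" comparisons meaningless.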
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study
Detection of anomalous situations for complex mission-critical systems holds
paramount importance when their service continuity needs to be ensured. A major
challenge in detecting anomalies from the operational data arises due to the
imbalanced class distribution problem since the anomalies are supposed to be
rare events. This paper evaluates a diverse array of machine learning-based
anomaly detection algorithms through a comprehensive benchmark study. The paper
contributes significantly by conducting an unbiased comparison of various
anomaly detection algorithms, spanning classical machine learning including
various tree-based approaches to deep learning and outlier detection methods.
The inclusion of 104 publicly available datasets and a few proprietary
industrial-system datasets enhances the diversity of the study, allowing for a
more realistic evaluation of algorithm performance and emphasizing the
importance of
adaptability to real-world scenarios. The paper dispels the deep learning myth,
demonstrating that though powerful, deep learning is not a universal solution
in this case. We observed that recently proposed tree-based evolutionary
algorithms outperform the alternatives in many scenarios. We noticed that
tree-based approaches
catch a singleton anomaly in a dataset where deep learning methods fail. On the
other hand, classical SVM performs the best on datasets with more than 10%
anomalies, implying that such scenarios can be best modeled as a classification
problem rather than anomaly detection. To our knowledge, such a study on a
large number of state-of-the-art algorithms using diverse data sets, with the
objective of guiding researchers and practitioners in making informed
algorithmic choices, has not been attempted before.
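As a minimal sketch of why tree-style random partitioning can catch a singleton anomaly (the benchmarked algorithms themselves are not specified in this abstract, so this toy 1-D routine is an illustrative assumption): a point far from the bulk of the data is separated by random splits after very few cuts, while an inlier takes many.

```python
import random

def isolation_depth(xs, x, rng, max_depth=20):
    """Number of random splits needed to isolate x from the rest (1-D)."""
    depth = 0
    lo, hi = min(xs), max(xs)
    pts = list(xs)
    while len(pts) > 1 and depth < max_depth:
        split = rng.uniform(lo, hi)
        # Keep only the points on the same side of the split as x.
        pts = [p for p in pts if (p <= split) == (x <= split)]
        if x <= split:
            hi = split
        else:
            lo = split
        depth += 1
    return depth

def avg_depth(xs, x, trials=200, seed=0):
    """Average isolation depth over many random split sequences."""
    rng = random.Random(seed)
    return sum(isolation_depth(xs, x, rng) for _ in range(trials)) / trials

data = [0.1 * i for i in range(50)] + [25.0]  # singleton anomaly at 25.0
print(avg_depth(data, 25.0), avg_depth(data, 2.5))
```

The anomaly's average depth comes out much smaller than the inlier's, which is the signal isolation-style tree methods threshold on; it also illustrates why a single anomalous point can still stand out when class imbalance makes supervised labels useless.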
Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape
Anomaly detection aims at identifying unexpected fluctuations in the expected
behavior of a given system. It is acknowledged as a reliable answer to the
identification of zero-day attacks, to such an extent that several ML
algorithms suited to binary classification have been proposed over the years.
However, the experimental comparison of a wide pool of unsupervised algorithms
for anomaly-based intrusion detection against a comprehensive set of attack
datasets had not been investigated yet. To fill this gap, we exercise
seventeen unsupervised anomaly detection algorithms on eleven attack datasets.
Results
allow elaborating on a wide range of arguments, from the behavior of the
individual algorithm to the suitability of the datasets to anomaly detection.
We conclude that algorithms such as Isolation Forests, One-Class Support
Vector Machines and Self-Organizing Maps are more effective than their
counterparts for intrusion detection, while clustering algorithms represent a
good alternative due to their low computational complexity. Further, we detail
how attacks with unstable, distributed or non-repeatable behavior, such as
Fuzzing, Worms and Botnets, are more difficult to detect. Ultimately, we
elaborate on the capabilities of the algorithms in detecting anomalies
generated by a wide pool of unknown attacks, showing that the achieved metric
scores do not vary with respect to identifying single attacks.
Comment: Will be published in ACM Transactions on Data Science
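The low-cost clustering alternative mentioned in the abstract can be sketched with a crude one-cluster detector (an illustrative assumption, not one of the seventeen benchmarked algorithms): fit a centroid on attack-free training traffic, then flag test records whose distance to that centroid exceeds a high quantile of the training distances.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def flag_outliers(train, test, quantile=0.95):
    """Flag test rows farther from the training centroid than the
    `quantile`-th training distance (a crude one-cluster detector)."""
    c = centroid(train)
    dists = sorted(dist(r, c) for r in train)
    thresh = dists[int(quantile * (len(dists) - 1))]
    return [dist(r, c) > thresh for r in test]

# Hypothetical 2-feature records: normal traffic plus two probes,
# the second of which is far outside the normal region.
normal = [[1.0 + 0.01 * i, 2.0 - 0.01 * i] for i in range(20)]
probes = [[1.1, 1.9], [9.0, -3.0]]
print(flag_outliers(normal, probes))
```

This is linear in the number of records per pass, which is the kind of low computational complexity that makes clustering-style detectors attractive next to heavier models, at the cost of assuming normal behavior forms a single compact region.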