Why is this an anomaly? Explaining anomalies using sequential explanations
In most applications, anomaly detection operates in an unsupervised mode, looking for outliers in the hope that they are anomalies. Unfortunately, most anomaly detectors do not explain which features make a detected outlier anomalous. Human analysts must therefore manually browse the feature space of each detected outlier to find the subset of features that helps them decide whether the point is genuinely anomalous. This paper introduces sequential explanation (SE) methods that explain to the analyst, one feature at a time, which features make the detected outlier anomalous. We present two families of methods for computing SEs, outlier-based SE and sample-based SE, both of which work alongside any anomaly detector. The outlier-based SE methods use an anomaly detector's outlier scoring measure, guided by a search algorithm, to compute the SEs. The sample-based SE methods instead employ sampling to turn the problem into a classical feature selection problem. In our experiments, we compare the performance of the different outlier-based and sample-based SEs. Our results show that both the outlier-based and sample-based methods compute SEs that perform well and outperform sequential feature explanations.
http://www.elsevier.com/locate/patcog hj2021 Computer Science
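The outlier-based idea in the abstract above (an outlier score guided by a search algorithm) can be illustrated with a minimal sketch. The abstract does not specify the scoring measure or the search used in the paper, so everything here is an assumption: `zscore_outlier_score` is a hypothetical stand-in detector, and `greedy_sequential_explanation` uses plain greedy forward search as one possible search strategy.

```python
import math
from statistics import mean, pstdev


def zscore_outlier_score(data, point, features):
    """Stand-in outlier score (assumption, not the paper's measure):
    Euclidean norm of the point's z-scores restricted to `features`."""
    total = 0.0
    for f in features:
        column = [row[f] for row in data]
        mu, sigma = mean(column), pstdev(column)
        z = (point[f] - mu) / sigma if sigma > 0 else 0.0
        total += z * z
    return math.sqrt(total)


def greedy_sequential_explanation(data, point, score=zscore_outlier_score):
    """Greedy forward search: at each step, add the feature that gives the
    highest outlier score for the point on the features chosen so far."""
    remaining = list(range(len(point)))
    chosen = []
    while remaining:
        best = max(remaining, key=lambda f: score(data, point, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy reference data (rows are points, columns are features) and one outlier.
data = [[1.0, 10.0, 5.0], [2.0, 11.0, 5.5], [3.0, 9.0, 4.5], [2.0, 10.0, 5.0]]
outlier = [2.0, 30.0, 6.0]
order = greedy_sequential_explanation(data, outlier)
print(order)  # feature indices, most explanatory first
```

In this toy setup the analyst would be shown features in the returned order, starting with the one that deviates most from the reference data. Any detector that can score a point on a feature subset could be passed in place of the stand-in via the `score` argument.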
Anomaly Detection: Theory, Explanation and User Feedback
Anomaly detection has been used in a variety of practical applications, including cyber-security, fraud detection, and fault detection in safety-critical systems. Anomaly detectors produce a ranked list of statistical anomalies, which human analysts typically examine to extract the actual anomalies of interest. Unfortunately, most anomaly detectors provide no explanation of why an instance was considered anomalous. To address this issue, we propose a feature-based explanation approach called sequential feature explanation (SFE) to help analysts in their investigation. A second problem with anomaly detection systems is that they usually produce a large number of false positives due to a mismatch between statistical and semantic anomalies. We address this issue by incorporating human feedback: we develop a human-in-the-loop anomaly detection system that can improve its detection rate with a simple form of true/false positive feedback from the analyst. We show empirically the efficacy and superior performance of both our explanation and feedback approaches on significant cyber-security applications, including red team attack data and real corporate network data, along with a large number of benchmark datasets. We also examine a set of state-of-the-art anomaly detection techniques to understand why they perform so well with a small number of training examples. We unify their working principle into a common framework underlying different pattern spaces and compute their sample complexity for achieving performance guarantees. In addition, we empirically investigate learning curves for anomaly detection in this framework.
A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs.
Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version)
Capturing Evolution Genes for Time Series Data
The modeling of time series is becoming increasingly critical in a wide
variety of applications. Overall, data evolves by following different patterns,
which are generally caused by different user behaviors. Given a time series, we
define the evolution gene to capture the latent user behaviors and to describe
how the behaviors lead to the generation of time series. In particular, we
propose a uniform framework that recognizes different evolution genes of
segments by learning a classifier, and adopt an adversarial generator to
implement the evolution gene by estimating the segments' distribution.
Experimental results based on a synthetic dataset and five real-world datasets
show that our approach not only achieves good prediction results (e.g.,
+10.56% on average in terms of F1), but can also provide explanations of
the results.
Comment: a preprint version. arXiv admin note: text overlap with
arXiv:1703.10155 by other authors
Graph Neural Networks based Log Anomaly Detection and Explanation
Event logs are widely used to record the status of high-tech systems, making
log anomaly detection important for monitoring those systems. Most existing log
anomaly detection methods take a log event count matrix or log event sequences
as input, exploiting quantitative and/or sequential relationships between log
events to detect anomalies. Unfortunately, only considering quantitative or
sequential relationships may result in low detection accuracy. To alleviate
this problem, we propose a graph-based method for unsupervised log anomaly
detection, dubbed Logs2Graphs, which first converts event logs into attributed,
directed, and weighted graphs, and then leverages graph neural networks to
perform graph-level anomaly detection. Specifically, we introduce One-Class
Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph
neural network model for detecting graph-level anomalies in a collection of
attributed, directed, and weighted graphs. By coupling the graph representation
and anomaly detection steps, OCDiGCN can learn a representation that is
especially suited for anomaly detection, resulting in a high detection
accuracy. Importantly, for each identified anomaly, we additionally provide a
small subset of nodes that play a crucial role in OCDiGCN's prediction as
explanations, which can offer valuable cues for subsequent root cause
diagnosis. Experiments on five benchmark datasets show that Logs2Graphs
performs at least on par with state-of-the-art log anomaly detection methods on
simple datasets while largely outperforming state-of-the-art log anomaly
detection methods on complicated datasets.
Comment: Preprint submitted to Engineering Applications of Artificial
Intelligence
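The first step described in the abstract above, converting event logs into attributed, directed, and weighted graphs, might look roughly like the following sketch. The abstract does not give the exact construction used by Logs2Graphs, so this is an assumption throughout: `log_sequence_to_graph` is a hypothetical helper in which nodes are event types, a directed edge links each event to the one that follows it, edge weights count how often that transition occurs, and the node attribute is simply the event's occurrence count as a placeholder.

```python
from collections import Counter


def log_sequence_to_graph(events):
    """Build a directed, weighted graph from one log event sequence.

    Returns (node_attrs, edge_weights):
      node_attrs   maps event type -> occurrence count (placeholder attribute)
      edge_weights maps (source, target) -> number of times target directly
                   follows source in the sequence
    """
    node_attrs = Counter(events)
    edge_weights = Counter(zip(events, events[1:]))
    return dict(node_attrs), dict(edge_weights)


# Hypothetical event sequence from a single log session.
events = ["open", "read", "read", "write", "read", "close"]
nodes, edges = log_sequence_to_graph(events)

for (src, dst), weight in edges.items():
    print(f"{src} -> {dst} (weight {weight})")
```

Each log session would yield one such graph, and a graph-level anomaly detector would then operate on the resulting collection. The graph neural network step itself (OCDiGCN) is not sketched here.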