Copula-based anomaly scoring and localization for large-scale, high-dimensional continuous data
The anomaly detection method presented in this paper has a special feature:
it not only indicates whether an observation is anomalous, but also tells
what exactly makes an anomalous observation unusual. Hence, it helps to
localize the cause of the anomaly.
The proposed approach is model-based; it relies on the multivariate
probability distribution associated with the observations. Since rare
events lie in the tails of the probability distribution, we use copula
functions, which can model fat-tailed distributions well. The presented
procedure scales well; it can cope with a large number of high-dimensional
samples. Furthermore, it can also handle missing values, which occur
frequently in high-dimensional data sets.
In the second part of the paper, we demonstrate the usability of the method
through a case study, where we analyze a large data set consisting of the
performance counters of a real mobile telecommunication network. Since such
networks are complex systems, the signs of sub-optimal operation can remain
hidden for a potentially long time. With the proposed procedure, many such
hidden issues can be isolated and indicated to the network operator.
Comment: 27 pages, 12 figures, accepted at ACM Transactions on Intelligent
Systems and Technology
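The copula idea above can be sketched in a few lines: map each margin through its empirical CDF and the standard normal quantile, fit the correlation of the resulting scores, and score a point by a Mahalanobis-style distance, with the largest per-dimension score pointing at the culprit variable. This is a minimal two-dimensional Gaussian-copula illustration, not the paper's actual procedure; the function names, the scoring rule, and the localization heuristic are all assumptions.

```python
# Hedged sketch: Gaussian-copula anomaly score with per-dimension localization.
# Illustrative only; the paper's copula family and scoring function differ.
from statistics import NormalDist, mean

def to_normal_score(column, x):
    """Map x through the column's empirical CDF, then the normal quantile."""
    n = len(column)
    rank = sum(1 for v in column if v <= x)
    u = (rank + 0.5) / (n + 1)          # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

def copula_score(data, point):
    """Score a 2-D point: higher means more anomalous under a Gaussian copula."""
    cols = list(zip(*data))
    z = [to_normal_score(cols[j], point[j]) for j in range(2)]
    zs = [[to_normal_score(cols[j], row[j]) for j in range(2)] for row in data]
    m0, m1 = mean(r[0] for r in zs), mean(r[1] for r in zs)
    cov = mean((r[0] - m0) * (r[1] - m1) for r in zs)
    var0 = mean((r[0] - m0) ** 2 for r in zs)
    var1 = mean((r[1] - m1) ** 2 for r in zs)
    rho = cov / (var0 * var1) ** 0.5
    # Mahalanobis-style distance under the fitted correlation.
    score = (z[0] ** 2 - 2 * rho * z[0] * z[1] + z[1] ** 2) / (1 - rho ** 2)
    # Localization: the dimension with the largest |z| drives the anomaly.
    culprit = max(range(2), key=lambda j: abs(z[j]))
    return score, culprit
```

Because each margin is rank-transformed, the score is insensitive to monotone rescaling of individual counters, which is part of what makes copulas attractive for heterogeneous performance data.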
Tree-based mining contrast subspace
All existing contrast subspace mining methods employ a density-based likelihood contrast scoring function to measure the likelihood of a query object with respect to a target class against the other class in a subspace. However, density tends to decrease as the dimensionality of a subspace increases, which causes such methods to identify inaccurate contrast subspaces for a given query object. This paper proposes a novel contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by the dimensionality of subspaces. The tree-based scoring measure recursively binary-partitions the subspace so that objects belonging to the target class are grouped together and separated from objects belonging to the other class. In a contrast subspace, the query object should fall into a group containing more objects of the target class than of the other class. The method incorporates a feature selection approach to find a subset of one-dimensional subspaces with high likelihood contrast scores with respect to the query object; the contrast subspaces are then searched through this selected subset. An experiment was conducted to evaluate the effectiveness of the tree-based method in terms of classification accuracy. The results show that the proposed method achieves higher classification accuracy and outperforms the existing method on several real-world data sets.
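The recursive binary partitioning described above can be illustrated with a tiny one-dimensional sketch: split the data where class separation is best (Gini gain is assumed here as the split criterion), descend to the query's side, and return the target-class fraction in the query's leaf as its likelihood contrast. This is an illustrative toy, not the paper's exact scoring function.

```python
# Hedged sketch of a tree-based likelihood contrast score in one subspace.
# Split rule (best-gain midpoint) and depth limit are illustrative assumptions.

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n                 # fraction of target-class objects
    return 2 * p * (1 - p)

def leaf_target_fraction(points, labels, query, depth=3):
    """Grow a small tree on 1-D points; return target fraction in query's leaf."""
    if depth == 0 or gini(labels) == 0.0:
        return sum(labels) / len(labels)
    best = None
    xs = sorted(set(points))
    for a, b in zip(xs, xs[1:]):        # candidate splits at midpoints
        t = (a + b) / 2
        left = [l for x, l in zip(points, labels) if x <= t]
        right = [l for x, l in zip(points, labels) if x > t]
        gain = gini(labels) - (len(left) * gini(left)
                               + len(right) * gini(right)) / len(labels)
        if best is None or gain > best[0]:
            best = (gain, t)
    if best is None or best[0] <= 0:
        return sum(labels) / len(labels)
    t = best[1]
    # Recurse into the partition that contains the query object.
    side = [(x, l) for x, l in zip(points, labels) if (x <= t) == (query <= t)]
    pts, lbs = [x for x, _ in side], [l for _, l in side]
    return leaf_target_fraction(pts, lbs, query, depth - 1)
```

Unlike a density estimate, the leaf fraction does not shrink systematically as dimensions are added, which is the intuition behind the dimensionality claim in the abstract.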
Towards Interpretable Anomaly Detection via Invariant Rule Mining
In the research area of anomaly detection, novel and promising methods are
frequently developed. However, most existing studies, especially those
leveraging deep neural networks, exclusively focus on the detection task only
and ignore the interpretability of the underlying models as well as their
detection results. Yet anomaly interpretation, which aims to explain why
specific data instances are identified as anomalies, is an equally (if not
more) important task in many real-world applications. In this
work, we pursue highly interpretable anomaly detection via invariant rule
mining. Specifically, we leverage decision tree learning and association rule
mining to automatically generate invariant rules that are consistently
satisfied by the underlying data generation process. The generated invariant
rules can provide explicit explanation of anomaly detection results and thus
are extremely useful for subsequent decision-making. Furthermore, our empirical
evaluation shows that the proposed method also achieves performance
comparable, in terms of AUC and partial AUC, to popular anomaly detection
models on various benchmark datasets.
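As a toy illustration of rule-based explanation (a drastic simplification of the decision-tree-plus-association-rule mining described above), one can mine per-feature range invariants from normal data and report every violated invariant as the explanation; everything here, from the rule form to the function names, is an assumption for illustration.

```python
# Hedged sketch of invariant-rule-style detection. The paper combines decision
# trees with association rule mining; this minimal version only learns
# per-feature min/max range invariants, an illustrative simplification.

def mine_range_invariants(normal_rows):
    """One invariant per feature: the observed [min, max] on normal data."""
    cols = list(zip(*normal_rows))
    return [(min(c), max(c)) for c in cols]

def explain_anomaly(invariants, row):
    """Return human-readable violations; an empty list means 'looks normal'."""
    violations = []
    for j, (lo, hi) in enumerate(invariants):
        if not lo <= row[j] <= hi:
            violations.append(
                f"feature {j}: value {row[j]} outside invariant [{lo}, {hi}]")
    return violations
```

The point of the design is that the detector's output is itself the explanation: the operator sees which learned invariant was broken, not just an opaque score.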
A new dimensionality-unbiased score for efficient and effective outlying aspect mining
The main aim of outlying aspect mining is to automatically detect the subspace(s) (a.k.a. aspect(s)) in which a given data point is dramatically different from the rest of the data. To rank subspaces for a given data point, a scoring measure is required to compute the outlying degree of the point in each subspace. In this paper, we introduce a new measure of outlying degree, called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which not only detects outliers but also explains why the selected point is an outlier. SiNNE is a dimensionality-unbiased measure in its raw form, meaning that the scores it produces can be compared directly across subspaces of different dimensionality; it therefore requires no normalization to make the score unbiased. Our experimental results on synthetic and publicly available real-world datasets reveal that (i) SiNNE produces better or at least comparable results to existing scores, and (ii) it improves the run time of the existing beam-search-based outlying aspect mining algorithm by at least two orders of magnitude. SiNNE allows the existing outlying aspect mining algorithm to run on datasets with hundreds of thousands of instances and thousands of dimensions, which was not possible before.
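The flavour of SiNNE can be sketched along the lines of nearest-neighbour-ensemble isolation: each ensemble member subsamples the data, covers every subsampled point with a ball whose radius is its nearest-neighbour distance within the subsample, and a query is scored by the fraction of members in which it escapes all balls. Parameter names (`psi`, `t`) and the exact scoring rule here are assumptions, not the paper's definition.

```python
# Hedged sketch in the spirit of nearest-neighbour-ensemble isolation scores.
# Subsample size (psi) and ensemble size (t) are illustrative choices.
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build_member(data, psi, rng):
    """One member: balls around a subsample, radius = nearest subsample neighbour."""
    sample = rng.sample(data, psi)
    balls = []
    for p in sample:
        r = min(dist(p, q) for q in sample if q is not p)
        balls.append((p, r))
    return balls

def sinne_score(data, query, psi=8, t=50, seed=0):
    """Fraction of ensemble members in which the query escapes every ball."""
    rng = random.Random(seed)
    members = [build_member(data, psi, rng) for _ in range(t)]
    isolated = sum(
        all(dist(query, c) > r for c, r in member) for member in members
    )
    return isolated / t
```

Because the score is a fraction in [0, 1] regardless of how many attributes the subspace has, scores from subspaces of different dimensionality are directly comparable, which is the "dimensionality-unbiased" property the abstract emphasizes.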
A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs.
Comment: Paper accepted by ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version)
Anomaly Detection in Water Usage of PDAM Surya Sembada Surabaya Customers Using Kohonen SOM and Local Outlier Factor
Water loss in distribution is a serious problem at PDAM Surya Sembada Surabaya: losses caused by the utility's loss rate reach 2 billion Rupiah, and in 2011 there were 100 cases of water theft. Anomaly detection in this study was conducted on data from March 2017 to February 2018. From the water consumption data, the variables mean water usage, maximum water usage, and standard deviation of water usage were derived. The Kohonen SOM algorithm produced 45 groups considered anomalous, using the criterion that a group's silhouette width is below the average silhouette width of the formed groups. The Local Outlier Factor produced 1229 abnormal consumption events, corresponding to 579 households (customers); a frequency calculation then yielded 42 customers suspected of anomalies. Of these 42 customers, the PDAM's own detection method identified only 16, because it fails to capture unusual consumption behavior such as constant consumption every month. Customers detected as anomalous are characterized by average usage above the average of their class and sub-zone.
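The Local Outlier Factor used in the study is a standard method and can be computed in a few lines: compare each point's local reachability density with that of its k nearest neighbours; scores well above 1 indicate local outliers. The toy data and the choice of k below are illustrative, not the study's configuration.

```python
# Hedged sketch of the standard Local Outlier Factor (LOF).
# k and the toy data are illustrative; the study's SOM step is not reproduced.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def lof(data, k=3):
    n = len(data)
    # k nearest neighbours and k-distance for every point
    neigh, kdist = [], []
    for i in range(n):
        ds = sorted((dist(data[i], data[j]), j) for j in range(n) if j != i)
        neigh.append([j for _, j in ds[:k]])
        kdist.append(ds[k - 1][0])
    # local reachability density: inverse of mean reachability distance
    def lrd(i):
        reach = [max(kdist[j], dist(data[i], data[j])) for j in neigh[i]]
        return len(reach) / sum(reach)
    dens = [lrd(i) for i in range(n)]
    # LOF: ratio of neighbours' density to the point's own density
    return [sum(dens[j] for j in neigh[i]) / (k * dens[i]) for i in range(n)]
```

A customer whose usage pattern sits far from every local cluster of similar customers gets an LOF well above 1, which matches how the study surfaces unusual consumption events.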