
    Copula-based anomaly scoring and localization for large-scale, high-dimensional continuous data

    The anomaly detection method presented in this paper has a special feature: it not only indicates whether an observation is anomalous, but also identifies what makes an anomalous observation unusual, thereby helping to localize the cause of the anomaly. The proposed approach is model-based; it relies on the multivariate probability distribution associated with the observations. Since rare events lie in the tails of probability distributions, we use copula functions, which model fat-tailed distributions well. The presented procedure scales well: it can cope with a large number of high-dimensional samples. Furthermore, it can handle missing values, which occur frequently in high-dimensional data sets. In the second part of the paper, we demonstrate the usability of the method through a case study in which we analyze a large data set consisting of the performance counters of a real mobile telecommunication network. Since such networks are complex systems, the signs of sub-optimal operation can remain hidden for a long time. With the proposed procedure, many such hidden issues can be isolated and reported to the network operator.
    Comment: 27 pages, 12 figures, accepted at ACM Transactions on Intelligent Systems and Technology
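A minimal sketch of the general idea (not the paper's exact procedure): fit a Gaussian copula by mapping each margin to normal scores via its empirical CDF, use the copula's negative log-density (up to a constant) as an anomaly score, and attribute the score to individual dimensions for localization. All data and variable names here are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * X[:, 1]      # make two columns correlated

# 1) Map each margin to uniforms via the empirical CDF, then to normal scores.
n = X.shape[0]
U = stats.rankdata(X, axis=0) / (n + 1)      # ranks scaled into (0, 1)
Z = stats.norm.ppf(U)

# 2) Fit the Gaussian copula: correlation matrix of the normal scores.
Sigma = np.corrcoef(Z, rowvar=False)
inv = np.linalg.inv(Sigma)

def copula_score(z):
    """Negative log-density of the Gaussian copula, up to an additive constant."""
    return 0.5 * z @ (inv - np.eye(len(z))) @ z

scores = np.array([copula_score(z) for z in Z])

# 3) Localization: per-dimension share of the quadratic form for the worst point.
worst = Z[np.argmax(scores)]
contrib = worst * ((inv - np.eye(len(worst))) @ worst)
```

The per-dimension contributions `contrib` sum to the anomaly score, so the largest entries point at the coordinates most responsible for the observation being flagged.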

    Tree-based mining contrast subspace

    All existing contrast subspace mining methods employ a density-based likelihood contrast scoring function to measure the likelihood of a query object belonging to a target class against another class in a subspace. However, density tends to decrease as the dimensionality of the subspace increases, which leads these methods to identify inaccurate contrast subspaces for a given query object. This paper proposes a novel contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by subspace dimensionality. The tree-based scoring measure recursively binary-partitions the subspace so that objects belonging to the target class are grouped together and separated from objects of the other class. In a contrast subspace, the query object should fall into a group containing more objects of the target class than of the other class. The method incorporates a feature selection step to find a subset of one-dimensional subspaces with high likelihood contrast scores with respect to the query object; contrast subspaces are then searched through this selected subset. An experiment evaluates the effectiveness of the tree-based method in terms of classification accuracy. The results show that the proposed method achieves higher classification accuracy and outperforms the existing method on several real-world data sets.
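The tree-based scoring idea can be illustrated with a toy sketch (assumed details, not the authors' algorithm): partition a candidate subspace with a shallow decision tree trained on target-vs-other labels, then score the query object by the target-class proportion in the leaf it falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Two well-separated clusters: class 1 (target) near the origin, class 0 at (3, 3).
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([1] * 100 + [0] * 100)

def leaf_contrast_score(X, y, query, dims):
    """Target-class proportion in the query's leaf within subspace `dims`."""
    Xs = X[:, dims]
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xs, y)
    leaf = tree.apply(query[dims][None, :])[0]   # leaf id of the query object
    in_leaf = tree.apply(Xs) == leaf
    return y[in_leaf].mean()                     # high => strong contrast subspace

query = np.array([0.1, -0.2])                    # lies inside the target cluster
score = leaf_contrast_score(X, y, query, dims=[0, 1])
```

Because the score is a class proportion inside a leaf rather than a density estimate, it does not shrink automatically as more dimensions are added, which is the property the paper exploits.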

    Towards Interpretable Anomaly Detection via Invariant Rule Mining

    In the research area of anomaly detection, novel and promising methods are frequently developed. However, most existing studies, especially those leveraging deep neural networks, focus exclusively on the detection task and ignore the interpretability of the underlying models and their detection results. Yet anomaly interpretation, which aims to explain why specific data instances are identified as anomalies, is an equally (if not more) important task in many real-world applications. In this work, we pursue highly interpretable anomaly detection via invariant rule mining. Specifically, we leverage decision tree learning and association rule mining to automatically generate invariant rules that are consistently satisfied by the underlying data generation process. The generated invariant rules can provide explicit explanations of anomaly detection results and are thus extremely useful for subsequent decision-making. Furthermore, our empirical evaluation shows that the proposed method achieves performance comparable, in terms of AUC and partial AUC, to popular anomaly detection models on various benchmark datasets.
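A toy illustration of rule-based interpretability (not the authors' miner): learn simple range invariants from normal data, then explain an anomaly by the specific rule it violates. The feature names and thresholds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated normal operating data, e.g. sensor readings around 50 +/- 5.
normal = rng.normal(50, 5, size=(1000, 2))
names = ["temperature", "pressure"]

# Invariants: per-feature [low, high] ranges satisfied by all normal data.
low, high = normal.min(axis=0), normal.max(axis=0)

def explain(x):
    """Return the violated invariants; an empty list means x looks normal."""
    reasons = []
    for i, name in enumerate(names):
        if not (low[i] <= x[i] <= high[i]):
            reasons.append(f"{name}={x[i]:.1f} outside [{low[i]:.1f}, {high[i]:.1f}]")
    return reasons

ok = explain(np.array([50.0, 50.0]))       # no rule violated
bad = explain(np.array([120.0, 50.0]))     # temperature invariant violated
```

The returned strings serve directly as human-readable explanations, which is the key advantage of rule-based detectors over opaque score-only models.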

    A new dimensionality-unbiased score for efficient and effective outlying aspect mining

    The main aim of an outlying aspect mining algorithm is to automatically detect the subspace(s), a.k.a. aspect(s), in which a given data point is dramatically different from the rest of the data. To rank subspaces for a given data point, a scoring measure is required to compute the outlying degree of the point in each subspace. In this paper, we introduce a new measure of outlying degree, called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which not only detects outliers but also explains why the selected point is an outlier. SiNNE is a dimensionality-unbiased measure in its raw form, meaning that the scores it produces can be compared directly across subspaces of different dimensionality; no normalization is required to make the score unbiased. Our experimental results on synthetic and publicly available real-world datasets reveal that (i) SiNNE produces better or at least comparable results to existing scores, and (ii) it improves the run time of the existing beam-search-based outlying aspect mining algorithm by at least two orders of magnitude. SiNNE allows the existing outlying aspect mining algorithm to run on datasets with hundreds of thousands of instances and thousands of dimensions, which was not possible before. © 2022, The Author(s)
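A simplified sketch in the spirit of an isolation score built on a nearest-neighbour ensemble (assumed details, not the published SiNNE implementation): each ensemble member takes a small random subsample, covers the data with balls centred at the subsample points (radius = distance to each point's nearest subsample neighbour), and a query is "isolated" in that member if it falls outside every ball. The score is the fraction of members in which the query is isolated, so it lies in [0, 1] regardless of dimensionality.

```python
import numpy as np

rng = np.random.default_rng(3)

def isolation_nne_score(X, query, t=50, psi=8):
    """Fraction of ensemble members in which `query` falls outside all balls."""
    n = len(X)
    hits = 0
    for _ in range(t):
        S = X[rng.choice(n, size=psi, replace=False)]
        D = np.linalg.norm(S[:, None] - S[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)
        radii = D.min(axis=1)                      # nearest-neighbour radius per centre
        dist = np.linalg.norm(S - query, axis=1)
        if np.all(dist > radii):                   # outside every ball => isolated
            hits += 1
    return hits / t                                # higher = more outlying

X = rng.normal(size=(300, 2))
central = isolation_nne_score(X, np.array([0.0, 0.0]))
far = isolation_nne_score(X, np.array([8.0, 8.0]))
```

Because the score is already a probability-like fraction, scores from subspaces of different dimensionality are on the same scale, which is the "dimensionality-unbiased" property the abstract emphasizes.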

    A Survey on Explainable Anomaly Detection

    In the past two decades, most research on anomaly detection has focused on improving detection accuracy, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. This work therefore provides a comprehensive and structured survey of state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the method that best suits their needs.
    Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from Data (TKDD) for publication (preprint version)

    Anomaly Detection in the Water Usage of Customers of PDAM Surya Sembada Kota Surabaya Using Kohonen SOM and Local Outlier Factor

    Water loss in distribution is a serious problem for PDAM Surya Sembada Kota Surabaya: the losses caused by the utility's loss rate reach 2 billion Rupiah, and in 2011 there were 100 cases of water theft. Anomaly detection in this study was conducted on data from March 2017 to February 2018. From the water consumption data, the variables mean water usage, maximum water usage, and standard deviation of water usage were derived. The Kohonen-SOM algorithm produced 45 groups considered anomalous, using the criterion that a group's silhouette width is less than the average silhouette width over the formed groups. The Local Outlier Factor produced 1229 abnormal consumption events, comprising 579 households (customers). A frequency calculation yielded 42 customers suspected of anomalies. Of these 42 customers, only 16 were detected by the PDAM's own method, because that method fails to capture unusual consumption behavior such as constant consumption every month. Customers detected as anomalous are characterized by an average usage exceeding the average usage of their customer class and sub-zone.
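The Local Outlier Factor step can be sketched with scikit-learn's `LocalOutlierFactor` on the same kind of per-customer features the study derives (mean, maximum, and standard deviation of monthly usage); the simulated data and threshold behaviour here are illustrative, not the study's actual figures.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(4)
# Simulated customers: columns = mean, max, std of monthly water usage.
usage = np.column_stack([
    rng.normal(20, 3, 500),
    rng.normal(35, 5, 500),
    rng.normal(4, 1, 500),
])
# Append one suspicious customer: very high, perfectly constant consumption.
usage = np.vstack([usage, [[80.0, 120.0, 0.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(usage)          # -1 = flagged as anomalous, 1 = normal
flagged = np.where(labels == -1)[0]      # indices of anomalous customers
```

Note that LOF compares each customer's local density to that of its neighbours, so a customer with constant but unusually high consumption is still flagged, which is exactly the behaviour the PDAM's own method missed according to the abstract.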