973 research outputs found
Outlier Detection Ensemble with Embedded Feature Selection
Feature selection places an important role in improving the performance of
outlier detection, especially for noisy data. Existing methods usually perform
feature selection and outlier scoring separately, which would select feature
subsets that may not optimally serve for outlier detection, leading to
unsatisfying performance. In this paper, we propose an outlier detection
ensemble framework with embedded feature selection (ODEFS), to address this
issue. Specifically, for each random sub-sampling based learning component,
ODEFS unifies feature selection and outlier detection into a pairwise ranking
formulation to learn feature subsets that are tailored for the outlier
detection method. Moreover, we adopt the thresholded self-paced learning to
simultaneously optimize feature selection and example selection, which is
helpful to improve the reliability of the training set. After that, we design
an alternate algorithm with proved convergence to solve the resultant
optimization problem. In addition, we analyze the generalization error bound of
the proposed framework, which provides theoretical guarantee on the method and
insightful practical guidance. Comprehensive experimental results on 12
real-world datasets from diverse domains validate the superiority of the
proposed ODEFS.Comment: 10pages, AAAI202
A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs.Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version
Homophily Outlier Detection in Non-IID Categorical Data
Most of existing outlier detection methods assume that the outlier factors
(i.e., outlierness scoring measures) of data entities (e.g., feature values and
data objects) are Independent and Identically Distributed (IID). This
assumption does not hold in real-world applications where the outlierness of
different entities is dependent on each other and/or taken from different
probability distributions (non-IID). This may lead to the failure of detecting
important outliers that are too subtle to be identified without considering the
non-IID nature. The issue is even intensified in more challenging contexts,
e.g., high-dimensional data with many noisy features. This work introduces a
novel outlier detection framework and its two instances to identify outliers in
categorical data by capturing non-IID outlier factors. Our approach first
defines and incorporates distribution-sensitive outlier factors and their
interdependence into a value-value graph-based representation. It then models
an outlierness propagation process in the value graph to learn the outlierness
of feature values. The learned value outlierness allows for either direct
outlier detection or outlying feature selection. The graph representation and
mining approach is employed here to well capture the rich non-IID
characteristics. Our empirical results on 15 real-world data sets with
different levels of data complexities show that (i) the proposed outlier
detection methods significantly outperform five state-of-the-art methods at the
95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most
complex data sets; and (ii) the proposed feature selection methods
significantly outperform three competing methods in enabling subsequent outlier
detection of two different existing detectors.Comment: To appear in Data Ming and Knowledge Discovery Journa
Towards a Hierarchical Approach for Outlier Detection in Industrial Production Settings
In the context of Industry 4.0, the degree of cross-linking between machines, sensors, and production lines increases rapidly.However, this trend also offers the potential for the improve-ment of outlier scores, especially by combining outlier detectioninformation between different production levels. The latter, in turn, offer various other useful aspects like different time series resolutions or context variables. When utilizing these aspects, valuable outlier information can be extracted, which can be then used for condition-based monitoring, alert management, or predictive maintenance. In this work, we compare different types of outlier detection methods and scores in the light of the aforementioned production levels with the goal to develop a modelfor outlier detection that incorporates these production levels.The proposed model, in turn, is basically inspired by a use casefrom the field of additive manufacturing, which is also known asindustrial 3D-printing. Altogether, our model shall improve the detection of outliers by the use of a hierarchical structure that utilizes production levels in industrial scenarios
Anomaly Detection in Sequential Data: A Deep Learning-Based Approach
Anomaly Detection has been researched in various domains with several applications in intrusion detection, fraud detection, system health management, and bio-informatics. Conventional anomaly detection methods analyze each data instance independently (univariate or multivariate) and ignore the sequential characteristics of the data. Anomalies in the data can be detected by grouping the individual data instances into sequential data and hence conventional way of analyzing independent data instances cannot detect anomalies. Currently: (1) Deep learning-based algorithms are widely used for anomaly detection purposes. However, significant computational overhead time is incurred during the training process due to static constant batch size and learning rate parameters for each epoch, (2) the threshold to decide whether an event is normal or malicious is often set as static. This can drastically increase the false alarm rate if the threshold is set low or decrease the True Alarm rate if it is set to a remarkably high value, (3) Real-life data is messy. It is impossible to learn the data features by training just one algorithm. Therefore, several one-class-based algorithms need to be trained. The final output is the ensemble of the output from all the algorithms. The prediction accuracy can be increased by giving a proper weight to each algorithm\u27s output. By extending the state-of-the-art techniques in learning-based algorithms, this dissertation provides the following solutions: (i) To address (1), we propose a hybrid, dynamic batch size and learning rate tuning algorithm that reduces the overall training time of the neural network. (ii) As a solution for (2), we present an adaptive thresholding algorithm that reduces high false alarm rates. (iii) To overcome (3), we propose a multilevel hybrid ensemble anomaly detection framework that increases the anomaly detection rate of the high dimensional dataset
Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods
Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are: being able to detect unknown attacks, and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques.
The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns.
The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity significantly differs from one domain to the other.
The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. Obtained results showed that this technique can outperform other similar techniques.
The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy
A survey on explainable anomaly detection
NWOAlgorithms and the Foundations of Software technolog
- …