XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning
A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient
Boosting Outlier Detection) is proposed, described and demonstrated for the
enhanced detection of outliers from normal observations in various practical
datasets. The proposed framework combines the strengths of both supervised and
unsupervised machine learning methods by creating a hybrid approach that
exploits each of their individual performance capabilities in outlier
detection. XGBOD uses multiple unsupervised outlier mining algorithms to
extract useful representations from the underlying data that augment the
predictive capabilities of an embedded supervised classifier on an improved
feature space. The novel approach is shown to provide superior performance in
comparison to competing individual detectors, the full ensemble and two
existing representation learning based algorithms across seven outlier
datasets.
Comment: Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN).
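The feature-augmentation step at the heart of the XGBOD idea can be sketched as follows. This is a minimal illustration, not the authors' code: two toy unsupervised detectors (distance to the k-th nearest neighbor, and the norm of per-feature z-scores) stand in for the detector pool, and the supervised XGBoost stage that would consume the augmented feature space is omitted.

```python
# Sketch of XGBOD-style feature augmentation: unsupervised outlier scores
# are appended to the raw features, producing the improved feature space
# on which a supervised classifier would then be trained.
import math

def knn_score(X, i, k=2):
    """Distance to the k-th nearest neighbor of point i: larger => more outlying."""
    d = sorted(math.dist(X[i], X[j]) for j in range(len(X)) if j != i)
    return d[k - 1]

def zscore_norm(X, i):
    """Euclidean norm of the per-feature z-scores of point i."""
    n, out = len(X), 0.0
    for f in range(len(X[0])):
        col = [x[f] for x in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0
        out += ((X[i][f] - mu) / sd) ** 2
    return out ** 0.5

def augment(X):
    """Original features plus one new column per unsupervised detector."""
    return [list(x) + [knn_score(X, i), zscore_norm(X, i)]
            for i, x in enumerate(X)]

# Toy data: a tight cluster plus one far-away outlier.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
Xa = augment(X)
assert len(Xa[0]) == len(X[0]) + 2          # two appended score columns
# The outlier's appended kNN score dominates every inlier's.
assert Xa[4][2] > max(row[2] for row in Xa[:4])
```

A supervised learner trained on `Xa` instead of `X` sees the detectors' opinions as extra, highly discriminative features, which is the representation-learning effect the abstract describes.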
Unsupervised Algorithms to Detect Zero-Day Attacks: Strategy and Application
In the last decade, researchers, practitioners and companies have striven to devise mechanisms to detect cyber-security threats. Among others, those efforts produced rule-based, signature-based and supervised Machine Learning (ML) algorithms that have proven effective at detecting intrusions that have already been encountered and characterized. In contrast, new unknown threats, often referred to as zero-day attacks or zero-days, are likely to go undetected because those techniques frequently misclassify them. In recent years, unsupervised anomaly detection algorithms have shown potential to detect zero-days. However, dedicated support for quantitative analyses of unsupervised anomaly detection algorithms is still scarce and often does not promote meta-learning, which has the potential to improve classification performance. To that end, this paper introduces the problem of zero-days and reviews unsupervised algorithms for their detection. The paper then applies a question-answer approach to identify typical issues in conducting quantitative analyses for zero-day detection, and shows how to set up and exercise unsupervised algorithms with appropriate tooling. Using a very recent attack dataset, we discuss i) the impact of features on the detection performance of unsupervised algorithms, ii) the relevant metrics to evaluate intrusion detectors, iii) means to compare multiple unsupervised algorithms, and iv) the application of meta-learning to reduce misclassifications. Ultimately, v) we measure the detection performance of unsupervised anomaly detection algorithms with respect to zero-days. Overall, the paper exemplifies how to practically orchestrate and apply an appropriate methodology, process and tool, providing even non-experts with the means to select appropriate strategies to deal with zero-days.
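The core intuition, that a profile learned only from normal traffic can still flag attack classes never seen before, can be illustrated with a toy 1-NN anomaly score. Everything here is invented for the sketch (the feature values, the leave-one-out threshold rule, and the tiny test set); it is not taken from the paper's tooling.

```python
# Sketch: an unsupervised detector tuned only on normal traffic can still
# flag zero-days, because it models "normal" rather than known attacks.

def anomaly_score(x, normal):
    """Distance to the nearest known-normal point (a simple 1-NN score)."""
    return min(sum((a - b) ** 2 for a, b in zip(x, n)) ** 0.5 for n in normal)

normal = [[0.0, 1.0], [0.1, 1.1], [0.2, 0.9]]         # baseline traffic only
# Threshold calibrated on normals alone (leave-one-out), with slack.
threshold = 3 * max(anomaly_score(n, [m for m in normal if m is not n])
                    for n in normal)

test = [([0.1, 1.0], 0),    # normal
        ([0.15, 0.95], 0),  # normal
        ([4.0, 4.0], 1),    # "zero-day": never seen, far from the profile
        ([3.5, 0.0], 1)]    # another unseen attack
preds = [int(anomaly_score(x, normal) > threshold) for x, _ in test]

# Metrics of the kind the paper discusses for evaluating intrusion detectors.
tp = sum(p and y for p, (_, y) in zip(preds, test))
fp = sum(p and not y for p, (_, y) in zip(preds, test))
fn = sum((not p) and y for p, (_, y) in zip(preds, test))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

On this toy data both unseen attacks exceed the normal-only threshold; in practice the paper's point is precisely that choosing features, metrics, and thresholds well determines how reliably this effect materializes.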
Randomizing Ensemble-based approaches for Outlier Detection
The data size is increasing dramatically every day, creating a need to detect abnormal behaviors that can seriously harm our systems. Outlier detection refers to the process of identifying outlying activities that diverge from the remaining data. This process, an integral part of the data mining field, has recently attracted substantial interest from the data mining community. An outlying activity, or outlier, is a data point that deviates significantly from, and appears inconsistent with, the other data members. Ensemble-based outlier detection is a line of research that reduces a model's dependence on particular datasets or data localities by raising the robustness of the data mining procedures. The key principle of an ensemble approach is to combine individual detection results, which do not contain the same lists of outliers, into a consensus finding. In this paper, we propose a novel strategy for constructing randomized ensemble outlier detection. The approach extends the heuristic greedy ensemble construction previously developed by the research community, and we focus on the core components of constructing an ensemble-based algorithm for outlier detection. Randomization is introduced by modifying the pseudocode of the greedy ensemble and implementing the change in the corresponding Java code on the ELKI data-mining platform. The key purpose of our approach is to improve the greedy ensemble and to overcome its local-maxima problem. To induce diversity, the search is initialized with a random outlier detector from the pool of detectors. Finally, the paper provides insights into our ongoing work on this randomized ensemble-based approach for outlier detection.
Empirical results indicate that, by inducing diversity through the use of multiple outlier detection algorithms, the randomized ensemble approach performs better than any single outlier detector.
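The randomized greedy construction described above can be sketched compactly. The detector names and score vectors below are invented; following the usual greedy-ensemble heuristic, a pseudo ground truth is taken as the mean of all detectors' scores, the seed detector is drawn at random, and further detectors are added greedily only while they improve the ensemble's correlation with that pseudo ground truth.

```python
# Sketch of greedy ensemble construction with a randomized initialization,
# the key change this paper makes to escape the greedy search's local maxima.
import random

pool = {                              # detector name -> per-point outlier scores
    "lof":     [0.1, 0.2, 0.9, 0.1],
    "knn":     [0.2, 0.1, 0.8, 0.2],
    "iforest": [0.9, 0.1, 0.2, 0.8],  # disagrees with the other two
}

def mean(v):
    return sum(v) / len(v)

def pearson(a, b):
    """Plain Pearson correlation between two score vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return cov / den

def combine(names):
    """Ensemble score: mean of the chosen detectors' scores."""
    return [mean(col) for col in zip(*(pool[n] for n in names))]

target = combine(list(pool))               # pseudo ground truth: all detectors

random.seed(7)
chosen = [random.choice(sorted(pool))]     # randomized initialization
best = pearson(combine(chosen), target)
for name in sorted(set(pool) - set(chosen)):
    cand = pearson(combine(chosen + [name]), target)
    if cand > best:                        # greedy: keep only improvements
        chosen, best = chosen + [name], cand
```

Re-running the construction with different seeds yields different starting detectors and hence different local searches, which is exactly the diversity mechanism the abstract describes.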
LSCP: Locally Selective Combination in Parallel Outlier Ensembles
In unsupervised outlier ensembles, the absence of ground truth makes the
combination of base outlier detectors a challenging task. Specifically,
existing parallel outlier ensembles lack a reliable way of selecting competent
base detectors, affecting accuracy and stability, during model combination. In
this paper, we propose a framework---called Locally Selective Combination in
Parallel Outlier Ensembles (LSCP)---which addresses the issue by defining a
local region around a test instance using the consensus of its nearest
neighbors in randomly selected feature subspaces. The top-performing base
detectors in this local region are selected and combined as the model's final
output. Four variants of the LSCP framework are compared with seven widely used
parallel frameworks. Experimental results demonstrate that one of these
variants, LSCP_AOM, consistently outperforms baselines on the majority of
twenty real-world datasets.
Comment: Proceedings of the 2019 SIAM International Conference on Data Mining (SDM).
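The local selection mechanism can be illustrated with a deliberately simplified sketch: toy one-dimensional data, two hypothetical base detectors with precomputed scores, no random feature subspaces, and a pseudo ground truth taken as the average of all detectors' scores (one common choice; LSCP's variants differ in these details).

```python
# Simplified LSCP-style local selection: define a local region around the
# test point via nearest neighbors, measure each base detector's competence
# there against a pseudo ground truth, and let the locally best detector
# produce the final output.
import math

train = [[0.0], [0.1], [0.2], [5.0], [5.1]]
scores = {                                  # hypothetical base detector scores
    "det_a": [0.1, 0.1, 0.2, 0.9, 0.8],     # reliable near the right cluster
    "det_b": [0.8, 0.9, 0.7, 0.2, 0.1],     # inverted / unreliable there
    "det_c": [0.2, 0.2, 0.1, 0.8, 0.9],     # broadly agrees with det_a
}
pseudo = [sum(s) / len(scores) for s in zip(*scores.values())]

def lscp_score(x, k=3):
    # Local region: the k nearest training points to the test instance.
    region = sorted(range(len(train)),
                    key=lambda i: math.dist(x, train[i]))[:k]
    # Competence: squared error against the pseudo ground truth, locally.
    def err(name):
        return sum((scores[name][i] - pseudo[i]) ** 2 for i in region)
    best = min(scores, key=err)
    # The selected detector scores the test point; proxied here by the
    # score of x's nearest training neighbor.
    return best, scores[best][region[0]]

best_name, out = lscp_score([5.05])         # query near the right-hand cluster
```

Near the right-hand cluster the locally competent `det_a` is selected, while a different region of the space could select a different detector; that instance-wise choice is what "locally selective" refers to.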
Unsupervised Time Series Outlier Detection with Diversity-Driven Convolutional Ensembles
With the sweeping digitalization of societal, medical, industrial, and
scientific processes, sensing technologies are being deployed that produce
increasing volumes of time series data, thus fueling a plethora of new or
improved applications. In this setting, outlier detection is frequently
important, and while solutions based on neural networks exist, they leave room
for improvement in terms of both accuracy and efficiency. With the objective of
achieving such improvements, we propose a diversity-driven, convolutional
ensemble. To improve accuracy, the ensemble employs multiple basic outlier
detection models built on convolutional sequence-to-sequence autoencoders that
can capture temporal dependencies in time series. Further, a novel
diversity-driven training method maintains diversity among the basic models,
with the aim of improving the ensemble's accuracy. To improve efficiency, the
approach enables a high degree of parallelism during training. In addition, it
is able to transfer some model parameters from one basic model to another,
which reduces training time. We report on extensive experiments using
real-world multivariate time series that offer insight into the design choices
underlying the new approach and offer evidence that it is capable of improved
accuracy and efficiency. This is an extended version of "Unsupervised Time
Series Outlier Detection with Diversity-Driven Convolutional Ensembles", to
appear in PVLDB 2022.
Comment: 14 pages.
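One reading of the diversity-driven objective can be sketched numerically. Everything below is invented for illustration: the "reconstructions" are plain number lists rather than outputs of convolutional autoencoders, and the objective combines each basic model's reconstruction error with a credit for disagreeing with its peers, weighted by a trade-off factor `lam`.

```python
# Schematic diversity-driven objective: a basic model's loss is its
# reconstruction error minus a diversity credit for differing from the
# other basic models' reconstructions of the same series.
series = [1.0, 2.0, 3.0, 4.0]
recons = [                        # toy reconstructions from three basic models
    [1.1, 2.0, 2.9, 4.1],
    [0.9, 2.1, 3.1, 3.9],
    [1.1, 2.0, 2.9, 4.1],         # identical to model 0: a zero-diversity pair
]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def loss(i, lam=0.1):
    rec = mse(recons[i], series)                       # accuracy term
    div = sum(mse(recons[i], r)                        # diversity term
              for j, r in enumerate(recons) if j != i) / (len(recons) - 1)
    return rec - lam * div        # lower when accurate AND different

# Model 1 disagrees with both peers, so it earns a larger diversity credit
# than model 0, whose reconstruction is duplicated by model 2.
```

Raising `lam` shifts training pressure from pure reconstruction accuracy toward keeping the basic models distinct, which is the trade-off the abstract says the diversity-driven training method manages.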
Ensemble Methods for Anomaly Detection
Anomaly detection has many applications in numerous areas such as intrusion detection, fraud detection, and medical diagnosis. Most current techniques are specialized for detecting one type of anomaly, and work well on specific domains and when the data satisfies specific assumptions.
We address this problem, proposing ensemble anomaly detection techniques that perform well in many applications, with four major contributions: using bootstrapping to better detect anomalies on multiple subsamples, sequential application of diverse detection
algorithms, a novel adaptive sampling and learning algorithm in which the anomalies are iteratively examined, and improving the random forest algorithms for detecting anomalies in streaming data.
We design and evaluate multiple ensemble strategies using score normalization, rank aggregation and majority voting, to combine the results from six well-known base algorithms. We propose a bootstrapping algorithm in which anomalies are evaluated from multiple subsets of the data. Results show that our independent ensemble performs better than the base algorithms, and using bootstrapping achieves competitive quality and faster runtime compared with existing works.
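The three combination strategies just listed can be sketched on invented score vectors (a minimal illustration, not the thesis code): z-score normalization puts heterogeneous detector outputs on one scale before averaging, rank aggregation averages per-detector ranks, and majority voting counts per-detector binary flags.

```python
# Three classic ways to combine base anomaly detectors' scores.
from statistics import mean, pstdev

base = [                               # three base detectors, five points
    [0.1, 0.2, 0.9, 0.3, 0.2],
    [1.0, 2.0, 9.0, 2.5, 1.5],         # same ordering, different scale
    [5.0, 5.5, 9.5, 6.0, 5.2],
]

def znorm(s):
    mu, sd = mean(s), pstdev(s) or 1.0
    return [(v - mu) / sd for v in s]

def ranks(s):                          # 0 = least outlying
    order = sorted(range(len(s)), key=lambda i: s[i])
    r = [0] * len(s)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

# 1) Score normalization: average of z-normalized scores.
zcomb = [mean(col) for col in zip(*map(znorm, base))]
# 2) Rank aggregation: average rank across detectors.
rcomb = [mean(col) for col in zip(*map(ranks, base))]
# 3) Majority voting over per-detector top-1 anomaly flags.
flags = [[int(v == max(s)) for v in s] for s in base]
votes = [sum(col) for col in zip(*flags)]

# Point 2 is the consensus outlier under all three strategies.
```

On this toy input the detectors agree, so all three combiners pick the same point; the strategies differ precisely in how they behave when detectors disagree or use incomparable score scales.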
We develop new sequential ensemble algorithms in which the second algorithm performs anomaly detection based on the first algorithm's outputs; the best results are obtained by combining algorithms that are substantially different. We propose a novel adaptive sampling algorithm which uses the score output of the base algorithm to determine the hard-to-detect examples, and iteratively resamples more points from such examples in a completely unsupervised context.
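The adaptive-sampling loop can be sketched in the spirit of that paragraph; all details below are invented. Here "hard" means a base-detector score that sits between the clearly normal and clearly anomalous ends of the score range, and such points accumulate sampling weight so later rounds re-examine them.

```python
# Unsupervised adaptive sampling sketch: up-weight the examples whose base
# scores are ambiguous, so they are examined more in later rounds.

def base_score(x, data):
    """1-NN distance: a simple stand-in for the base detector."""
    return min(abs(x - y) for y in data if y != x)

data = [0.0, 0.1, 0.2, 0.3, 2.0, 5.0]     # 2.0 is ambiguous, 5.0 is clear
weights = {x: 1 for x in data}
for _ in range(3):                        # a few adaptive rounds
    scores = {x: base_score(x, data) for x in data}
    lo, hi = min(scores.values()), max(scores.values())
    # Hardness: distance from BOTH the clearly-normal (lo) and the
    # clearly-anomalous (hi) ends of the score range.
    hardest = max(scores, key=lambda x: min(scores[x] - lo, hi - scores[x]))
    weights[hardest] += 1                 # re-examine it more next round
```

The borderline point (2.0) ends up with the largest weight, while both the dense cluster and the obvious outlier (5.0) are left alone, matching the intent of focusing effort on hard-to-detect examples.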
On streaming datasets, we analyze the impact of parameters used in random trees, and propose new algorithms that work well with high-dimensional data, improving performance without increasing the number of trees or their heights. We show that further improvements can be obtained with an Evolutionary Algorithm.
Wisdom of the Contexts: Active Ensemble Learning for Contextual Anomaly Detection
In contextual anomaly detection (CAD), an object is only considered anomalous
within a specific context. Most existing methods for CAD use a single context
based on a set of user-specified contextual features. However, identifying the
right context can be very challenging in practice, especially in datasets with
a large number of attributes. Furthermore, in real-world systems, there might
be multiple anomalies that occur in different contexts and, therefore, require
a combination of several "useful" contexts to unveil them. In this work, we
leverage active learning and ensembles to effectively detect complex contextual
anomalies in situations where the true contextual and behavioral attributes are
unknown. We propose a novel approach, called WisCon (Wisdom of the Contexts),
that automatically creates contexts from the feature set. Our method constructs
an ensemble of multiple contexts, with varying importance scores, based on the
assumption that not all useful contexts are equally so. Experiments show that
WisCon significantly outperforms existing baselines in different categories
(i.e., active classifiers, unsupervised contextual and non-contextual anomaly
detectors, and supervised classifiers) on seven datasets. Furthermore, the
results support our initial hypothesis that there is no single perfect context
that successfully uncovers all kinds of contextual anomalies, and leveraging
the "wisdom" of multiple contexts is necessary.
Comment: Submitted to IEEE TKD
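The context-ensemble idea can be illustrated with a toy sketch (contexts, data, and importance weights are all invented, and WisCon's active-learning component is omitted): each candidate context partitions the data by one attribute, behavior is z-scored within each partition, and the per-context anomaly scores are combined with importance weights, reflecting the assumption that not all contexts are equally useful.

```python
# Toy contextual anomaly detection over an ensemble of candidate contexts.
from statistics import mean, pstdev
from collections import defaultdict

# rows: (attr_a, attr_b, behavior) -- behavior 30 is normal in some groups
# of attr_b but anomalous within its attr_a group, and vice versa.
rows = [("x", "p", 10), ("x", "p", 11), ("x", "q", 30),
        ("y", "q", 30), ("y", "q", 31), ("y", "p", 10)]

def context_scores(ctx):
    """|z-score| of behavior within each group defined by context attr ctx."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[ctx]].append(r[2])
    out = []
    for r in rows:
        g = groups[r[ctx]]
        sd = pstdev(g) or 1.0
        out.append(abs((r[2] - mean(g)) / sd))
    return out

# Importance weights: context 0 (attr_a) was found more useful than context 1.
ctxs = {0: 0.8, 1: 0.2}
per_ctx = {c: context_scores(c) for c in ctxs}
ens = [sum(ctxs[c] * per_ctx[c][i] for c in ctxs) for i in range(len(rows))]
```

Rows 2 and 5 look perfectly ordinary under the `attr_b` context but stand out under `attr_a`, so only a weighted combination over contexts surfaces them, which is the "wisdom of the contexts" effect the abstract argues for.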