8,972 research outputs found
GAN Ensemble for Anomaly Detection
When formulated as an unsupervised learning problem, anomaly detection often
requires a model to learn the distribution of normal data. Previous works apply
Generative Adversarial Networks (GANs) to anomaly detection tasks and show good
performances from these models. Motivated by the observation that GAN ensembles
often outperform single GANs in generation tasks, we propose to construct GAN
ensembles for anomaly detection. In the proposed method, a group of generators
and a group of discriminators are trained together, so every generator gets
feedback from multiple discriminators, and vice versa. Compared to a single
GAN, a GAN ensemble can better model the distribution of normal data and thus
better detect anomalies. Our theoretical analysis of GANs and GAN ensembles
explains the role of a GAN discriminator in anomaly detection. In the empirical
study, we evaluate ensembles constructed from four types of base models, and
the results show that these ensembles clearly outperform single models in a
series of tasks of anomaly detection.Comment: 8 pages, 6 figures. To appear in Proceedings 35th AAAI Conference on
Artificial Intelligence (AAAI 21
Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey
This survey aims to deliver an extensive and well-constructed overview of using machine learning for the problem of detecting anomalies in streaming datasets. The objective is to provide the effectiveness of using Hoeffding Trees as a machine learning algorithm solution for the problem of detecting anomalies in streaming cyber datasets. In this survey we categorize the existing research works of Hoeffding Trees which can be feasible for this type of study into the following: surveying distributed Hoeffding Trees, surveying ensembles of Hoeffding Trees and surveying existing techniques using Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper and were selected based on their relation to streaming data and the flexibility of their techniques for use within different domains of streaming data. We discuss the relevance of how combining the techniques of the proposed research works within these compositions can be used to address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem, anomaly detection
TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repealing malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. An hybrid feature selection technique comprising three methods, i.e. particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensembles based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperform other classification techniques recently proposed in the literature. Results regarding the UNSW-NB15 dataset also improve the ones achieved by several state of the art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. This is not usually considered by IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier
Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification
Generative Adversarial Networks (GANs) have been used in many different
applications to generate realistic synthetic data. We introduce a novel GAN
with Autoencoder (GAN-AE) architecture to generate synthetic samples for
variable length, multi-feature sequence datasets. In this model, we develop a
GAN architecture with an additional autoencoder component, where recurrent
neural networks (RNNs) are used for each component of the model in order to
generate synthetic data to improve classification accuracy for a highly
imbalanced medical device dataset. In addition to the medical device dataset,
we also evaluate the GAN-AE performance on two additional datasets and
demonstrate the application of GAN-AE to a sequence-to-sequence task where both
synthetic sequence inputs and sequence outputs must be generated. To evaluate
the quality of the synthetic data, we train encoder-decoder models both with
and without the synthetic data and compare the classification model
performance. We show that a model trained with GAN-AE generated synthetic data
outperforms models trained with synthetic data generated both with standard
oversampling techniques such as SMOTE and Autoencoders as well as with state of
the art GAN-based models
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020
- …