HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.
Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020)
Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification
Generative Adversarial Networks (GANs) have been used in many different
applications to generate realistic synthetic data. We introduce a novel GAN
with Autoencoder (GAN-AE) architecture to generate synthetic samples for
variable length, multi-feature sequence datasets. In this model, we develop a
GAN architecture with an additional autoencoder component, where recurrent
neural networks (RNNs) are used for each component of the model in order to
generate synthetic data to improve classification accuracy for a highly
imbalanced medical device dataset. In addition to the medical device dataset,
we also evaluate the GAN-AE performance on two additional datasets and
demonstrate the application of GAN-AE to a sequence-to-sequence task where both
synthetic sequence inputs and sequence outputs must be generated. To evaluate
the quality of the synthetic data, we train encoder-decoder models both with
and without the synthetic data and compare the classification model
performance. We show that a model trained with GAN-AE generated synthetic data
outperforms models trained with synthetic data generated both with standard
oversampling techniques such as SMOTE and Autoencoders as well as with
state-of-the-art GAN-based models.
Autoencoders for strategic decision support
In the majority of executive domains, a notion of normality is involved in
most strategic decisions. However, few data-driven tools that support strategic
decision-making are available. We introduce and extend the use of autoencoders
to provide strategically relevant granular feedback. A first experiment
indicates that experts are inconsistent in their decision making, highlighting
the need for strategic decision support. Furthermore, using two large
industry-provided human resources datasets, the proposed solution is evaluated
in terms of ranking accuracy, synergy with human experts, and dimension-level
feedback. This three-point scheme is validated using (a) synthetic data, (b)
the perspective of data quality, (c) blind expert validation, and (d)
transparent expert evaluation. Our study confirms several principal weaknesses
of human decision-making and stresses the importance of synergy between a model
and humans. Moreover, unsupervised learning and in particular the autoencoder
are shown to be valuable tools for strategic decision-making.
Ensemble Methods for Anomaly Detection
Anomaly detection has many applications in numerous areas such as intrusion detection, fraud detection, and medical diagnosis. Most current techniques are specialized for detecting one type of anomaly, and work well on specific domains and when the data satisfies specific assumptions.
We address this problem by proposing ensemble anomaly detection techniques that perform well in many applications, with four major contributions: using bootstrapping to better detect anomalies on multiple subsamples, sequential application of diverse detection algorithms, a novel adaptive sampling and learning algorithm in which the anomalies are iteratively examined, and improving the random forest algorithms for detecting anomalies in streaming data.
We design and evaluate multiple ensemble strategies using score normalization, rank aggregation and majority voting, to combine the results from six well-known base algorithms. We propose a bootstrapping algorithm in which anomalies are evaluated from multiple subsets of the data. Results show that our independent ensemble performs better than the base algorithms, and using bootstrapping achieves competitive quality and faster runtime compared with existing works.
We develop new sequential ensemble algorithms in which the second algorithm performs anomaly detection based on the first algorithm's outputs; best results are obtained by combining algorithms that are substantially different. We propose a novel adaptive sampling algorithm which uses the score output of the base algorithm to determine the hard-to-detect examples, and iteratively resamples more points from such examples in a completely unsupervised context.
On streaming datasets, we analyze the impact of parameters used in random trees, and propose new algorithms that work well with high-dimensional data, improving performance without increasing the number of trees or their heights. We show that further improvements can be obtained with an Evolutionary Algorithm.
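The three combination strategies named in this abstract (score normalization, rank aggregation, and majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the six base algorithms or the exact normalization used, so the z-score normalization, the `top_k` voting threshold, the function names, and the toy scores below are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores so different detectors are comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ranks(scores):
    """Rank 1 is the most anomalous point (highest score); ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def combine_by_score(detector_scores):
    """Score normalization: average the z-scored outputs (higher = more anomalous)."""
    normalized = [zscore(s) for s in detector_scores]
    return [mean(column) for column in zip(*normalized)]


def combine_by_rank(detector_scores):
    """Rank aggregation: average the per-detector ranks (lower = more anomalous)."""
    all_ranks = [ranks(s) for s in detector_scores]
    return [mean(column) for column in zip(*all_ranks)]


def combine_by_vote(detector_scores, top_k):
    """Majority voting: flag a point if more than half the detectors rank it in their top_k."""
    all_ranks = [ranks(s) for s in detector_scores]
    majority = len(detector_scores) / 2
    return [sum(r <= top_k for r in column) > majority for column in zip(*all_ranks)]


# Hypothetical outputs of three detectors on four points (one list per detector).
detector_scores = [
    [0.10, 0.20, 0.90, 0.15],
    [10.0, 12.0, 55.0, 11.0],
    [3.0, 9.0, 8.0, 2.0],
]

avg_scores = combine_by_score(detector_scores)
avg_ranks = combine_by_rank(detector_scores)
votes = combine_by_vote(detector_scores, top_k=1)
```

In this toy data the detectors use very different score scales, which is why the raw scores are normalized or converted to ranks before being combined.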
A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs.
Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version)