DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Recently, the introduction of the generative adversarial network (GAN) and
its variants has enabled the generation of realistic synthetic samples, which
have been used for enlarging training sets. Previous work primarily focused on
data augmentation for semi-supervised and supervised tasks. In this paper, we
instead focus on unsupervised anomaly detection and propose a novel generative
data augmentation framework optimized for this task. In particular, we propose
to oversample infrequent normal samples - normal samples that occur with small
probability, e.g., rare normal events. We show that these samples are
responsible for false positives in anomaly detection. However, oversampling of
infrequent normal samples is challenging for real-world high-dimensional data
with multimodal distributions. To address this challenge, we propose to use a
GAN variant known as the adversarial autoencoder (AAE) to transform the
high-dimensional multimodal data distributions into low-dimensional unimodal
latent distributions with well-defined tail probability. Then, we
systematically oversample at the 'edge' of the latent distributions to increase
the density of infrequent normal samples. We show that our oversampling
pipeline is a unified one: it is generally applicable to datasets with
different complex data distributions. To the best of our knowledge, our method
is the first data augmentation technique focused on improving performance in
unsupervised anomaly detection. We validate our method by demonstrating
consistent improvements across several real-world datasets.
Comment: Published as a conference paper at ICDM 2018 (IEEE International
Conference on Data Mining
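The edge-oversampling idea in this abstract can be illustrated with a minimal sketch. It assumes the latent prior is an isotropic standard Gaussian and that "edge" means latent vectors whose norm falls in a chosen band; the function name `sample_latent_edge` and the radius bounds are hypothetical, and this is not necessarily the authors' exact procedure. In a full pipeline, a trained AAE decoder (not shown) would map these latent vectors back to data space to produce the extra infrequent normal samples.

```python
import math
import random

def sample_latent_edge(n, dim, min_radius, max_radius, seed=0):
    """Draw n latent vectors whose norm lies in [min_radius, max_radius].

    Assumption: the latent prior is an isotropic standard Gaussian, so a
    normalized Gaussian draw gives a uniformly random direction and only
    the radius needs to be controlled.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        radius = rng.uniform(min_radius, max_radius)
        samples.append([radius * x / norm for x in direction])
    return samples

# Hypothetical band: between 2.5 and 3 standard deviations from the origin.
edge_latents = sample_latent_edge(n=5, dim=2, min_radius=2.5, max_radius=3.0)
```

The radius band is a design choice: it should be far enough out to reach low-density normal samples, but not so far that decoded samples stop resembling normal data.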
Multi-Source Neural Variational Inference
Learning from multiple sources of information is an important problem in
machine-learning research. The key challenges are learning representations and
formulating inference methods that take into account the complementarity and
redundancy of various information sources. In this paper we formulate a
variational autoencoder based multi-source learning framework in which each
encoder is conditioned on a different information source. This allows us to
relate the sources via the shared latent variables by computing divergence
measures between the individual sources' posterior approximations. We explore a
variety of options to learn these encoders and to integrate the beliefs they
compute into a consistent posterior approximation. We visualise learned beliefs
on a toy dataset and evaluate our methods for learning shared representations
and structured output prediction, showing trade-offs of learning separate
encoders for each information source. Furthermore, we demonstrate how conflict
detection and redundancy can increase robustness of inference in a multi-source
setting.
Comment: AAAI 2019, Association for the Advancement of Artificial Intelligence
(AAAI) 201
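As a rough illustration of the two operations this abstract mentions, the sketch below assumes each encoder outputs a diagonal Gaussian over the shared latent variables. It computes a KL divergence between two sources' beliefs (a large value could flag conflict) and fuses beliefs with a product of Gaussians, which is one common integration option and not necessarily the one the paper settles on. The function names are hypothetical.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and variances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def product_of_gaussians(mus, variances):
    """Fuse diagonal Gaussian beliefs by precision-weighted averaging."""
    fused_mu, fused_var = [], []
    for d in range(len(mus[0])):
        precision = sum(1.0 / v[d] for v in variances)
        mean = sum(m[d] / v[d] for m, v in zip(mus, variances)) / precision
        fused_mu.append(mean)
        fused_var.append(1.0 / precision)
    return fused_mu, fused_var

# Two hypothetical 1-D source posteriors: N(0, 1) and N(2, 1).
conflict = kl_diag_gauss([0.0], [1.0], [2.0], [1.0])
fused_mu, fused_var = product_of_gaussians([[0.0], [2.0]], [[1.0], [1.0]])
```

With equal variances the fused mean sits halfway between the two sources and the fused variance is halved, which reflects the added confidence from combining sources that agree.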
Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention
Prompt and accurate detection of system anomalies is essential to ensure the
reliability of software systems. Unlike manual efforts that exploit all
available run-time information, existing approaches usually leverage only a
single type of monitoring data (often logs or metrics) or fail to make
effective use of the joint information among different types of data.
Consequently, many false predictions occur. To better understand the
manifestations of system anomalies, we conduct a systematic study on a large
amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates
that logs and metrics can manifest system anomalies collaboratively and
complementarily, and neither alone is sufficient. Thus, integrating
heterogeneous data can help recover the complete picture of a system's health
status. In this context, we propose Hades, the first end-to-end semi-supervised
approach to effectively identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global
representation of the system status by fusing log semantics and metric
patterns. It captures discriminative features and meaningful interactions from
heterogeneous data via a cross-modal attention module, trained in a
semi-supervised manner. We evaluate Hades extensively on large-scale simulated
data and datasets from Huawei Cloud. The experimental results demonstrate the
effectiveness of our model in detecting system anomalies. We also release the
code and the annotated dataset for replication and future research.
Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on
Software Engineering (ICSE). arXiv admin note: substantial text overlap with
arXiv:2207.0291
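A cross-modal attention step of the kind this abstract describes can be sketched with plain scaled dot-product attention, where features from one modality act as queries over the other. This is a generic illustration in dependency-free Python, not the Hades architecture itself: the feature vectors are made-up placeholders, and learned projection matrices are omitted for brevity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs

# Hypothetical per-time-step embeddings for each modality.
log_feats = [[1.0, 0.0], [0.0, 1.0]]
metric_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Each modality queries the other, giving two cross-modal views.
log_attends_metrics = cross_attention(log_feats, metric_feats, metric_feats)
metrics_attend_logs = cross_attention(metric_feats, log_feats, log_feats)
```

The two outputs keep the sequence length of their query modality, so they can be concatenated with the original features before a downstream anomaly classifier.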
Generative Models for Novelty Detection: Applications in abnormal event and situational change detection from data series
Novelty detection is a process for distinguishing the observations that differ in some respect
from the observations that the model is trained on. Novelty detection is one of the fundamental
requirements of a good classification or identification system since sometimes the
test data contains observations that were not known at the training time. In other words, the
novelty class is often not present during the training phase or is not well defined.
In light of the above, one-class classifiers and generative methods can efficiently model
such problems. However, due to the unavailability of data from the novelty class, training
an end-to-end model is a challenging task in itself. Therefore, detecting novel classes in
unsupervised and semi-supervised settings is a crucial step in such tasks.
In this thesis, we propose several methods to model the novelty detection problem in an
unsupervised and semi-supervised fashion. The proposed frameworks are applied to different
related applications of anomaly and outlier detection tasks. The results show the superiority of
our proposed methods compared to the baselines and state-of-the-art methods.
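The basic setting described in this abstract, a model trained only on known observations that must flag observations differing from them, can be illustrated with a deliberately simple one-class baseline. The sketch below fits a one-dimensional Gaussian to normal training values and scores new values by their distance from the mean in standard deviations. It is a toy stand-in, not one of the thesis's generative methods; the data and the threshold of 3 are assumptions.

```python
import statistics

def fit_normal_model(train):
    """Summarize normal training data by its mean and sample standard deviation."""
    return statistics.fmean(train), statistics.stdev(train)

def novelty_score(x, model):
    """Distance of x from the training mean, in standard deviations."""
    mean, std = model
    return abs(x - mean) / std

# Hypothetical training series containing only normal observations.
model = fit_normal_model([9.8, 10.1, 10.0, 9.9, 10.2])
threshold = 3.0  # assumed cut-off; in practice tuned on validation data

is_novel_typical = novelty_score(10.05, model) > threshold
is_novel_outlier = novelty_score(12.0, model) > threshold
```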