137 research outputs found
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
To ensure undisrupted business, large Internet companies need to closely
monitor various KPIs (e.g., Page Views, number of online users, and number of
orders) of their Web applications, to accurately detect anomalies and trigger
timely troubleshooting/mitigation. However, anomaly detection for these
seasonal KPIs with various patterns and data quality has been a great
challenge, especially without labels. In this paper, we propose Donut, an
unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our
key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble
approach and a baseline VAE approach, and its best F-scores range from 0.75 to
0.9 for the studied KPIs from a top global Internet company. We come up with a
novel KDE interpretation of reconstruction for Donut, making it the first
VAE-based anomaly detection algorithm with solid theoretical explanation.
Comment: 12 pages (including references), 17 figures, submitted to WWW 2018:
The 2018 Web Conference, April 23--27, 2018, Lyon, France. The contents
discarded from the conference version due to the 9-page limitation are also
included in this versio
Anomaly Detection in Cloud-Native systems
In recent years, microservices have gained popularity due to their benefits such as increased maintainability and scalability of the system. The microservice architectural pattern was adopted for the development of a large-scale system which is commonly deployed on public and private clouds, and therefore the aim is to ensure that it always maintains an optimal level of performance. Consequently, the system is monitored by collecting different metrics, including performance-related metrics.
The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data; such data are not usually shared due to privacy concerns.
The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand if it is possible to predict the anomalies in order to perform actions before system failures or performance degradation. Consequently, eight different classification-based Machine Learning algorithms were compared by collecting accuracy, training time and testing time, to figure out which technique might be most suitable for reducing system overload.
The results showed that there are strong correlations between metrics and that it is possible to predict the anomalies in the system with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime with a short training time. Future work includes the adoption of prediction-based approaches and the development of some tools for the prediction of anomalies in cloud-native environments.
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection
Massive key performance indicators (KPIs) are monitored as multivariate time
series data (MTS) to ensure the reliability of the software applications and
service system. Accurately detecting the abnormality of MTS is very critical
for subsequent fault elimination. The scarcity of anomalies and manual labeling
has led to the development of various self-supervised MTS anomaly detection
(AD) methods, which optimize an overall objective/loss encompassing all
metrics' regression objectives/losses. However, our empirical study uncovers
the prevalence of conflicts among metrics' regression objectives, causing MTS
models to grapple with different losses. This critical aspect significantly
impacts detection performance but has been overlooked in existing approaches.
To address this problem, by mimicking the design of multi-gate
mixture-of-experts (MMoE), we introduce CAD, a Conflict-aware multivariate KPI
Anomaly Detection algorithm. CAD offers an exclusive structure for each metric
to mitigate potential conflicts while fostering inter-metric promotions. Upon
thorough investigation, we find that the poor performance of vanilla MMoE
mainly comes from the input-output misalignment settings of MTS formulation and
convergence issues arising from expansive tasks. To address these challenges,
we propose a straightforward yet effective task-oriented metric selection and
p&s (personalized and shared) gating mechanism, which establishes CAD as the
first practicable multi-task learning (MTL) based MTS AD model. Evaluations
reveal that CAD obtains an average F1-score of 0.943
across three public datasets, notably outperforming state-of-the-art methods.
Our code is accessible at https://github.com/dawnvince/MTS_CAD.
Comment: 11 pages, ESEC/FSE industry track 202
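The abstract above builds on multi-gate mixture-of-experts (MMoE). As a rough illustration only, the following sketch shows a generic MMoE forward pass in plain Python: a set of shared experts, plus one softmax gate per task (here, one task per metric) that mixes the expert outputs. It is not CAD's task-oriented metric selection or p&s gating; all names, sizes, and the random linear weights are assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights, drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation and one gate mixture per task."""
    # Every expert sees the same input; experts are shared by all tasks.
    expert_outputs = [linear(w, x) for w in expert_weights]
    hidden = len(expert_outputs[0])
    task_outputs, mixtures = [], []
    for gate in gate_weights:  # one gate per task
        mix = softmax(linear(gate, x))  # weights over the experts
        combined = [
            sum(mix[e] * expert_outputs[e][h] for e in range(len(expert_outputs)))
            for h in range(hidden)
        ]
        task_outputs.append(combined)
        mixtures.append(mix)
    return task_outputs, mixtures


rng = random.Random(0)
n_inputs, n_experts, hidden_size, n_tasks = 4, 3, 2, 2  # illustrative sizes
expert_weights = [random_matrix(hidden_size, n_inputs, rng) for _ in range(n_experts)]
gate_weights = [random_matrix(n_experts, n_inputs, rng) for _ in range(n_tasks)]
x = [0.5, -1.0, 0.25, 2.0]  # one made-up input window
outputs, mixtures = mmoe_forward(x, expert_weights, gate_weights)
```

Because each task has its own gate, two metrics can weight the same shared experts differently, which is the general mechanism MMoE offers for tasks whose objectives pull in different directions.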
Unsupervised Detection of Lesions in Brain MRI using constrained adversarial auto-encoders
Lesion detection in brain Magnetic Resonance Images (MRI) remains a
challenging task. State-of-the-art approaches are mostly based on supervised
learning making use of large annotated datasets. Human beings, on the other
hand, even non-experts, can detect most abnormal lesions after seeing a handful
of healthy brain images. Replicating this capability of using prior information
on the appearance of healthy brain structure to detect lesions can help
computers achieve human-level abnormality detection, specifically reducing the
need for numerous labeled examples and improving generalization to previously
unseen lesions. To this end, we study detection of lesion regions in an
unsupervised manner by learning data distribution of brain MRI of healthy
subjects using auto-encoder based methods. We hypothesize that one of the main
limitations of the current models is the lack of consistency in latent
representation. We propose a simple yet effective constraint that helps map
a lesion-bearing image close to its corresponding healthy image in the
latent space. We use the Human Connectome Project dataset to learn distribution
of healthy-appearing brain MRI and report improved detection, in terms of AUC,
of the lesions in the BRATS challenge dataset.
Comment: 9 pages, 5 figures, accepted at MIDL 201