research

Technical Report: Anomaly Detection for a Critical Industrial System using Context, Logs and Metrics

Abstract

Recent advances in contextual anomaly detection attempt to combine resource metrics and event logs to un- cover unexpected system behaviors and malfunctions at run- time. These techniques are highly relevant for critical software systems, where monitoring is often mandated by international standards and guidelines. In this technical report, we analyze the effectiveness of a metrics-logs contextual anomaly detection technique in a middleware for Air Traffic Control systems. Our study addresses the challenges of applying such techniques to a new case study with a dense volume of logs, and finer monitoring sampling rate. We propose an automated abstraction approach to infer system activities from dense logs and use regression analysis to infer the anomaly detector. We observed that the detection accuracy is impacted by abrupt changes in resource metrics or when anomalies are asymptomatic in both resource metrics and event logs. Guided by our experimental results, we propose and evaluate several actionable improvements, which include a change detection algorithm and the use of time windows on contextual anomaly detection. This technical report accompanies the paper “Contextual Anomaly Detection for a Critical Industrial System based on Logs and Metrics” [1] and provides further details on the analysis method, case study and experimental results

    Similar works