Robust Multimodal Failure Detection for Microservice Systems

Abstract

Proactive failure detection of instances is vitally essential to microservice systems because an instance failure can propagate to the whole system and degrade the system's performance. Over the years, many single-modal (i.e., metrics, logs, or traces) data-based nomaly detection methods have been proposed. However, they tend to miss a large number of failures and generate numerous false alarms because they ignore the correlation of multimodal data. In this work, we propose AnoFusion, an unsupervised failure detection approach, to proactively detect instance failures through multimodal data for microservice systems. It applies a Graph Transformer Network (GTN) to learn the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) with Gated Recurrent Unit (GRU) to address the challenges introduced by dynamically changing multimodal data. We evaluate the performance of AnoFusion through two datasets, demonstrating that it achieves the F1-score of 0.857 and 0.922, respectively, outperforming the state-of-the-art failure detection approaches

    Similar works

    Full text

    thumbnail-image

    Available Versions