Search CORE

2,152 research outputs found

AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges

Author: Cheng Qian
Hoi Steven C. H.
Liu Chenghao
Saha Amrita
Sahoo Doyen
Saverese Silvio
Singh Manpreet
Woo Gerald
Yang Wenzhuo
Publication venue
Publication date: 10/04/2023
Field of study

Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes, particularly in cloud infrastructures, to provide actionable insights with the primary goal of maximizing availability. There are a wide variety of problems to address, and multiple use-cases, where AI capabilities can be leveraged to enhance operational efficiency. Here we provide a review of the AIOps vision, trends challenges and opportunities, specifically focusing on the underlying AI techniques. We discuss in depth the key types of data emitted by IT Operations activities, the scale and challenges in analyzing them, and where they can be helpful. We categorize the key AIOps tasks as - incident detection, failure prediction, root cause analysis and automated actions. We discuss the problem formulation for each task, and then present a taxonomy of techniques to solve these problems. We also identify relatively under explored topics, especially those that could significantly benefit from advances in AI literature. We also provide insights into the trends in this field, and what are the key investment opportunities

arXiv.org e-Print Archive

Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention

Author: Chen Zhuangbin
Lee Cheryl
Lyu Michael R.
Su Yuxin
Yang Tianyi
Yang Yongqiang
Publication venue
Publication date: 14/02/2023
Field of study

Prompt and accurate detection of system anomalies is essential to ensure the reliability of software systems. Unlike manual efforts that exploit all available run-time information, existing approaches usually leverage only a single type of monitoring data (often logs or metrics) or fail to make effective use of the joint information among different types of data. Consequently, many false predictions occur. To better understand the manifestations of system anomalies, we conduct a systematical study on a large amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates that logs and metrics can manifest system anomalies collaboratively and complementarily, and neither of them only is sufficient. Thus, integrating heterogeneous data can help recover the complete picture of a system's health status. In this context, we propose Hades, the first end-to-end semi-supervised approach to effectively identify system anomalies based on heterogeneous data. Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns. It captures discriminative features and meaningful interactions from heterogeneous data via a cross-modal attention module, trained in a semi-supervised manner. We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud. The experimental results present the effectiveness of our model in detecting system anomalies. We also release the code and the annotated dataset for replication and future research.Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). arXiv admin note: substantial text overlap with arXiv:2207.0291

arXiv.org e-Print Archive

Keeping Context In Mind: Automating Mobile App Access Control with User Interface Inspection

Author: Fu Hao
Mohapatra Pransant
Zheng Zizhan
Zhu Sencun
Publication venue
Publication date: 22/01/2019
Field of study

Recent studies observe that app foreground is the most striking component that influences the access control decisions in mobile platform, as users tend to deny permission requests lacking visible evidence. However, none of the existing permission models provides a systematic approach that can automatically answer the question: Is the resource access indicated by app foreground? In this work, we present the design, implementation, and evaluation of COSMOS, a context-aware mediation system that bridges the semantic gap between foreground interaction and background access, in order to protect system integrity and user privacy. Specifically, COSMOS learns from a large set of apps with similar functionalities and user interfaces to construct generic models that detect the outliers at runtime. It can be further customized to satisfy specific user privacy preference by continuously evolving with user decisions. Experiments show that COSMOS achieves both high precision and high recall in detecting malicious requests. We also demonstrate the effectiveness of COSMOS in capturing specific user preferences using the decisions collected from 24 users and illustrate that COSMOS can be easily deployed on smartphones as a real-time guard with a very low performance overhead.Comment: Accepted for publication in IEEE INFOCOM'201

arXiv.org e-Print Archive