AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges
Artificial Intelligence for IT operations (AIOps) aims to combine the power
of AI with the big data generated by IT Operations processes, particularly in
cloud infrastructures, to provide actionable insights with the primary goal of
maximizing availability. There are a wide variety of problems to address, and
multiple use-cases, where AI capabilities can be leveraged to enhance
operational efficiency. Here we provide a review of the AIOps vision, trends,
challenges and opportunities, specifically focusing on the underlying AI
techniques. We discuss in depth the key types of data emitted by IT Operations
activities, the scale and challenges in analyzing them, and where they can be
helpful. We categorize the key AIOps tasks as incident detection, failure
prediction, root cause analysis and automated actions. We discuss the problem
formulation for each task, and then present a taxonomy of techniques to solve
these problems. We also identify relatively underexplored topics, especially
those that could significantly benefit from advances in the AI literature. We
also provide insights into the trends in this field and the key investment
opportunities.
Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention
Prompt and accurate detection of system anomalies is essential to ensure the
reliability of software systems. Unlike manual efforts that exploit all
available run-time information, existing approaches usually leverage only a
single type of monitoring data (often logs or metrics) or fail to make
effective use of the joint information among different types of data.
Consequently, many false predictions occur. To better understand the
manifestations of system anomalies, we conduct a systematic study on a large
amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates
that logs and metrics can manifest system anomalies collaboratively and
complementarily, and neither alone is sufficient. Thus, integrating
heterogeneous data can help recover the complete picture of a system's health
status. In this context, we propose Hades, the first end-to-end semi-supervised
approach to effectively identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global
representation of the system status by fusing log semantics and metric
patterns. It captures discriminative features and meaningful interactions from
heterogeneous data via a cross-modal attention module, trained in a
semi-supervised manner. We evaluate Hades extensively on large-scale simulated
data and datasets from Huawei Cloud. The experimental results demonstrate the
effectiveness of our model in detecting system anomalies. We also release the
code and the annotated dataset for replication and future research.
Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on
Software Engineering (ICSE). arXiv admin note: substantial text overlap with
arXiv:2207.0291
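To make the idea of fusing two modalities with attention concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which vectors from one modality (here standing in for log embeddings) attend over vectors from another (standing in for metric embeddings). It is not the Hades implementation: the function names, the toy embeddings, and the choice of plain dot-product attention without learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: vectors from one modality (e.g. hypothetical log embeddings)
    keys, values: vectors from the other modality (e.g. metric embeddings)
    Returns one fused vector per query: a weighted average of `values`.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        # Similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, dimension by dimension.
        dim = len(values[0])
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return fused


# Toy, hand-made embeddings (hypothetical data, not from any dataset).
log_emb = [[1.0, 0.0], [0.0, 1.0]]
metric_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(log_emb, metric_emb, metric_emb)
```

Each fused vector leans toward the metric embeddings most similar to its log query, which is the basic mechanism an attention-based fusion module relies on; a trained model would add learned projections for queries, keys and values.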
Keeping Context In Mind: Automating Mobile App Access Control with User Interface Inspection
Recent studies observe that the app foreground is the most striking component
influencing access control decisions on mobile platforms, as users tend
to deny permission requests lacking visible evidence. However, none of the
existing permission models provides a systematic approach that can
automatically answer the question: Is the resource access indicated by the app
foreground? In this work, we present the design, implementation, and evaluation
of COSMOS, a context-aware mediation system that bridges the semantic gap
between foreground interaction and background access, in order to protect
system integrity and user privacy. Specifically, COSMOS learns from a large set
of apps with similar functionalities and user interfaces to construct generic
models that detect the outliers at runtime. It can be further customized to
satisfy specific user privacy preferences by continuously evolving with user
decisions. Experiments show that COSMOS achieves both high precision and high
recall in detecting malicious requests. We also demonstrate the effectiveness
of COSMOS in capturing specific user preferences using the decisions collected
from 24 users and illustrate that COSMOS can be easily deployed on smartphones
as a real-time guard with a very low performance overhead.
Comment: Accepted for publication in IEEE INFOCOM'201