47 research outputs found

    Configuration Validation with Large Language Models

    Misconfigurations are major causes of software failures. Existing practices rely on developer-written rules or test cases to validate configurations, which are expensive. Machine learning (ML) for configuration validation is considered a promising direction, but has faced challenges such as the need for large-scale field data and system-specific models. Recent advances in Large Language Models (LLMs) show promise in addressing some of the long-standing limitations of ML-based configuration validation. We present a first analysis of the feasibility and effectiveness of using LLMs for configuration validation. We empirically evaluate LLMs as configuration validators by developing a generic LLM-based configuration validation framework, named Ciri. Ciri employs effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. Ciri checks outputs from LLMs when producing results, addressing hallucination and nondeterminism of LLMs. We evaluate Ciri's validation effectiveness on eight popular LLMs using configuration data of ten widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) explores the design space of LLM-based validators like Ciri, and (3) reveals open challenges such as ineffectiveness in detecting certain types of misconfigurations and biases towards popular configuration parameters.

    Automated Implementation of Windows-related Security-Configuration Guides

    Hardening is the process of configuring IT systems to ensure the security of the systems' components and the data they process or store. The complexity of contemporary IT infrastructures, however, renders manual security hardening and maintenance a daunting task. In many organizations, security-configuration guides expressed in SCAP (Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified against the available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings are defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides.

    Automatic Root Cause Analysis via Large Language Models for Cloud Incidents

    Ensuring the reliability and availability of cloud services necessitates efficient root cause analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual investigations of data sources such as logs and traces, are often laborious, error-prone, and challenging for on-call engineers. In this paper, we introduce RCACopilot, an innovative on-call system powered by a large language model for automating RCA of cloud incidents. RCACopilot matches incoming incidents to corresponding incident handlers based on their alert types, aggregates the critical runtime diagnostic information, predicts the incident's root cause category, and provides an explanatory narrative. We evaluate RCACopilot using a real-world dataset consisting of a year's worth of incidents from Microsoft. Our evaluation demonstrates that RCACopilot achieves RCA accuracy up to 0.766. Furthermore, the diagnostic information collection component of RCACopilot has been in successful use at Microsoft for over four years.

    Hardening Cloud and Datacenter Systems against Misconfigurations: Principles and Tool Support

    Misconfigurations (a.k.a., configuration errors from a system’s standpoint) are among the dominant causes of today’s catastrophic system failures that take down cloud-scale services and affect hundreds of millions of end users. Despite their wide adoption, traditional fault-tolerance and failure-recovery techniques are not effective in dealing with configuration errors, especially in large-scale software systems deployed in clouds and datacenters. To make matters worse, even the tolerance and recovery mechanisms themselves are often misconfigured in the real world, which impairs the immune system of the entire cloud or datacenter. This dissertation explores two fundamental questions towards solutions for the inevitable misconfigurations: how to build reliable cloud and datacenter systems in the face of configuration errors, and how to prevent misconfigurations in the first place through better configuration design. The goal is to enable software systems to proactively anticipate and defend against misconfigurations, rather than reacting to their manifestations and consequences. This dissertation presents three key principles of systems design and implementation for hardening cloud and datacenter systems against misconfigurations: anticipating misconfigurations, early detection of configuration errors, and simplicity-oriented configuration design. The dissertation demonstrates that applying these principles can effectively defend cloud and datacenter systems against misconfigurations. Moreover, the dissertation presents the corresponding techniques and tool support that can automatically and systematically apply these principles to existing systems software. The main technical insight is that configurations are essentially used by the systems, while configuration errors are mostly manifested through the faulty execution that uses erroneous configuration values. Therefore, by analyzing the system’s code that uses configuration values, one can understand and make use of system-level information about configurations to build defenses against potential errors. This dissertation first presents Spex, which enables systems to anticipate misconfigurations. Spex automatically infers configuration constraints from a system’s source code, and then leverages the constraints to test the system’s resilience to misconfigurations and detect error-prone configuration design and handling. One step further, the dissertation introduces PCheck, which automatically generates checking code that captures configuration errors at the system’s initialization phase to prevent their late manifestations and the corresponding failure damage. Going beyond, this dissertation presents simplicity-oriented configuration design towards more usable and less error-prone software configuration. The key idea is to apply the user-centric philosophy to designing configuration as an interface: configurations are essentially the interface for controlling and customizing system behavior, but have rarely been treated as such. The dissertation shows that configurations in today’s systems software can be significantly simplified and effectively navigated, with an understanding of how they are actually used in the field.