11 research outputs found

    Large Language Models Based Automatic Synthesis of Software Specifications

    Software configurations play a crucial role in determining the behavior of software systems. To ensure safe and error-free operation, it is necessary to identify the correct configurations, along with their valid bounds and rules, which are commonly referred to as software specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure correct operation can become large and prohibitively difficult to manage manually. Due to the fast pace of software development, correct software specifications are often not thoroughly checked or validated within the software itself. Rather, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. It is therefore hard for a system administrator to know the correct specifications of configurations, because these sources lack clarity and organization and there is no centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work that uses a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in terms of F1 score and can find specifications in single as well as multiple sentences.

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. 
This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
2019-07-02

    Diagnosing Software Configuration Errors via Static Analysis

    Software misconfiguration is responsible for a substantial part of today's system failures, causing about one quarter of all user-reported issues. Identifying the root causes of misconfigurations can be costly in terms of time and human resources. To reduce the effort, researchers from industry and academia have developed many techniques to assist software engineers in troubleshooting software configuration. Unfortunately, these techniques are hard to apply in practice because the data or operations they require are difficult to obtain. For instance, some techniques rely on a database of configuration data, which is often not publicly available for reasons of data privacy. Some techniques rely heavily on runtime information from a failing run, which requires reproducing the configuration error and rerunning the misconfigured system. Reproducing a configuration error is costly because misconfiguration depends heavily on the operating environment. Other techniques need testing oracles, which ordinary end users are rarely able to provide. This thesis explores techniques for diagnosing configuration errors that can be deployed in practice. We develop techniques for troubleshooting software configuration that rely on static analysis of a software system and do not need to execute the application. The source code and configuration documents that the techniques require are often available, especially for open source software. Our techniques can be deployed as third-party services. The first technique addresses configuration errors due to erroneous option values. It analyzes software programs and infers whether there exists a possible execution path from the point where an option value is loaded to the code location where the failure becomes visible. Options whose values might flow into such a crashing site are considered possible root causes of the error. 
Finally, we compute the degree of correlation between each of these options and the error using the error's stack trace information, and rank the options accordingly. The top-ranked options are more likely to be the root cause of the error. Our evaluation shows that the technique is highly effective in diagnosing the root causes of configuration errors. The second technique automatically extracts the names of options read by a program and their read points in the source code. We first identify statements that load option values, then infer which options are read by each statement, and finally output a map of these options and their read points. With the map, we are able to detect options in the documents that are not read by the corresponding version of the program. This allows locating configuration errors due to inconsistencies between configuration documents and source code. Our evaluation shows that the technique can precisely identify option read points and infer option names, and that it discovers multiple previously unknown inconsistencies between documented options and source code.
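The consistency check in the second technique can be illustrated with a minimal Python sketch. It is not the thesis's static analysis: it swaps in a plain regular expression for the inference of option names, and the getter pattern `conf.get...("name")`, the file name `Server.java`, and the option names are all hypothetical, chosen only for illustration. It builds a map from option names to read points and then reports documented options that the code never reads.

```python
import re
from collections import defaultdict

# Assumption: options are read through calls such as conf.get("x") or
# conf.getInt("x"). This pattern is hypothetical, not taken from the thesis.
READ_CALL = re.compile(r'conf\.get\w*\(\s*"([^"]+)"')


def build_read_map(sources):
    """Map each option name to the (file, line) points where it is read."""
    read_map = defaultdict(list)
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in READ_CALL.findall(line):
                read_map[name].append((path, lineno))
    return dict(read_map)


def documented_but_unread(documented, read_map):
    """Return documented option names that have no read point in the code."""
    return sorted(set(documented) - set(read_map))


# Hypothetical inputs: one source file and the options listed in its manual.
sources = {
    "Server.java": (
        'int p = conf.getInt("server.port");\n'
        'String h = conf.get("server.host");\n'
    ),
}
documented = ["server.port", "server.host", "server.timeout"]

read_map = build_read_map(sources)
stale = documented_but_unread(documented, read_map)
print(stale)
```

A real implementation would resolve option names that are built from constants or passed through helper methods, which a line-based regular expression cannot do.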

    On Run-Time Configuration Engineering

    Modern software applications allow users to change the behavior of a software application and adapt it to different situations and contexts, without requiring any source code modifications or recompilation. To this end, applications leverage a wide range of software configuration mechanisms that provide a set of options that users can change. According to several studies, incorrect values of software configuration options cause severe errors that are hard to debug. Major companies such as Facebook, Google, and Amazon have faced serious outages and failures due to configuration, which are considered among the worst outages at these companies. In addition, several studies found that the mechanism of software configuration increases the complexity of a software system and makes it hard to use. Such problems have a serious impact on different quality factors, such as the security, correctness, availability, comprehensibility, maintainability, and performance of software systems. 
Several studies have been conducted on specific aspects of configuration engineering, with most of them focusing on debugging configuration failures and testing software configurations, while only a few research efforts have focused on other aspects of configuration engineering, such as the creation and maintenance of configuration options. However, we think that software configuration can have a negative impact not only on the correctness of a software system, but also on other quality metrics, such as its comprehensibility and maintainability. In this thesis, we first take a step back to better understand the main activities involved in the process of run-time configuration engineering, before evaluating the impact of a catalog of best practices on the correctness and performance of the configuration engineering process. For these purposes, we conducted several qualitative and quantitative empirical studies on large repositories and open source projects. We first conducted a qualitative study, in which we tried to understand the configuration engineering process, the challenges and problems developers face during this process, and what practitioners and researchers recommend to help developers improve the quality of their software configuration engineering. By conducting 14 semi-structured interviews, a large survey, and a systematic literature review, we identified a process of configuration engineering involving 9 activities, a set of 22 challenges faced in practice, and a set of 24 recommendations by experts.