
    The Love/Hate Relationship with the C Preprocessor: An Interview Study

    The C preprocessor has received strong criticism in academia, regarding, among other things, separation of concerns, error proneness, and code obfuscation, but it is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but they have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey among 202 developers, repository mining, and results from previous studies. In particular, we investigated four research questions related to why the preprocessor is still widely used in practice, common problems, alternatives, and the impact of undisciplined annotations. Our study shows that developers are aware of the criticism the C preprocessor receives, but use it nonetheless, mainly for portability and variability. Many developers indicate that they regularly face preprocessor-related problems and preprocessor-related bugs. The majority of our interviewees do not see any current C-native technologies that can entirely replace the C preprocessor. However, developers tend to mitigate problems with guidelines, even though those guidelines are not enforced consistently. We report the key insights gained from our study and discuss implications for practitioners and researchers on how to better use the C preprocessor to minimize its negative impact.

    A Data Set of Generalizable Python Code Change Patterns

    Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and data sets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a data set of manually vetted, generalizable Python repetitive code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We leverage the mined change patterns from recent work that mines repetitive changes in Python projects and use our coding guideline to manually review the patterns. For each change, we also record a description of the change and why it is applied, along with other characteristics such as the number of projects it occurs in. This review process allows us to identify and share 72 Python change patterns that can be used to build and advance Python developer support tools.
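To make the notion of a "generalizable change pattern" concrete, here is a minimal sketch of what applying one such pattern in automated tooling might look like. The specific pattern (rewriting `== None` comparisons to `is None`) and all names are illustrative assumptions, not taken from the data set itself:

```python
import re

# Hypothetical change pattern: comparisons to None should use
# "is" / "is not" rather than "==" / "!=". Real tooling would
# operate on an AST rather than raw text; regexes keep the sketch short.
NONE_COMPARISON = [
    (re.compile(r"==\s*None\b"), "is None"),
    (re.compile(r"!=\s*None\b"), "is not None"),
]

def apply_pattern(source: str) -> str:
    """Apply the None-comparison change pattern to a source snippet."""
    for regex, replacement in NONE_COMPARISON:
        source = regex.sub(replacement, source)
    return source

before = "if result == None:\n    return default"
print(apply_pattern(before))
```

A recommender built on such patterns would additionally record the description and project-count metadata the paper describes, to decide when a pattern is worth suggesting.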

    DRACA: Decision-support for Root Cause Analysis and Change Impact Analysis

    Most companies relying on an Information Technology (IT) system for their daily operations invest heavily in its maintenance. Tools that monitor network traffic, record anomalies, and keep track of the changes that occur in the system are usually used. Root cause analysis and change impact analysis are two main activities involved in the management of IT systems. Currently, there exists no universal model to guide analysts while performing these activities. Although the Information Technology Infrastructure Library (ITIL) provides a guide to the organization and structure of the tools and processes used to manage IT systems, it does not provide any models that can be used to implement the required features. This thesis focuses on providing simple and effective models and processes for root cause analysis and change impact analysis through mining useful artifacts stored in a Configuration Management Database (CMDB). The CMDB contains information about the different components in a system, called Configuration Items (CIs), as well as the relationships between them. Change reports and incident reports are also stored in a CMDB. The result of our work is the Decision support for Root cause Analysis and Change impact Analysis (DRACA) framework, which suggests possible root cause(s) of a problem, as well as possible CIs involved in a change set, based on different proposed models. The contributions of this thesis are as follows:
    - An exploration of data repositories (CMDBs) that have not been previously attempted in the mining software repositories research community.
    - A causality model providing decision support for root cause analysis based on this mined data.
    - A process for mining historical change information to suggest CIs for future change sets based on a ranking model. Support and confidence measures are used to make the suggestions.
    - Empirical results from applying the proposed change impact analysis process to industrial data. Our results show that the change sets in the CMDB were highly predictive, and that with a confidence threshold of 80% and a half-life of 12 months, an overall recall of 69.8% and a precision of 88.5% were achieved.
    - An overview of lessons learned from using a CMDB, and the observations we made while working with the CMDB.
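The support/confidence ranking idea described above can be sketched in a few lines: mine historical change sets (each a set of CIs changed together), then, given one CI, suggest others whose conditional co-change frequency exceeds a threshold. The CI names and data below are hypothetical, and the sketch omits the half-life decay the thesis applies to older change sets:

```python
from collections import Counter
from itertools import combinations

# Hypothetical historical change sets mined from a CMDB; each set lists
# the Configuration Items (CIs) that were changed together.
change_sets = [
    {"web-server", "load-balancer"},
    {"web-server", "load-balancer", "db-server"},
    {"web-server", "db-server"},
    {"load-balancer"},
]

pair_count = Counter()   # how often each CI pair co-changes
item_count = Counter()   # how often each CI changes at all
for cs in change_sets:
    for ci in cs:
        item_count[ci] += 1
    for a, b in combinations(sorted(cs), 2):
        pair_count[(a, b)] += 1

def confidence(antecedent: str, consequent: str) -> float:
    """Estimate P(consequent changes | antecedent changes)."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_count[pair] / item_count[antecedent]

def suggest(ci: str, threshold: float = 0.5) -> list:
    """Suggest CIs whose confidence w.r.t. `ci` meets the threshold."""
    others = {c for cs in change_sets for c in cs} - {ci}
    return sorted(c for c in others if confidence(ci, c) >= threshold)

print(suggest("web-server"))
```

With a high threshold (the thesis reports results at 80%), fewer but more reliable suggestions are made, trading recall for precision.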

    Variability Anomalies in Software Product Lines

    Software Product Lines (SPLs) allow variants of a software system to be generated based on the configuration selected by the user. In this thesis, we focus on C-based software systems with build-time variability using a build system and the C preprocessor. Such systems usually consist of a configuration space, a code space, and a build space. The configuration space describes the features that the user can select and any configuration constraints between them. The features and the constraints between them are commonly documented in a variability model. The code and build spaces contain the actual implementation of the system, where the former contains the C code files with conditional compilation directives (e.g., #ifdefs), and the latter contains the build scripts with conditionally compiled files. We study the relationship between the three spaces as follows: (1) we detect variability anomalies which arise due to inconsistencies among the three spaces, and (2) we use anomaly detection techniques to automatically extract configuration constraints from the implementation. For (1), we complement previous research, which mainly focused on the relationship between the configuration space and the code space. We additionally analyze the build space to ensure that the constraints in all three spaces are consistent. We detect inconsistencies, which we call variability anomalies, in particular dead and undead artifacts. Dead artifacts are conditional artifacts which are not included in any valid configuration, while undead artifacts are those which are always included. We look for such anomalies at both the code-block and source-file levels using the Linux kernel as a case study. Our work shows that almost half the configurable features are only used to control source file compilation in Linux’s build system, KBUILD. We analyze KBUILD to extract file presence conditions, which determine under which feature combinations each file is compiled.
    We show that by considering the build system, we can detect 20% more variability anomalies on the code-block level than when only using the configuration and code spaces. Our work also shows that file-level anomalies occur less frequently than block-level ones. We analyze the evolution of the detected anomalies and identify some of their causes and fixes. For (2), we develop novel analyses to automatically extract configuration constraints from the implementation and compare them to those in existing variability models. We rely on two means of detecting variability anomalies: (a) conditional build-time errors and (b) detecting under which conditions a feature has an effect on the compiled code (to avoid duplicate variants). We apply this to four real-world systems: uClibc, BusyBox, eCos, and the Linux kernel. We show that our extraction is 93% and 77% accurate, respectively, for the two means we use, and that we can recover 19% of the existing variability-model constraints using our approach. We qualitatively investigate the non-recovered constraints and find that many of them stem from domain knowledge. For systems with existing variability models, understanding where each constraint comes from can aid traceability between the code and the model, which can help in debugging conflicts. More importantly, in systems which do not have a formal variability model, automatically extracting constraints from code provides the basis for reverse engineering a variability model. Overall, we provide tools and techniques to help maintain and create software product lines. Our work helps ensure the consistency of variability constraints scattered across SPLs and provides tools to help reverse engineer variability models.
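The core check behind dead-artifact detection can be illustrated with a toy example: a code block is dead if its presence condition is unsatisfiable under the variability model's constraints. The feature names and conditions below are invented, and real analyses of systems the size of the Linux kernel use SAT solvers rather than brute-force enumeration:

```python
from itertools import product

# Hypothetical feature set and constraints, for illustration only.
FEATURES = ["NET", "WIFI", "USB"]

def model_ok(cfg: dict) -> bool:
    """Variability-model constraint: WIFI depends on NET."""
    return not cfg["WIFI"] or cfg["NET"]

def block_condition(cfg: dict) -> bool:
    """Presence condition of a code block: #if WIFI && !NET."""
    return cfg["WIFI"] and not cfg["NET"]

def is_dead(condition) -> bool:
    """True if no valid configuration includes the block (a dead artifact)."""
    for values in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if model_ok(cfg) and condition(cfg):
            return False
    return True

print(is_dead(block_condition))
```

An undead artifact is the dual check: the block is included in every valid configuration, i.e., `model_ok(cfg) and not condition(cfg)` is never satisfiable. The same machinery applies at the file level once file presence conditions have been extracted from the build system.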

    Reuse and maintenance practices among divergent forks in three software ecosystems

    With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many software developers leverage this reuse by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called software family; they share a common code base and are maintained in parallel by the same or different developers. As such, software families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities of collaborating contributors. However, little is known about the existence and characteristics of such families within ecosystems, especially about their maintenance practices. Improving our empirical understanding of such families will help build better tools for maintaining and evolving them. We empirically explore maintenance practices in such fork-based software families within ecosystems of open-source software. Our focus is on three of the largest software ecosystems in existence today: Android, .NET, and JavaScript. We identify and analyze software families that are maintained together and that exist both on the official distribution platform (Google Play, NuGet, and npm) as well as on GitHub, allowing us to analyze reuse practices in depth. We mine and identify 38, 526, and 8,837 software families from the ecosystems of Android, .NET, and JavaScript, respectively, to study their characteristics and code-propagation practices. We provide scripts for analyzing code integration within our families. Interestingly, our results show that there is little code integration across the studied software families from the three ecosystems. Our studied families also show that direct integration using git outside of GitHub is more commonly used than GitHub pull requests.
    Overall, we hope to raise awareness about the existence of software families within larger ecosystems of software, calling for further research and better tool support to effectively maintain and evolve them.

    An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

    Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to this problem, utilizing additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without additional training or manual effort, providing the LLM with the signature and implementation of the function under test, along with usage examples extracted from documentation. We also attempt to repair failed generated tests by re-prompting the model with the failing test and error message. We implement our approach in TestPilot, a test generation tool for JavaScript that automatically generates unit tests for all API functions in an npm package. We evaluate TestPilot using OpenAI's gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%, significantly improving on Nessie, a recent feedback-directed JavaScript test generation technique, which achieves only 51.3% statement coverage and 25.6% branch coverage. We also find that 92.8% of TestPilot's generated tests have no more than 50% similarity with existing tests (as measured by normalized edit distance), with none of them being exact copies. Finally, we run TestPilot with two additional LLMs, OpenAI's older code-cushman-002 LLM and the open LLM StarCoder. Overall, we observed similar results with the former (68.2% median statement coverage), and somewhat worse results with the latter (54.0% median statement coverage), suggesting that the effectiveness of the approach is influenced by the size and training set of the LLM, but does not fundamentally depend on the specific model.
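The similarity measure mentioned above (normalized edit distance between a generated test and an existing one) can be sketched as follows. The exact normalization used by the study may differ; this version divides Levenshtein distance by the longer string's length:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """1 minus normalized edit distance, in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Two near-identical generated assertions score very high similarity.
print(similarity("assert(f(1) == 2)", "assert(f(1) == 3)"))
```

Under a measure like this, "no more than 50% similarity" means a generated test differs from every existing test in at least half of its characters, which supports the claim that the generated tests are not mere copies.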

    Developing Cross-Disciplinary Skills to Engage Students in a Subject-Specific Problem-Based Learning Course: An Experience Report

    At the IUT 1 of Grenoble, the instructors of thermodynamics and of expression and communication introduced an interdisciplinary dimension into a course organized as problem-based learning (PBL, French: APP) in order to address difficulties with student engagement. The central focus was supporting students as they developed group-work skills. In parallel, the instructors reflected on the learning resources and how they are made available, as well as on the organization of the work environment. These actions allowed a first assessment to be drawn, opening up perspectives in a context where competency-based approaches are being developed in the Instituts Universitaires de Technologie (IUT) and in universities more generally.