
    Learning Effective Changes for Software Projects

    The primary motivation of much of software analytics is decision making. How should these decisions be made? Should one base decisions on lessons that arise from within a particular project, or should one generate them from across multiple projects? This work is an attempt to answer these questions. Our work was motivated by the realization that much of the current generation of software analytics tools focuses primarily on prediction. Prediction is indeed a useful task, but it is usually followed by "planning" about what actions need to be taken. This research addresses the planning task by seeking methods that support actionable analytics and offer clear guidance on what to do. Specifically, we propose the XTREE and BELLTREE algorithms for generating a set of actionable plans within and across projects. Each of these plans, if followed, will improve the quality of the software project. Comment: 4 pages, 2 figures. This is a submission for the ASE 2017 Doctoral Symposium.

    Wasmizer: Curating WebAssembly-driven Projects on GitHub

    WebAssembly has attracted great attention as a portable compilation target for programming languages. To facilitate in-depth studies of this technology, we have deployed Wasmizer, a tool that regularly mines GitHub projects and makes an up-to-date dataset of WebAssembly sources and their binaries publicly available. At present, we have collected 2 540 C and C++ projects that are highly related to WebAssembly, and built a dataset of 8 915 binaries that are linked to their source projects. To demonstrate an application of this dataset, we have investigated the presence of eight WebAssembly compilation smells in the wild. Comment: 11 pages + 1 page of references. Preprint of MSR'23 publication.

    Are Multi-language Design Smells Fault-prone? An Empirical Study

    Modern applications are often developed using components written in different programming languages. These systems offer several advantages. However, as the number of languages increases, so do the challenges related to the development and maintenance of these systems. In such situations, developers may introduce design smells (i.e., anti-patterns and code smells), which are symptoms of poor design and implementation choices. Design smells are defined as poor design and coding choices that can negatively impact the quality of a software program despite satisfying functional requirements. Studies on mono-language systems suggest that the presence of design smells affects code comprehension, thus making systems harder to maintain. However, these studies target only mono-language systems and do not consider the interaction between different programming languages. In this paper, we present an approach to detect multi-language design smells in the context of JNI systems. We then investigate the prevalence of those design smells. Specifically, we detect 15 design smells in 98 releases of nine open-source JNI projects. Our results show that the design smells are prevalent in the selected projects and persist throughout the releases of the systems. We observe that in the analyzed systems, 33.95% of the files involving communications between Java and C/C++ contain occurrences of multi-language design smells. Some kinds of smells are more prevalent than others, e.g., Unused Parameters, Too Much Scattering, and Unused Method Declaration. Our results suggest that files with multi-language design smells can often be more associated with bugs than files without these smells, and that specific smells are more correlated with fault-proneness than others.

    Prevalence of Code Smells in Reinforcement Learning Projects

    Reinforcement Learning (RL) is increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling users to integrate RL algorithms themselves. We note, however, that the majority of such code is not developed by RL engineers, which, as a consequence, may lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, by analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. The detected code smells suggest problems in responsibility separation and in the appropriateness of current abstractions for the definition of RL algorithms. Comment: Paper preprint for the 2nd International Conference on AI Engineering Software Engineering for AI CAIN202
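To make the two smells named in this abstract concrete, here is a minimal sketch of how "long method" and "long method chain" could be flagged in Python source with the standard `ast` module. It is not the tooling used in the study: the function names and the 30-line threshold are assumptions chosen for illustration, since the abstract gives no thresholds.

```python
import ast

# Hypothetical threshold; the abstract does not state the one used in the study.
MAX_METHOD_LINES = 30


def find_long_methods(source: str, max_lines: int = MAX_METHOD_LINES):
    """Return (name, length) for every function spanning more than max_lines
    lines, counting the signature line."""
    long_methods = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                long_methods.append((node.name, length))
    return long_methods


def chain_length(node: ast.AST) -> int:
    """Count consecutive method calls in an expression such as a.b().c().d()."""
    length = 0
    while isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        length += 1
        node = node.func.value  # step to the receiver of the call
    return length
```

A chain detector would apply `chain_length` to each outermost call expression and compare the result against a limit chosen in the same way.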

    Evolution of technical debt remediation in Python: A case study on the Apache Software Ecosystem

    In recent years, the evolution of software ecosystems and the detection of technical debt have received significant attention from researchers in both industry and academia. While a few studies that analyze various aspects of technical debt evolution already exist, to the best of our knowledge there is no large-scale study that focuses on the remediation of technical debt over time in projects written in Python, one of the most popular programming languages at the moment. In this paper, we analyze the evolution of technical debt in 44 Python open-source software projects belonging to the Apache Software Foundation. We focus on the type and amount of technical debt that is paid back. The study required the mining of over 60K commits, detailed code analysis of 3.7K system versions, and the analysis of almost 43K fixed issues. The findings show that most of the repayment effort goes into testing, documentation, complexity, and duplication removal. Moreover, more than half of the Python technical debt in the ecosystem is short-term, being repaid in less than two months. In particular, the observation that a minority of rules accounts for the majority of issues fixed and effort spent suggests that addressing those kinds of debt in the future is important for research and practice.

    Characterizing and Detecting Duplicate Logging Code Smells

    Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often considers the appropriateness of a log only as an individual item (e.g., one single logging statement), whereas logs are typically analyzed in tandem. In this thesis, we focus on studying duplicate logging statements, which are logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers’ understanding of the dynamic view of the system. We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of the code smell, we further manually identified the problematic (i.e., requiring fixes) and justifiable (i.e., not requiring fixes) cases. We then contacted developers to verify the results of our manual study. We integrated our manual study results and developers’ feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. We evaluated DLFinder on the four manually studied systems and four additional systems: Kafka, Flink, Camel, and Wicket. In total, combining the results of DLFinder and our manual analysis, we reported 91 problematic code smell instances to developers, and all of them have been fixed. This thesis provides an initial step toward creating a logging guideline for developers to improve the quality of logging code. DLFinder is also able to detect duplicate logging code smells with high precision and recall.
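The definition of a duplicate logging statement used in this abstract (two or more statements sharing the same static text message) can be illustrated with a small sketch. This is not DLFinder, which the abstract describes as a static analysis tool; it is a line-based regular-expression scan, and the logger names (`log`, `logger`), the level names, and the function name are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: log calls look like LOG.error("...") or logger.info("..." + var).
# Only the first string literal of each call is treated as the static message.
LOG_CALL = re.compile(
    r'\b(?:log|logger)\.(?:trace|debug|info|warn|error)\(\s*"((?:[^"\\]|\\.)*)"',
    re.IGNORECASE,
)


def find_duplicate_log_messages(source: str):
    """Map each static log message that occurs more than once to its line numbers."""
    lines_by_message = defaultdict(list)
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in LOG_CALL.finditer(line):
            lines_by_message[match.group(1)].append(line_no)
    return {msg: nums for msg, nums in lines_by_message.items() if len(nums) > 1}
```

A scan like this only surfaces candidates. Deciding whether a given duplicate is problematic or justifiable depends on the surrounding code, which is the part the thesis studies manually and then automates.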