Learning Effective Changes for Software Projects
The primary motivation of much of software analytics is decision making. How
to make these decisions? Should one make decisions based on lessons that arise
from within a particular project? Or should one generate these decisions from
across multiple projects? This work is an attempt to answer these questions.
Our work was motivated by the realization that much of the current generation
of software analytics tools focuses primarily on prediction. Prediction is
certainly a useful task, but it is usually followed by "planning" about what
actions need to be taken. This research addresses the planning task by seeking
methods that support actionable analytics, i.e., analytics that offer clear
guidance on what to do. Specifically, we propose the XTREE and BELLTREE
algorithms for generating sets of actionable plans within and across projects.
Each of these plans, if followed, will improve the quality of the software
project. Comment: 4 pages, 2 figures. This is a submission for the ASE 2017 Doctoral Symposium.
Wasmizer: Curating WebAssembly-driven Projects on GitHub
WebAssembly has attracted great attention as a portable compilation target
for programming languages. To facilitate in-depth studies about this
technology, we have deployed Wasmizer, a tool that regularly mines GitHub
projects and makes an up-to-date dataset of WebAssembly sources and their
binaries publicly available. Presently, we have collected 2,540 C and C++
projects that are highly related to WebAssembly, and built a dataset of 8,915
binaries that are linked to their source projects. To demonstrate an
application of this dataset, we have investigated the presence of eight
WebAssembly compilation smells in the wild. Comment: 11 pages + 1 page of references. Preprint of MSR'23 publication.
Are Multi-language Design Smells Fault-prone? An Empirical Study
Nowadays, modern applications are developed using components written in
different programming languages. Such systems offer several advantages.
However, as the number of languages increases, so do the challenges related
to the development and maintenance of these systems. In such situations,
developers may introduce design smells (i.e., anti-patterns and code smells),
which are symptoms of poor design and implementation choices. Design smells are
defined as poor design and coding choices that can negatively impact the
quality of a software program despite satisfying functional requirements.
Studies on mono-language systems suggest that the presence of design smells
affects code comprehension, thus making systems harder to maintain. However,
these studies target only mono-language systems and do not consider the
interaction between different programming languages. In this paper, we present
an approach to detect multi-language design smells in the context of JNI
systems. We then investigate the prevalence of those design smells.
Specifically, we detect 15 design smells in 98 releases of nine open-source JNI
projects. Our results show that the design smells are prevalent in the selected
projects and persist throughout the releases of the systems. We observe that in
the analyzed systems, 33.95% of the files involving communication between Java
and C/C++ contain occurrences of multi-language design smells. Some kinds of
smells are more prevalent than others, e.g., Unused Parameters, Too Much
Scattering, Unused Method Declaration. Our results suggest that files with
multi-language design smells can often be more associated with bugs than files
without these smells, and that specific smells are more correlated to
fault-proneness than others.
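As a rough illustration of how one such smell might be flagged, the sketch below scans C source text for JNI native functions whose declared parameters never appear in the function body (the "Unused Parameters" smell named above). It is a regex-based approximation written for illustration, not the detector used in the study:

```python
import re

# Rough, regex-based sketch of the "Unused Parameters" multi-language
# smell: a parameter declared in a JNI native function but never
# referenced in its body. Illustrative only; not the study's detector.
FUNC_RE = re.compile(
    r'JNIEXPORT\s+\w+\s+JNICALL\s+(\w+)\s*\(([^)]*)\)\s*\{([^}]*)\}', re.S)

def unused_jni_parameters(c_source: str) -> list:
    """Return the names of JNI function parameters unused in the body."""
    unused = []
    for _name, params, body in FUNC_RE.findall(c_source):
        for param in params.split(','):
            parts = param.replace('*', ' ').split()
            if len(parts) < 2:          # skip empty or typeless entries
                continue
            pname = parts[-1]
            if not re.search(r'\b%s\b' % re.escape(pname), body):
                unused.append(pname)
    return unused
```

A real detector would work on a parse tree rather than regexes (nested braces alone defeat this pattern), but the core check, declared name versus referenced name, is the same.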
Prevalence of Code Smells in Reinforcement Learning Projects
Reinforcement Learning (RL) is being increasingly used to learn and adapt
application behavior in many domains, including large-scale and safety-critical
systems such as autonomous driving. With the advent of plug-and-play RL
libraries, its applicability has further increased, enabling users to integrate
RL algorithms into their own code. We note, however, that the majority of such
code is not developed by RL engineers, which, as a consequence, may lead to
poor program quality, yielding bugs, suboptimal performance, and
maintainability and evolution problems for RL-based projects. In this paper we
begin the exploration of this
hypothesis, specific to code utilizing RL, analyzing different projects found
in the wild, to assess their quality from a software engineering perspective.
Our study includes 24 popular RL-based Python projects, analyzed with standard
software engineering metrics. Our results, aligned with similar analyses for ML
code in general, show that popular and widely reused RL repositories contain
many code smells (3.95% of the code base on average), significantly affecting
the projects' maintainability. The most common code smells detected are long
method and long method chain, highlighting problems in the definition and
interaction of agents. Detected code smells suggest problems in responsibility
separation, and the appropriateness of current abstractions for the definition
of RL algorithms. Comment: Paper preprint for the 2nd International Conference on AI Engineering - Software Engineering for AI (CAIN 2023).
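To make the "long method chain" smell concrete, a crude proxy for it can be computed from a file's syntax tree: the length of the deepest run of chained attribute accesses and calls. The sketch below is an illustrative approximation, not the metric tooling used in the study:

```python
import ast

# Crude proxy for the "long method chain" smell: the deepest run of
# chained attribute accesses / calls (e.g. env.reset().step(a).render())
# in a piece of Python source. Illustrative; not the study's tooling.
def max_chain_length(source: str) -> int:
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        if isinstance(node, ast.Call):       # f(...) extends its callee's chain
            return depth(node.func)
        if isinstance(node, ast.Attribute):  # each .attr adds one link
            return 1 + depth(node.value)
        return 0                             # chain bottoms out at a plain name

    return max((depth(n) for n in ast.walk(tree)), default=0)
```

A smell detector would then compare this value against a threshold; the threshold itself is a tool-specific choice, not something this sketch prescribes.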
Evolution of technical debt remediation in Python: A case study on the Apache Software Ecosystem
In recent years, the evolution of software ecosystems and the detection of technical debt have received significant attention from researchers in both industry and academia. While a few studies that analyze various aspects of technical debt evolution already exist, to the best of our knowledge, there is no large-scale study that focuses on the remediation of technical debt over time in Python projects -- i.e., one of the most popular programming languages at the moment. In this paper, we analyze the evolution of technical debt in 44 Python open-source software projects belonging to the Apache Software Foundation. We focus on the type and amount of technical debt that is paid back. The study required the mining of over 60K commits, detailed code analysis of 3.7K system versions, and the analysis of almost 43K fixed issues. The findings show that most of the repayment effort goes into testing, documentation, complexity reduction, and duplication removal. Moreover, more than half of the Python technical debt in the ecosystem is short-term, being repaid in less than two months. In particular, the observation that a minority of rules accounts for the majority of fixed issues and spent effort suggests that addressing those kinds of debt in the future is important for research and practice.
Characterizing and Detecting Duplicate Logging Code Smells
Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often considers the appropriateness of a log only as an individual item (e.g., one single logging statement), while logs are typically analyzed in tandem. In this thesis, we focus on studying duplicate logging statements, i.e., logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers' understanding of the dynamic view of the system. We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open-source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of a code smell, we further manually identified the problematic cases (i.e., those that require fixes) and the justifiable ones (i.e., those that do not). We then contacted developers to verify the results of our manual study. We integrated our manual study results and the developers' feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. We evaluated DLFinder on the four manually studied systems and four additional systems: Kafka, Flink, Camel, and Wicket. In total, combining the results of DLFinder and our manual analysis, we reported 91 problematic code smell instances to developers, and all of them have been fixed. This thesis provides an initial step toward creating a logging guideline that helps developers improve the quality of logging code. DLFinder is also able to detect duplicate logging code smells with high precision and recall.
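The core idea, flagging logging statements that share the same static text message, can be sketched in a few lines. The following is a simplified, regex-based illustration and not the DLFinder tool itself, which additionally classifies occurrences as problematic or justifiable:

```python
import re
from collections import defaultdict

# Simplified sketch of detecting one duplicate-logging smell: logging
# statements that share the same static text message. Not DLFinder
# itself, which also separates problematic from justifiable cases.
LOG_RE = re.compile(r'(?:logger|log)\.(?:info|warn|error|debug)\(\s*"([^"]*)"')

def find_duplicate_log_messages(source: str) -> dict:
    """Map each static log message used 2+ times to its line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for msg in LOG_RE.findall(line):
            seen[msg].append(lineno)
    return {msg: lines for msg, lines in seen.items() if len(lines) > 1}
```

The hard part the thesis addresses is precisely what this sketch omits: deciding, from the surrounding code, which duplicates actually hinder understanding and which are justifiable.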