24 research outputs found
Combining Spreadsheet Smells for Improved Fault Prediction
Spreadsheets are commonly used in organizations as a programming tool for
business-related calculations and decision making. Since faults in spreadsheets
can have severe business impacts, a number of approaches from general software
engineering have been applied to spreadsheets in recent years, among them the
concept of code smells. Smells can in particular be used for the task of fault
prediction. An analysis of existing spreadsheet smells, however, revealed that
the predictive power of individual smells can be limited. In this work we
therefore propose a machine learning based approach which combines the
predictions of individual smells by using an AdaBoost ensemble classifier.
Experiments on two public datasets containing real-world spreadsheet faults
show significant improvements in terms of fault prediction accuracy.Comment: 4 pages, 1 figure, to be published in 40th International Conference
on Software Engineering: New Ideas and Emerging Results Trac
Definitions of a Software Smell
Many authors have defined smells from their perspective. This document attempts to provide a consolidated list of such definitions
Detailed Overview of Software Smells
This document provides an overview of literature concerning software smells covering various dimensions of smells along with their corresponding references
On the Feasibility of Transfer-learning Code Smells using Deep Learning
Context: A substantial amount of work has been done to detect smells in
source code using metrics-based and heuristics-based methods. Machine learning
methods have been recently applied to detect source code smells; however, the
current practices are considered far from mature. Objective: First, explore the
feasibility of applying deep learning models to detect smells without extensive
feature engineering, just by feeding the source code in tokenized form. Second,
investigate the possibility of applying transfer-learning in the context of
deep learning models for smell detection. Method: We use existing metric-based
state-of-the-art methods for detecting three implementation smells and one
design smell in C# code. Using these results as the annotated gold standard, we
train smell detection models on three different deep learning architectures.
These architectures use Convolution Neural Networks (CNNs) of one or two
dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden
layers. For the first objective of our study, we perform training and
evaluation on C# samples, whereas for the second objective, we train the models
from C# code and evaluate the models over Java code samples. We perform the
experiments with various combinations of hyper-parameters for each model.
Results: We find it feasible to detect smells using deep learning methods. Our
comparative experiments find that there is no clearly superior method between
CNN-1D and CNN-2D. We also observe that performance of the deep learning models
is smell-specific. Our transfer-learning experiments show that
transfer-learning is definitely feasible for implementation smells with
performance comparable to that of direct-learning. This work opens up a new
paradigm to detect code smells by transfer-learning especially for the
programming languages where the comprehensive code smell detection tools are
not available
FedCSD: A Federated Learning Based Approach for Code-Smell Detection
This paper proposes a Federated Learning Code Smell Detection (FedCSD)
approach that allows organizations to collaboratively train federated ML models
while preserving their data privacy. These assertions have been supported by
three experiments that have significantly leveraged three manually validated
datasets aimed at detecting and examining different code smell scenarios. In
experiment 1, which was concerned with a centralized training experiment,
dataset two achieved the lowest accuracy (92.30%) with fewer smells, while
datasets one and three achieved the highest accuracy with a slight difference
(98.90% and 99.5%, respectively). This was followed by experiment 2, which was
concerned with cross-evaluation, where each ML model was trained using one
dataset, which was then evaluated over the other two datasets. Results from
this experiment show a significant drop in the model's accuracy (lowest
accuracy: 63.80\%) where fewer smells exist in the training dataset, which has
a noticeable reflection (technical debt) on the model's performance. Finally,
the last and third experiments evaluate our approach by splitting the dataset
into 10 companies. The ML model was trained on the company's site, then all
model-updated weights were transferred to the server. Ultimately, an accuracy
of 98.34% was achieved by the global model that has been trained using 10
companies for 100 training rounds. The results reveal a slight difference in
the global model's accuracy compared to the highest accuracy of the centralized
model, which can be ignored in favour of the global model's comprehensive
knowledge, lower training cost, preservation of data privacy, and avoidance of
the technical debt problem.Comment: 17 pages, 7 figures, Journal pape