24 research outputs found
Combining Spreadsheet Smells for Improved Fault Prediction
Spreadsheets are commonly used in organizations as a programming tool for
business-related calculations and decision making. Since faults in spreadsheets
can have severe business impacts, a number of approaches from general software
engineering have been applied to spreadsheets in recent years, among them the
concept of code smells. Smells can in particular be used for the task of fault
prediction. An analysis of existing spreadsheet smells, however, revealed that
the predictive power of individual smells can be limited. In this work we
therefore propose a machine learning based approach which combines the
predictions of individual smells by using an AdaBoost ensemble classifier.
Experiments on two public datasets containing real-world spreadsheet faults
show significant improvements in terms of fault prediction accuracy.Comment: 4 pages, 1 figure, to be published in 40th International Conference
on Software Engineering: New Ideas and Emerging Results Trac
Detection of code smells using machine learning techniques combined with data-balancing methods
Code smells are prevalent issues in software design that arise when implementation or design principles are violated. These issues manifest as symptoms or anomalies in the source code. Timely identification of code smells plays a crucial role in enhancing software quality and facilitating software maintenance. Previous studies have shown that code smell detection can be accomplished through the utilization of machine learning (ML) methods. However, despite their increasing popularity, research suggests that the suitability of these methods are not always appropriate due to the problem of imbalanced data. Consequently, the effectiveness of ML models may be negatively affected. This study aims to propose a novel method for detecting code smells by employing five ML algorithms, namely decision tree (DT), k-nearest neighbors (K-NN), support vector machine (SVM), XGboost (XGB), and multi-layer perceptron (MLP). Additionally, to tackle the challenge of imbalanced data, the proposed method incorporates the random oversampling technique. Experiments were conducted in this study using four datasets that encompassed code smells, specifically god-class, data-class, long-method, and feature-envy. The experimental outcomes were evaluated and compared using various performance metrics. Upon comparing the outcomes of our models on both the balanced and original datasets, we found that the XGB model achieved the highest accuracy of 100% for detecting the data class and long method on the original datasets. In contrast, the highest accuracy of 100% was obtained for the data class and long method using DT, SVM, and XGB models on the balanced datasets. According to the empirical findings, there is significant promise in using ML techniques for the accurate prediction of code smells
Study of Code Smells: A Review and Research Agenda
Code Smells have been detected, predicted and studied by researchers from several perspectives. This literature review is conducted to understand tools and algorithms used to detect and analyze code smells to summarize research agenda. 114 studies have been selected from 2009 to 2022 to conduct this review. The studies are deeply analyzed under the categorization of machine learning and non-machine learning, which are found to be 25 and 89 respectively. The studies are analyzed to gain insight into algorithms, tools and limitations of the techniques. Long Method, Feature Envy, and Duplicate Code are reported to be the most popular smells. 38% of the studies focused their research on the enhancement of tools and methods. Random Forest and JRip algorithms are found to give the best results under machine learning techniques. We extended the previous studies on code smell detection tools, reporting a total 87 tools during the review. Java is found to be the dominant programming language during the study of smells
A systematic literature review on the code smells datasets and validation mechanisms
The accuracy reported for code smell-detecting tools varies depending on the
dataset used to evaluate the tools. Our survey of 45 existing datasets reveals
that the adequacy of a dataset for detecting smells highly depends on relevant
properties such as the size, severity level, project types, number of each type
of smell, number of smells, and the ratio of smelly to non-smelly samples in
the dataset. Most existing datasets support God Class, Long Method, and Feature
Envy while six smells in Fowler and Beck's catalog are not supported by any
datasets. We conclude that existing datasets suffer from imbalanced samples,
lack of supporting severity level, and restriction to Java language.Comment: 34 pages, 10 figures, 12 tables, Accepte
A User Feedback Centric Approach for Detecting and Mitigating God Class Code Smell Using Frequent Usage Patterns
Code smells are the fragments in the source code that indicates deeper problems in the underlying software design. These code smells can hinder software evolution and maintenance. Out of different code smell types, the God Class (GC) code smell is one of the many important code smells that directly affects the software evolution and maintenance. The GC is commonly defined as a much larger class in systems that either know too much or do too much as compared to other classes in the system. God Classes are generally accidentally created overtime during software evolution because of the incremental addition of functionalities to it. Generally, a GC indicates a bad design choice and it must be detected and mitigated in order to enhance the quality of the underlying software. However, sometimes the presence of a GC is also considered a good design choice, especially in compiler design, interpreter design and parser implementation. This makes the developer’s feedback important for the correct classification of a class as a GC or a normal class. Therefore, this paper proposes a new approach that detects and proposes refactoring opportunities for GC code smell. The proposed approach makes use of different code metrics in combination along with utilizing user feedback as an important aspect while correctly identifying the GC code smell. The proposed approach that considers combined use of code metrics, is based on two newly proposed code metrics in this paper. The first newly proposed metric is a new approach of measuring the connectivity of a given class with other classes in the system (also termed as coupling). The second newly proposed code metric is proposed to measure the extent to which a given classes make use of foreign member variables. Finally, the proposed approach is also empirically evaluated on two standard open-source commonly used software systems. The obtained result indicates that the proposed approach is capable of correctly identifying the GC code smell