24 research outputs found

    Combining Spreadsheet Smells for Improved Fault Prediction

    Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making. Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells. Smells can in particular be used for the task of fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited. In this work we therefore propose a machine learning based approach which combines the predictions of individual smells by using an AdaBoost ensemble classifier. Experiments on two public datasets containing real-world spreadsheet faults show significant improvements in terms of fault prediction accuracy.
    Comment: 4 pages, 1 figure, to be published in 40th International Conference on Software Engineering: New Ideas and Emerging Results Trac
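The idea of combining weak per-smell predictions with AdaBoost can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the three smell indicators, the toy labels, and the helper names `stump`, `train_adaboost`, and `predict` are all hypothetical, and the weak learner (a one-feature stump over a 0/1 smell flag) is an assumption made for illustration.

```python
import math

def stump(x, j, polarity):
    """Weak learner: vote `polarity` if smell j fired on cell x, else the opposite."""
    return polarity if x[j] == 1 else -polarity

def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost over binary smell flags.
    X: list of 0/1 vectors (one flag per smell), y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):         # pick the stump with lowest weighted error
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y) if stump(xi, j, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        if err >= 0.5:                     # no stump better than chance: stop
            break
        err = max(err, 1e-10)              # avoid log(…/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, pol)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Weighted vote of all selected stumps."""
    score = sum(alpha * stump(x, j, pol) for alpha, j, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: three smell flags per cell; a cell is faulty (+1)
# when at least two of the three smells fire.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0],
     [0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

model = train_adaboost(X, y, rounds=3)
ensemble_accuracy = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
best_single_accuracy = max(
    sum(stump(x, j, 1) == t for x, t in zip(X, y)) / len(X) for j in range(3)
)
```

On this toy data no single smell classifies every cell correctly, while the weighted vote of three stumps does, which is the effect the abstract describes in general terms.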

    Detection of code smells using machine learning techniques combined with data-balancing methods

    Code smells are prevalent issues in software design that arise when implementation or design principles are violated. They manifest as symptoms or anomalies in the source code. Timely identification of code smells plays a crucial role in enhancing software quality and facilitating software maintenance. Previous studies have shown that code smells can be detected with machine learning (ML) methods. However, despite their increasing popularity, research suggests that these methods are not always suitable because of imbalanced data, which may degrade the effectiveness of ML models. This study proposes a novel method for detecting code smells that employs five ML algorithms: decision tree (DT), k-nearest neighbors (K-NN), support vector machine (SVM), XGBoost (XGB), and multi-layer perceptron (MLP). To tackle the challenge of imbalanced data, the proposed method incorporates the random oversampling technique. Experiments were conducted on four datasets covering the god-class, data-class, long-method, and feature-envy code smells. The experimental outcomes were evaluated and compared using various performance metrics. Comparing our models on both the balanced and original datasets, we found that the XGB model achieved the highest accuracy of 100% for detecting the data class and long method on the original datasets. On the balanced datasets, the highest accuracy of 100% was obtained for the data class and long method using the DT, SVM, and XGB models. According to the empirical findings, there is significant promise in using ML techniques for the accurate prediction of code smells.
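Random oversampling, the balancing step named in this abstract, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `random_oversample`, the two metrics per sample, and the toy labels are hypothetical and are not taken from the study's datasets.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class is as large as the largest one. Original samples are kept as-is."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(target - n):        # zero iterations for the majority class
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

# Hypothetical toy data: (lines of code, number of methods) per class.
X = [(120, 4), (900, 40), (80, 3), (60, 2), (150, 6)]
y = ["clean", "god-class", "clean", "clean", "clean"]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

In practice the duplication is usually applied to the training split only, so that copies of the same sample do not appear on both sides of a train/test split and inflate the measured accuracy.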

    Study of Code Smells: A Review and Research Agenda

    Code smells have been detected, predicted, and studied by researchers from several perspectives. This literature review examines the tools and algorithms used to detect and analyze code smells and summarizes a research agenda. The review covers 114 studies published from 2009 to 2022. The studies are analyzed in two categories, machine learning and non-machine learning, which contain 25 and 89 studies respectively. The studies are analyzed to gain insight into the algorithms, tools, and limitations of the techniques. Long Method, Feature Envy, and Duplicate Code are reported to be the most popular smells. 38% of the studies focused their research on the enhancement of tools and methods. Random Forest and JRip algorithms are found to give the best results among machine learning techniques. We extended the previous studies on code smell detection tools, reporting a total of 87 tools during the review. Java is found to be the dominant programming language in the study of smells.

    A systematic literature review on the code smells datasets and validation mechanisms

    The accuracy reported for code smell-detecting tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as the size, severity level, project types, number of each type of smell, number of smells, and the ratio of smelly to non-smelly samples in the dataset. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any datasets. We conclude that existing datasets suffer from imbalanced samples, lack of supporting severity level, and restriction to Java language.
    Comment: 34 pages, 10 figures, 12 tables, Accepte

    A User Feedback Centric Approach for Detecting and Mitigating God Class Code Smell Using Frequent Usage Patterns

    Code smells are fragments of source code that indicate deeper problems in the underlying software design. These code smells can hinder software evolution and maintenance. Among the different code smell types, the God Class (GC) is one of the most important, because it directly affects software evolution and maintenance. A GC is commonly defined as a class that is much larger than the other classes in the system and that either knows too much or does too much. God Classes are generally created accidentally over time during software evolution through the incremental addition of functionality. A GC usually indicates a bad design choice, and it must be detected and mitigated in order to enhance the quality of the underlying software. However, the presence of a GC is sometimes considered a good design choice, especially in compiler design, interpreter design, and parser implementation. This makes the developer's feedback important for correctly classifying a class as a GC or a normal class. Therefore, this paper proposes a new approach that detects the GC code smell and proposes refactoring opportunities for it. The proposed approach combines different code metrics and uses user feedback as an important input when identifying the GC code smell. The combined use of code metrics is based on two metrics newly proposed in this paper. The first is a new way of measuring the connectivity of a given class with other classes in the system (also termed coupling). The second measures the extent to which a given class makes use of foreign member variables. Finally, the proposed approach is empirically evaluated on two standard, commonly used open-source software systems. The obtained result indicates that the proposed approach is capable of correctly identifying the GC code smell.