A systematic literature review on the code smells datasets and validation mechanisms
The accuracy reported for code smell-detecting tools varies depending on the
dataset used to evaluate the tools. Our survey of 45 existing datasets reveals
that the adequacy of a dataset for detecting smells highly depends on relevant
properties such as the size, severity level, project types, number of each type
of smell, number of smells, and the ratio of smelly to non-smelly samples in
the dataset. Most existing datasets support God Class, Long Method, and Feature
Envy while six smells in Fowler and Beck's catalog are not supported by any
datasets. We conclude that existing datasets suffer from imbalanced samples,
lack of support for severity levels, and restriction to the Java language. Comment: 34 pages, 10 figures, 12 tables. Accepted
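As a small illustration of the dataset properties discussed above, the smelly-to-non-smelly ratio and per-smell counts can be computed as follows (the sample records are invented, not drawn from any of the surveyed datasets):

```python
from collections import Counter

# Hypothetical records: (code element, smell label or None) -- illustrative only.
samples = [
    ("OrderManager", "God Class"),
    ("Invoice.total", "Long Method"),
    ("ReportHelper", "Feature Envy"),
    ("Customer", None),
    ("Address", None),
    ("LineItem", None),
]

smelly = [label for _, label in samples if label is not None]
# Ratio of smelly to non-smelly samples, one of the adequacy properties above.
ratio = len(smelly) / (len(samples) - len(smelly))
# Number of each type of smell, another adequacy property.
per_smell = Counter(smelly)

print(f"smelly:non-smelly ratio = {ratio:.2f}")
print(per_smell)
```

A balanced dataset would have a ratio near 1.0; real smell datasets are typically far below that, which is the imbalance problem the review points out.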
Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques
Background: Refactoring is changing a software system without affecting its external functionality. Current research aims to identify the methods or classes in object-oriented software that need to be refactored. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. This paper additionally examines several ensemble learners, error measures, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics, using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model finds the best classifier after achieving fewer errors during refactoring prediction at the class level. Methodology: First, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems, which are processed by a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed whose base classifiers include artificial neural networks (ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, ANN-Radial Basis Function), support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF), the Decision Tree algorithm, the Logistic Regression algorithm, and an extreme learning machine (ELM) model.
In our paper, we calculate four different errors: Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of the Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) as compared to the base trained ensemble (BTE), and it experiences fewer errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) during its implementation to develop the refactoring model. Conclusions: Our experimental results suggest that MVE with upsampling can be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques is shown in boxplot diagrams of accuracy, F-measure, precision, recall, and area under the curve (AUC).
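As a rough illustration of the maximum-voting step and two of the error measures named above, here is a minimal sketch; the classifier outputs and labels are invented, and the actual model uses trained ANN/LSSVM/ELM base learners rather than hard-coded votes:

```python
import math
from statistics import mode, stdev

def majority_vote(predictions):
    """Combine per-classifier binary predictions by majority (max) voting."""
    return [mode(col) for col in zip(*predictions)]

# Hypothetical outputs of three base classifiers on five classes
# (1 = needs refactoring, 0 = does not) -- illustrative only.
base_preds = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
actual = [1, 0, 1, 1, 0]
ensemble = majority_vote(base_preds)

# MAE, RMSE, and SEM of the prediction errors, as defined in the abstract.
errors = [p - a for p, a in zip(ensemble, actual)]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
sem = stdev(errors) / math.sqrt(len(errors))

print(ensemble, mae, rmse, sem)
```

Here the majority vote corrects the single dissenting classifier on each sample, which is the intuition behind MVE outperforming any individual base learner.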
Technical Debt Prioritization: State of the Art. A Systematic Literature Review
Background. Software companies need to manage and refactor Technical Debt
issues. Therefore, it is necessary to understand if and when refactoring
Technical Debt should be prioritized with respect to developing features or
fixing bugs. Objective. The goal of this study is to investigate the existing
body of knowledge in software engineering to understand what Technical Debt
prioritization approaches have been proposed in research and industry. Method.
We conducted a Systematic Literature Review among 384 unique papers published
until 2018, following a consolidated methodology applied in Software
Engineering. We included 38 primary studies. Results. Different approaches have
been proposed for Technical Debt prioritization, all having different goals and
optimizing on different criteria. The proposed measures capture only a small
part of the plethora of factors used to prioritize Technical Debt qualitatively
in practice. We report an impact map of such factors. However, there is a
lack of empirically validated tools. Conclusion. We observed that Technical
Debt prioritization research is preliminary and there is no consensus on which
factors are important and how to measure them. Consequently, we cannot
consider current research conclusive, and in this paper we outline different
directions for necessary future investigations.
State of Refactoring Adoption: Towards Better Understanding Developer Perception of Refactoring
Context: Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research, much of it presuming that refactoring is primarily motivated by the need to improve system structure. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design, especially given the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. Objective: We aim to explore how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring (SAR), an indication of developer-reported refactoring events in commit messages. We then propose an approach to identify whether a commit describes developer-related refactoring events and to classify them according to common refactoring quality improvement categories. To complement this goal, we aim to reveal insights into how reviewers reach a decision about accepting or rejecting a submitted refactoring request, what makes such a review challenging, and how to improve the efficiency of refactoring code review. Method: Our empirically driven study follows a mixture of qualitative and quantitative methods. We text-mine refactoring-related documentation, develop a refactoring taxonomy, and automatically classify a large set of commits containing refactoring activities. Among the various quality models presented in the literature, we identify the ones that are most in line with the developer's vision of quality optimization when they explicitly mention that they are refactoring to improve them, in order to obtain an enhanced understanding of the motivation behind refactoring.
After that, we performed an industrial case study with professional developers at Xerox to study the motivations, documentation practices, challenges, verification, and implications of refactoring activities during code review. Result: We introduced the SAR taxonomy of how developers document their refactoring strategies in commit messages and proposed a SAR model to automate the detection of refactoring. Our survey with code reviewers revealed several difficulties related to understanding the refactoring intent and its implications on the functional and non-functional aspects of the software. Conclusion: Our SAR taxonomy and model can work in conjunction with refactoring detectors to report any early inconsistency between refactoring types and their documentation, and they can serve as a solid background for various empirical investigations. In light of the findings of our industrial case study, we recommend a procedure to properly document refactoring activities, as part of our survey feedback.
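At its simplest, detecting self-affirmed refactoring in a commit message can be sketched as keyword matching; the category names and patterns below are hypothetical stand-ins, not the actual SAR taxonomy or model:

```python
import re

# Hypothetical keyword patterns per quality-improvement category; the real
# SAR taxonomy and classification model are far more elaborate than this.
CATEGORIES = {
    "internal_quality": re.compile(r"\b(refactor\w*|clean\s?up|restructur\w*)\b", re.I),
    "code_smell": re.compile(r"\b(duplicat\w*|god class|long method)\b", re.I),
    "external_quality": re.compile(r"\b(readab\w*|maintainab\w*|testab\w*)\b", re.I),
}

def classify_commit(message):
    """Return the SAR-style categories whose patterns match the commit message."""
    return [name for name, pattern in CATEGORIES.items() if pattern.search(message)]

print(classify_commit("Refactor parser to remove duplicated code and improve readability"))
```

A commit that mentions none of the patterns yields an empty list, i.e. no self-affirmed refactoring is detected in it.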
Moving beyond Deletions: Program Simplification via Diverse Program Transformations
To reduce the complexity of software, developers manually simplify programs
(known as developer-induced program simplification in this paper) to reduce
code size while preserving functionality. However, manual simplification is
time-consuming and error-prone. To reduce manual effort, rule-based approaches
(e.g., refactoring) and deletion-based approaches (e.g., delta debugging) can
be potentially applied to automate developer-induced program simplification.
However, as there is little study on how developers simplify programs in
Open-source Software (OSS) projects, it is unclear whether these approaches can
be effectively used for developer-induced program simplification. Hence, we
present the first study of developer-induced program simplification in OSS
projects, focusing on the types of program transformations used, the
motivations behind simplifications, and the set of program transformations
covered by existing refactoring types. Our study of 382 pull requests from 296
projects reveals gaps in applying existing approaches to automate
developer-induced program simplification, and it outlines criteria for
designing automatic program simplification techniques. Inspired by our
study and to reduce the manual effort in developer-induced program
simplification, we propose SimpT5, a tool that can automatically produce
simplified programs (semantically-equivalent programs with reduced source lines
of code). SimpT5 is trained on our collected dataset of 92,485 simplified
programs with two heuristics: (1) simplified-line localization, which encodes
the lines changed in simplified programs, and (2) checkers that measure the
quality of generated programs. Our evaluation shows that SimpT5 is more
effective than prior approaches in automating developer-induced program
simplification.
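A minimal sketch of the two properties a simplified program must satisfy per the definition above (fewer source lines, same behavior); the `is_even` snippets and the `sloc` helper are illustrative, not taken from SimpT5's dataset or checkers:

```python
def sloc(source):
    """Count non-blank, non-comment source lines (a rough SLOC measure)."""
    return sum(
        1 for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

# A verbose original and a semantically-equivalent simplification a
# developer might write -- hypothetical examples.
original = """\
def is_even(n):
    # check parity
    if n % 2 == 0:
        return True
    else:
        return False
"""

simplified = """\
def is_even(n):
    return n % 2 == 0
"""

# Property 1: reduced source lines of code.
assert sloc(simplified) < sloc(original)

# Property 2: semantic equivalence, spot-checked on sample inputs.
ns_a, ns_b = {}, {}
exec(original, ns_a)
exec(simplified, ns_b)
assert all(ns_a["is_even"](i) == ns_b["is_even"](i) for i in range(10))
```

Exhaustively proving equivalence is undecidable in general, which is why SimpT5-style checkers can only approximate it, e.g. via tests or heuristics.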
A Machine-learning Based Ensemble Method For Anti-patterns Detection
Anti-patterns are poor solutions to recurring design problems. Several
empirical studies have highlighted their negative impact on program
comprehension, maintainability, as well as fault-proneness. A variety of
detection approaches have been proposed to identify their occurrences in source
code. However, these approaches can identify only a subset of the occurrences
and report large numbers of false positives and misses. Furthermore, a low
agreement is generally observed among different approaches. Recent studies have
shown the potential of machine-learning models to improve this situation.
However, such algorithms require large sets of manually-produced training-data,
which often limits their application in practice. In this paper, we present
SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based
ensemble method to aggregate various anti-patterns detection approaches on the
basis of their internal detection rules. Thus, our method uses several
detection tools to produce an improved prediction from a reasonable number of
training examples. We implemented SMAD for the detection of two well known
anti-patterns: God Class and Feature Envy. With the results of our experiments
conducted on eight Java projects, we show that: (1) our method clearly improves
on the aggregated tools; (2) SMAD significantly outperforms other ensemble
methods. Comment: Preprint submitted to the Journal of Systems and Software, Elsevier
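For intuition, aggregating detector verdicts can be sketched as a simple majority vote; note that SMAD itself feeds the detectors' internal rule values into a trained model rather than voting on boolean outputs, and the class names and verdicts below are invented:

```python
# Hypothetical boolean verdicts of three anti-pattern detectors per class.
# This is only a naive voting baseline, not SMAD's learned aggregation.
detector_votes = {
    "OrderManager": [True, True, False],
    "Customer": [False, False, False],
    "ReportHelper": [True, False, True],
}

def aggregate(votes, threshold=0.5):
    """Flag a class as an anti-pattern when enough detectors agree."""
    return {cls: sum(v) / len(v) > threshold for cls, v in votes.items()}

flags = aggregate(detector_votes)
print(flags)
```

The low inter-detector agreement reported above is exactly what makes such naive voting fragile, and what motivates learning the aggregation from the detectors' internal rules instead.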