
    Fuzzy C-mean missing data imputation for analogy-based effort estimation

    The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model since it follows human nature in selecting analogies similar to the target project. The accuracy of prediction in the ABE model is strongly associated with the quality of the dataset, since the model depends on previously completed projects for estimation. Missing Data (MD) is one of the major challenges in software engineering datasets. Several missing data imputation techniques have been investigated by researchers for the ABE model. Identifying the most similar donor values in the completed software projects dataset for imputation is a challenging issue in the existing missing data techniques adopted for the ABE model. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI), and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model. The ABE-FCMI technique is proposed in this study. An evaluation comparison among MI, KNNI, and ABE-FCMI is conducted to identify the most suitable MD imputation method for the ABE model. The results suggest that ABE-FCMI, rather than MI or KNNI, imputes more reliable values for incomplete software projects in the missing datasets. It was also found that the proposed imputation method significantly improves the software development effort prediction of the ABE model.
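
    As a concrete illustration of the imputation techniques compared above, here is a minimal fuzzy c-means imputation sketch in Python. It is not the paper's ABE-FCMI implementation: the cluster count, fuzzifier, iteration budget, and toy data are assumptions made for the example.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=50, eps=1e-9, seed=0):
    """Fuzzy c-means imputation: refill each missing cell with the
    membership-weighted average of the cluster centroids. Settings
    (clusters, fuzzifier m, iterations) are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    X = X.astype(float).copy()
    mask = np.isnan(X)                                   # remember the holes
    X[mask] = np.take(np.nanmean(X, axis=0), np.where(mask)[1])  # mean start

    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)                    # random memberships

    for _ in range(n_iter):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared distance from every project (row) to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1) + eps
        # standard FCM membership update
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))).sum(2)
        # re-impute: weighted average of centroids, only at the holes
        Um = U ** m
        X[mask] = ((Um @ centroids) / Um.sum(axis=1, keepdims=True))[mask]
    return X

# Toy usage; the mean and k-NN baselines are one-liners in scikit-learn:
#   from sklearn.impute import SimpleImputer, KNNImputer
#   SimpleImputer(strategy="mean").fit_transform(X)
#   KNNImputer(n_neighbors=2).fit_transform(X)
X = np.array([[1.0, 2.0], [np.nan, 2.1], [8.0, 9.0], [8.2, np.nan]])
print(fcm_impute(X))
```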

    Explanatory and Causality Analysis in Software Engineering

    Software fault proneness and software development effort are two key areas of software engineering. Improving them will significantly reduce costs and promote good planning and practice in developing and managing software projects. Traditionally, studies of software fault proneness and software development effort focused on analysis and prediction, which can help answer questions like 'when' and 'where'. The focus of this dissertation is on explanatory and causality studies that address questions like 'why' and 'how'. First, we applied a case-control study to explain software fault proneness. We found that Bugfixes (prerelease bugs), Developers, Code Churn, and Age of a file are the main contributors to postrelease bugs in some open-source projects. In terms of interactions, we found that Bugfixes and Developers together reduced the risk of postrelease software faults. The explanatory models were tested for prediction, and their performance was comparable to or better than the top-performing classifiers used in related studies. Our results indicate that software project practitioners should pay more attention to the prerelease bug fixing process and the number of Developers assigned, as well as their interaction. They also need to pay more attention to new files (less than one year old), which contributed significantly more to postrelease bugs than old files. Second, we built a model that explains and predicts multiple levels of software development effort and measured the effects of several metrics and their interactions using categorical regression models. The final models for the three data sets used were statistically fit, and their performance was comparable to related studies. We found that project size, duration, the existence of any type of fault, the use of first- or second-generation programming languages, and team size significantly increased the software development effort. On the other hand, the interactions between duration and defective projects, and between duration and team size, reduced the software development effort. These results suggest that software practitioners should pay extra attention to project duration and the team size assigned to every task, because increasing them from a low to a higher level significantly increased the software development effort. Third, a structural equation modeling method was applied for causality analysis of software fault proneness. The method combined statistical and regression analysis to find the direct and indirect causes of software faults using partial least squares path modeling. We found direct and indirect paths from the measurement models that led to software postrelease bugs. Specifically, the highest direct effect came from change requests, while changing the code had a minor impact on software faults. The highest impact on code change resulted from change requests (either for bug fixing or refactoring). Interestingly, the indirect impact from code characteristics on software fault proneness was higher than the direct impact. We found a similar level of direct and indirect impact from code characteristics on code change.
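
    To make the style of explanatory model concrete, the sketch below fits a logistic regression with an interaction term, the kind of case-control analysis of postrelease fault proneness described above. The column names, synthetic data, and coefficients are hypothetical, not the dissertation's datasets or fitted models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "bugfixes":   rng.poisson(2.0, n),        # prerelease bug fixes per file
    "developers": rng.poisson(3.0, n),        # distinct committers per file
    "churn":      rng.gamma(2.0, 50.0, n),    # lines added + deleted
    "new_file":   rng.integers(0, 2, n),      # file less than one year old
})
# Synthetic outcome: churn and new files raise fault risk, while the
# bugfixes:developers interaction lowers it, mirroring the reported finding.
logit_p = (-2.0 + 0.004 * df.churn + 0.8 * df.new_file
           + 0.3 * df.bugfixes + 0.2 * df.developers
           - 0.15 * df.bugfixes * df.developers)
df["postrelease_bug"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# In the formula, '*' expands to both main effects plus their interaction.
model = smf.logit("postrelease_bug ~ bugfixes * developers + churn + new_file",
                  data=df).fit(disp=False)
print(model.summary())
```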

    Predictability of Missing Data Theory to Improve U.S. Estimator’s Unreliable Data Problem

    Since the topic of improving data quality has not been addressed for the U.S. defense cost estimating discipline beyond changes in public policy, the goal of the study was to close this gap and provide empirical evidence supporting expanded options for improving software cost estimation data matrices for U.S. defense cost estimators. The purpose of this quantitative study was to test and measure the predictive accuracy of the missing data theory techniques referenced as traditional approaches in the literature, compare each technique's results to a complete data matrix used in support of the U.S. defense cost estimation discipline, and determine which techniques rendered incomplete and missing data sets in a single data matrix most reliable and complete under eight missing value percentages. A quantitative pre-experimental research design, a one-group pretest-posttest design with no control group, empirically tested and measured the predictive accuracy of traditional missing data techniques typically used in non-cost-estimating disciplines. The pre-experiments were run on a representative U.S. defense software cost estimation data matrix: a nonproprietary set of historical software effort, size, and schedule data used at Defense Acquisition University. The results revealed that single and multiple imputation were two viable options for improving data quality, since their calculations fell within 20% of the original data values 16.4% and 18.6% of the time, respectively. This study supports positive social change by investigating how cost estimators, engineering economists, and engineering managers could improve the reliability of their estimate forecasts, provide better estimate predictions, and ultimately reduce the taxpayer funds spent to cover defense acquisition cost overruns.
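
    The within-20% criterion used above can be made concrete with a small masking experiment: delete cells from a complete matrix (pretest), impute them (posttest), and count how often the imputed values land within 20% of the originals. The sketch below uses scikit-learn's SimpleImputer for single imputation and IterativeImputer as a MICE-style stand-in for multiple imputation; the synthetic matrix, missing rate, and imputer settings are assumptions, not the study's data or configuration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(7)
complete = rng.gamma(3.0, 10.0, size=(60, 4))   # effort/size/schedule-like data

def within_20pct_rate(imputer, complete, missing_rate=0.15):
    X = complete.copy()
    holes = rng.random(X.shape) < missing_rate   # randomly delete cells
    X[holes] = np.nan
    filled = imputer.fit_transform(X)            # posttest: impute the holes
    rel_err = np.abs(filled[holes] - complete[holes]) / complete[holes]
    return (rel_err <= 0.20).mean()              # hit rate within +/- 20%

for name, imp in [("single (mean)", SimpleImputer(strategy="mean")),
                  ("multiple (MICE-style)", IterativeImputer(random_state=0))]:
    print(f"{name}: {within_20pct_rate(imp, complete):.1%} within 20%")
```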