4 research outputs found
Fuzzy C-mean missing data imputation for analogy-based effort estimation
The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model since it follows human nature in selecting analogies similar to the target project. The accuracy of prediction in the ABE model is strongly associated with the quality of the dataset, since estimation depends on previously completed projects. Missing Data (MD) is one of the major challenges in software engineering datasets. Several missing data imputation techniques have been investigated by researchers for the ABE model. Identifying the most similar donor values from the completed software projects dataset for imputation is a challenging issue in the existing missing data techniques adopted for the ABE model. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI), and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model. An ABE-FCMI technique is proposed in this study. An evaluation comparison among MI, KNNI, and ABE-FCMI is conducted to identify the most suitable MD imputation method for the ABE model. The results suggest that ABE-FCMI, rather than MI or KNNI, imputes more reliable values to incomplete software projects in the missing datasets. It was also found that the proposed imputation method significantly improves the software development effort prediction of the ABE model.
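The abstract does not include the authors' implementation. As a rough illustration of the two baseline imputers it compares, mean imputation (MI) and k-nearest-neighbor imputation (KNNI), a minimal pure-Python sketch might look like the following; the row layout, column index, and `k` value are assumptions made for the example, not details from the paper.

```python
import math

def mean_impute(rows, col):
    """MI: replace None in column `col` with the mean of the observed values."""
    vals = [r[col] for r in rows if r[col] is not None]
    m = sum(vals) / len(vals)
    return [r[:col] + [m] + r[col + 1:] if r[col] is None else list(r) for r in rows]

def knn_impute(rows, col, k=2):
    """KNNI: replace None in column `col` with the mean of that column over the
    k nearest complete rows (Euclidean distance on the remaining features)."""
    complete = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is not None:
            out.append(list(r))
            continue
        donors = sorted(
            complete,
            key=lambda c: math.dist(
                [v for i, v in enumerate(r) if i != col],
                [v for i, v in enumerate(c) if i != col],
            ),
        )[:k]
        out.append(r[:col] + [sum(c[col] for c in donors) / k] + r[col + 1:])
    return out

# Toy "projects" with one missing effort value in column 1:
projects = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [4.0, 40.0]]
```

The proposed ABE-FCMI follows the same donor-selection idea, but weights donors by fuzzy c-means cluster memberships instead of a hard k-nearest cutoff.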
Multifunctional optimized group method data handling for software effort estimation
Nowadays, significant effort estimation is in demand. Stakeholders need effective and efficient software development processes with the best estimation accuracy across all data types. Nevertheless, finding the best effort estimation model with good accuracy for this purpose is hard. Group Method of Data Handling (GMDH) algorithms have been widely used for modelling and identifying complex systems and can potentially be applied to software effort estimation. However, there has been limited study of the best architecture and optimal weight coefficients of the transfer function for the GMDH model. This study proposes a hybrid multifunctional GMDH with Artificial Bee Colony (GMDH-ABC) based on a combination of four individual GMDH models, namely GMDH-Polynomial, GMDH-Sigmoid, GMDH-Radial Basis Function, and GMDH-Tangent. The best GMDH architecture is determined based on an L9 Taguchi orthogonal array. Five datasets (i.e., Cocomo, Desharnais, Albrecht, Kemerer, and ISBSG) were used to validate the proposed models. Missing values in the datasets were imputed by the developed MissForest Multiple Imputation method (MFMI). The Mean Absolute Percentage Error (MAPE) was used as the performance measure. The results showed that the GMDH-ABC model outperformed the individual GMDH models, with more than 50% improvement over standard conventional GMDH models and the benchmark ANN model in all datasets. The Cocomo dataset improved by 49% compared to the conventional GMDH-LSM. Improvements in accuracy of 71%, 63%, 67%, and 82% were obtained for the Desharnais, Albrecht, Kemerer, and ISBSG datasets, respectively, as compared with the conventional GMDH-LSM. These results indicate that the proposed GMDH-ABC model can achieve higher accuracy in software effort estimation.
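The paper's concrete GMDH-ABC implementation is not reproduced here. The sketch below only illustrates the standard quadratic (Ivakhnenko) partial polynomial a GMDH neuron fits, common forms of the three alternative transfer functions the hybrid combines, and the MAPE metric used for evaluation. The weight vector is a placeholder; in the study, such coefficients would be optimized by the ABC algorithm.

```python
import math

def poly(x1, x2, w):
    """Quadratic partial polynomial of a conventional GMDH neuron."""
    a, b, c, d, e, f = w
    return a + b * x1 + c * x2 + d * x1 * x2 + e * x1 * x1 + f * x2 * x2

# Illustrative forms of the other three transfer functions in the hybrid:
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))   # GMDH-Sigmoid
rbf     = lambda z: math.exp(-z * z)             # GMDH-Radial Basis Function
tangent = lambda z: math.tanh(z)                 # GMDH-Tangent

def mape(actual, predicted):
    """Mean Absolute Percentage Error, the study's performance measure (in %)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Each transfer function would be applied to the polynomial output `z = poly(x1, x2, w)`, and the candidate models compared by their MAPE on held-out projects.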
Preparation of source data for building a credit scoring model
The object of the study, used as a benchmark problem, is borrowers' creditworthiness data.
The subject of the study is a data preparation methodology for credit scoring.
The goal of the present work is to develop and study a data preparation methodology for credit scoring.
Predictability of Missing Data Theory to Improve U.S. Estimator's Unreliable Data Problem
Since the topic of improving data quality has not been addressed for the U.S. defense cost estimating discipline beyond changes in public policy, the goal of the study was to close this gap and provide empirical evidence that supports expanding options to improve software cost estimation data matrices for U.S. defense cost estimators. The purpose of this quantitative study was to test and measure the level of predictive accuracy of missing data theory techniques referenced as traditional approaches in the literature, compare each theory's results to a complete data matrix used in support of the U.S. defense cost estimation discipline, and determine which theories rendered incomplete and missing data sets in a single data matrix most reliable and complete under eight missing value percentages. A quantitative pre-experimental research design, a one-group pretest-posttest design with no control group, empirically tested and measured the predictive accuracy of traditional missing data theory techniques typically used in non-cost-estimating disciplines. The pre-experiments were run on a representative U.S. defense software cost estimation data matrix: a nonproprietary set of historical software effort, size, and schedule numerical data used at Defense Acquisition University. The results revealed that single and multiple imputation were two viable options to improve data quality, since their calculations fell within 20% of the original data value 16.4% and 18.6% of the time, respectively. This study supports positive social change by investigating how cost estimators, engineering economists, and engineering managers could improve the reliability of their estimate forecasts, provide better estimate predictions, and ultimately reduce the taxpayer funds spent to cover defense acquisition cost overruns.
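The study's reliability criterion, whether an imputed value falls within 20% of the original value, can be sketched as a simple hit-rate computation. The function name and data layout below are assumptions for illustration, not the dissertation's actual code.

```python
def within_tolerance_rate(true_vals, imputed_vals, tol=0.20):
    """Percentage of imputed values falling within `tol` (default 20%)
    of the corresponding original value."""
    hits = sum(
        1 for t, p in zip(true_vals, imputed_vals) if abs(p - t) <= tol * abs(t)
    )
    return 100.0 * hits / len(true_vals)
```

Under this criterion, the reported 16.4% and 18.6% figures would be the rates achieved by single and multiple imputation, respectively, on the Defense Acquisition University data matrix.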