4 research outputs found
Fuzzy C-mean missing data imputation for analogy-based effort estimation
The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model since it follows human nature in selecting analogies similar to the target project. The accuracy of prediction in the ABE model is strongly associated with the quality of the dataset, since estimation depends on previously completed projects. Missing Data (MD) is one of the major challenges in software engineering datasets. Several missing data imputation techniques have been investigated by researchers for the ABE model. Identifying the most similar donor values from the completed software projects dataset for imputation is a challenging issue in the existing missing data techniques adopted for the ABE model. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI), and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model. An ABE-FCMI technique is proposed in this study. An evaluation comparison among MI, KNNI, and ABE-FCMI is conducted to identify the most suitable MD imputation method for the ABE model. The results suggest that ABE-FCMI, rather than MI or KNNI, imputes more reliable values to incomplete software projects in the missing datasets. It was also found that the proposed imputation method significantly improves the software development effort prediction of the ABE model.
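The abstract does not include the authors' implementation. As a rough illustration of the two baseline imputers it compares, mean imputation (MI) and k-nearest-neighbor imputation (KNNI), a minimal pure-Python sketch might look like the following; the row layout, column index, and `k` value are assumptions made for the example, not details from the paper.

```python
import math

def mean_impute(rows, col):
    """MI: replace None in column `col` with the mean of the observed values."""
    vals = [r[col] for r in rows if r[col] is not None]
    m = sum(vals) / len(vals)
    return [r[:col] + [m] + r[col + 1:] if r[col] is None else list(r) for r in rows]

def knn_impute(rows, col, k=2):
    """KNNI: replace None in column `col` with the mean of that column over the
    k nearest complete rows (Euclidean distance on the remaining features)."""
    complete = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is not None:
            out.append(list(r))
            continue
        donors = sorted(
            complete,
            key=lambda c: math.dist(
                [v for i, v in enumerate(r) if i != col],
                [v for i, v in enumerate(c) if i != col],
            ),
        )[:k]
        out.append(r[:col] + [sum(c[col] for c in donors) / k] + r[col + 1:])
    return out

# Toy "projects" with one missing effort value in column 1:
projects = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [4.0, 40.0]]
```

The proposed ABE-FCMI follows the same donor-selection idea, but weights donors by fuzzy c-means cluster memberships instead of a hard k-nearest cutoff.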
Multifunctional optimized group method data handling for software effort estimation
Nowadays, significant effort estimation is in demand. Stakeholders need effective and efficient software development processes with the best estimation accuracy across all data types. Nevertheless, finding the best effort estimation model with good accuracy for this purpose is hard. Group Method of Data Handling (GMDH) algorithms have been widely used for modelling and identifying complex systems and can potentially be applied to software effort estimation. However, there has been limited study of the best architecture and optimal weight coefficients of the transfer function for the GMDH model. This study proposes a hybrid multifunctional GMDH with Artificial Bee Colony (GMDH-ABC) based on a combination of four individual GMDH models, namely GMDH-Polynomial, GMDH-Sigmoid, GMDH-Radial Basis Function, and GMDH-Tangent. The best GMDH architecture is determined based on an L9 Taguchi orthogonal array. Five datasets (i.e., Cocomo, Desharnais, Albrecht, Kemerer, and ISBSG) were used to validate the proposed models. Missing values in the datasets were imputed by the developed MissForest Multiple Imputation method (MFMI). The Mean Absolute Percentage Error (MAPE) was used as the performance measure. The results showed that the GMDH-ABC model outperformed the individual GMDH models, with more than 50% improvement over standard conventional GMDH models and the benchmark ANN model in all datasets. The Cocomo dataset improved by 49% compared to the conventional GMDH-LSM. Improvements in accuracy of 71%, 63%, 67%, and 82% were obtained for the Desharnais, Albrecht, Kemerer, and ISBSG datasets, respectively, as compared with the conventional GMDH-LSM. These results indicate that the proposed GMDH-ABC model can achieve higher accuracy in software effort estimation.
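The paper's concrete GMDH-ABC implementation is not reproduced here. The sketch below only illustrates the standard quadratic (Ivakhnenko) partial polynomial a GMDH neuron fits, common forms of the three alternative transfer functions the hybrid combines, and the MAPE metric used for evaluation. The weight vector is a placeholder; in the study, such coefficients would be optimized by the ABC algorithm.

```python
import math

def poly(x1, x2, w):
    """Quadratic partial polynomial of a conventional GMDH neuron."""
    a, b, c, d, e, f = w
    return a + b * x1 + c * x2 + d * x1 * x2 + e * x1 * x1 + f * x2 * x2

# Illustrative forms of the other three transfer functions in the hybrid:
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))   # GMDH-Sigmoid
rbf     = lambda z: math.exp(-z * z)             # GMDH-Radial Basis Function
tangent = lambda z: math.tanh(z)                 # GMDH-Tangent

def mape(actual, predicted):
    """Mean Absolute Percentage Error, the study's performance measure (in %)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Each transfer function would be applied to the polynomial output `z = poly(x1, x2, w)`, and the candidate models compared by their MAPE on held-out projects.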
Preparation of source data for building a credit scoring model
The object of the study, used as a benchmark problem, is borrowers' creditworthiness data.
The subject of the study is a data preparation methodology for credit scoring.
The goal of the present work is to develop and study a data preparation methodology for credit scoring.
Predictability of Missing Data Theory to Improve U.S. Estimator's Unreliable Data Problem
Since the topic of improving data quality has not been addressed for the U.S. defense cost estimating discipline beyond changes in public policy, the goal of the study was to close this gap and provide empirical evidence that supports expanding options to improve software cost estimation data matrices for U.S. defense cost estimators. The purpose of this quantitative study was to test and measure the level of predictive accuracy of missing data theory techniques referenced as traditional approaches in the literature, compare each theory's results to a complete data matrix used in support of the U.S. defense cost estimation discipline, and determine which theories rendered incomplete and missing data sets in a single data matrix most reliable and complete under eight missing value percentages. A quantitative pre-experimental research design, a one-group pretest-posttest design with no control group, empirically tested and measured the predictive accuracy of traditional missing data theory techniques typically used in non-cost-estimating disciplines. The pre-experiments were run on a representative U.S. defense software cost estimation data matrix: a nonproprietary set of historical software effort, size, and schedule numerical data used at Defense Acquisition University. The results revealed that single and multiple imputation were two viable options to improve data quality, since their calculations fell within 20% of the original data value 16.4% and 18.6% of the time, respectively. This study supports positive social change by investigating how cost estimators, engineering economists, and engineering managers could improve the reliability of their estimate forecasts, provide better estimate predictions, and ultimately reduce the taxpayer funds spent to cover defense acquisition cost overruns.
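The study's reliability criterion, whether an imputed value falls within 20% of the original value, can be sketched as a simple hit-rate computation. The function name and data layout below are assumptions for illustration, not the dissertation's actual code.

```python
def within_tolerance_rate(true_vals, imputed_vals, tol=0.20):
    """Percentage of imputed values falling within `tol` (default 20%)
    of the corresponding original value."""
    hits = sum(
        1 for t, p in zip(true_vals, imputed_vals) if abs(p - t) <= tol * abs(t)
    )
    return 100.0 * hits / len(true_vals)
```

Under this criterion, the reported 16.4% and 18.6% figures would be the rates achieved by single and multiple imputation, respectively, on the Defense Acquisition University data matrix.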