thesis

Impact analysis of a multiple imputation technique for handling missing value in the ISBSG repository of software projects

Abstract

Until the early 2000s, most empirical studies on the performance of estimation models for software projects were carried out with fairly small samples (fewer than 20 projects), and only a few were based on larger samples (between 60 and 90 projects). With the establishment of a repository of software projects by the International Software Benchmarking Standards Group (ISBSG), a much larger data repository is now available for productivity analysis and for building estimation models: Release 12 of the ISBSG repository (2013) contains over 6,000 projects, thereby providing a sounder basis for statistical studies. However, a significant number of variables in the ISBSG repository have a large number of missing values, which makes the repository challenging to use for research purposes. This research aims to build a basis for improving the investigation of the ISBSG repository of software projects, in order to develop estimation models using different combinations of parameters for which distinct sub-samples without missing values exist. The goal of this research is to address the new problems that arise in larger software engineering datasets, including missing values and outliers, using the multiple imputation technique.
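To make the idea of multiple imputation concrete, here is a minimal sketch on a single variable. It is not the procedure used in this research. It assumes a simple linear imputation model (y on x), a bootstrap step to propagate parameter uncertainty between imputations, and Rubin's rules to pool the m results. The function names (`fit_line`, `multiple_imputation_mean`), the synthetic data (x standing in for log size, y for log effort) and the missingness pattern are all hypothetical and do not come from the ISBSG repository.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / max(len(xs) - 2, 1))

def multiple_imputation_mean(xs, ys, m=20, seed=0):
    """Estimate mean(y) when some y are None; pools m >= 2 imputations with Rubin's rules."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(ys)
    estimates, within = [], []
    for _ in range(m):
        # Refit the imputation model on a bootstrap resample of the complete
        # cases so that parameter uncertainty carries over between imputations.
        boot = [rng.choice(complete) for _ in complete]
        while len({x for x, _ in boot}) < 2:
            boot = [rng.choice(complete) for _ in complete]
        a, b, sd = fit_line([x for x, _ in boot], [y for _, y in boot])
        # Stochastic regression imputation: predicted value plus a random residual.
        filled = [y if y is not None else a + b * x + rng.gauss(0.0, sd)
                  for x, y in zip(xs, ys)]
        estimates.append(statistics.fmean(filled))
        within.append(statistics.variance(filled) / n)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    w = statistics.fmean(within)               # mean within-imputation variance
    b_var = statistics.variance(estimates)     # between-imputation variance
    total = w + (1 + 1 / m) * b_var            # Rubin's total variance
    return q_bar, math.sqrt(total)

# Synthetic, hypothetical data (not ISBSG): x ~ log(size), y ~ log(effort).
gen = random.Random(42)
xs = [gen.uniform(2.0, 7.0) for _ in range(200)]
ys_full = [1.0 + 0.9 * x + gen.gauss(0.0, 0.5) for x in xs]
# Small projects are more likely to lack an effort value (missing at random given x).
ys = [None if gen.random() < (0.5 if x < 4.0 else 0.1) else y
      for x, y in zip(xs, ys_full)]

full_mean = statistics.fmean(ys_full)
cc_mean = statistics.fmean([y for y in ys if y is not None])
mi_mean, mi_se = multiple_imputation_mean(xs, ys, m=20)
print(f"full data: {full_mean:.3f}  complete cases: {cc_mean:.3f}  "
      f"MI: {mi_mean:.3f} (SE {mi_se:.3f})")
```

In this toy setting, simply dropping incomplete projects (the complete-case mean) over-represents large projects and biases the estimate, whereas the pooled multiple-imputation estimate uses the relationship between x and y to fill the gaps and reports a standard error that includes the extra uncertainty due to the missing values. A real analysis of the ISBSG data would involve several variables with missing values and a richer imputation model.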
