thesis

Learning to cope with small noisy data in software effort estimation

Abstract

Though investigated for decades, Software Effort Estimation (SEE) remains a challenging problem in software project management, and several factors hinder the practical use of SEE models. One major factor is the scarcity of software projects available for constructing SEE models, a consequence of the long process of software development. Even when a large number of projects is available, the collected effort values are usually corrupted by noise because of human involvement. Furthermore, even with sufficient noise-free project data, SEE models may have sensitive parameters to tune, which can cause a model sensitivity problem. The thesis focuses on tackling these three issues. It proposes a synthetic data generator to tackle the data scarcity problem, introduces and constructs uncertain effort estimators to tackle the data noise problem, and analyses the sensitivity of popular SEE models to their parameter settings. The main contributions of the thesis are: (1) proposing a synthetic project generator and providing an understanding of when, why, and for which baseline models it improves prediction performance; (2) introducing the relevance vector machine for uncertain effort estimation; (3) proposing an improved uncertain effort estimation method based on an ensemble strategy; and (4) providing a better understanding of the impact of parameter tuning on SEE methods.
