Search CORE

120,304 research outputs found

Reliability and validity in comparative studies of software prediction models

Author: Myrtveit I
Shepperd MJ
Stensrud E
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2005
Field of study

Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compare a machine learning and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models

Crossref

Brunel University Research Archive

Making Software Cost Data Available for Meta-Analysis

Author: Mair Carolyn
Shepperd Martin
Publication venue
Publication date: 01/01/2005
Field of study

In this paper we consider the increasing need for meta-analysis within empirical software engineering. However, we also note that a necessary precondition to such forms of analysis is to have both the results in an appropriate format and sufficient contextual information to avoid misleading inferences. We consider the implications in the field of software project effort estimation and show that for a sample of 12 seemingly similar published studies, the results are difficult to compare let alone combine. This is due to different reporting conventions. We argue that a protocol is required and make some suggestions as to what it should contain

Recommended from our members

Predicting with sparse data

Author: Cartwright MH
Shepperd MJ
Publication venue: IEEE Computer Society
Publication date: 01/01/2001
Field of study

It is well known that effective prediction of project cost related factors is an important aspect of software engineering. Unfortunately, despite extensive research over more than 30 years, this remains a significant problem for many practitioners. A major obstacle is the absence of reliable and systematic historic data, yet this is a sine qua non for almost all proposed methods: statistical, machine learning or calibration of existing models. In this paper we describe our sparse data method (SDM) based upon a pairwise comparison technique and Saaty's Analytic Hierarchy Process (AHP). Our minimum data requirement is a single known point. The technique is supported by a software tool known as DataSalvage. We show, for data from two companies, how our approach — based upon expert judgement — adds value to expert judgement by producing significantly more accurate and less biased results. A sensitivity analysis shows that our approach is robust to pairwise comparison errors. We then describe the results of a small usability trial with a practising project manager. From this empirical work we conclude that the technique is promising and may help overcome some of the present barriers to effective project prediction

Brunel University Research Archive

What accuracy statistics really measure

Author: B.A. Kitchenham
JORGENSEN
L.M. Pickard
LO
M.J. Shepperd
MIYAZAKI
MYRTVEIT
PICKARD
S.G. MacDonell
STENSRUD
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2001
Field of study

Provides the software estimation research community with a better understanding of the meaning of, and relationship between, two statistics that are often used to assess the accuracy of predictive models: the mean magnitude relative error (MMRE) and the number of predictions within 25% of the actual, pred(25). It is demonstrated that MMRE and pred(25) are, respectively, measures of the spread and the kurtosis of the variable z, where z=estimate/actual. Thus, z is considered to be a measure of accuracy, and statistics such as MMRE and pred(25) to be measures of properties of the distribution of z. It is suggested that measures of the central location and skewness of z, as well as measures of spread and kurtosis, are necessary. Furthermore, since the distribution of z is non-normal, non-parametric measures of these properties may be needed. For this reason, box-plots of z are useful alternatives to simple summary metrics. It is also noted that the simple residuals are better behaved than the z variable, and could also be used as the basis for comparing prediction system

Crossref

AUT Scholarly Commons

Brunel University Research Archive

Making inferences with small numbers of training sets

Author: Albrecht
Boehm
C. Kirsopp
Ebert
Efron
Kadoda
Kemerer
Kitchenham
Kitchenham
Kok
M. Shepperd
MacDonell
Mair
Miyazaki
Shepperd
Shepperd
Srinivasan
Walston
Wittig
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2002
Field of study

A potential methodological problem with empirical studies that assess project effort prediction system is discussed. Frequently, a hold-out strategy is deployed so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction and results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using less than five training sets leads to untrustworthy results, and ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency

CiteSeerX

Crossref

Brunel University Research Archive