Reliability and validity in comparative studies of software prediction models
Empirical studies of software prediction models do not converge on the question "which prediction model is best?", and the reason for this lack of convergence is poorly understood. In this simulation study, we examine a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, such empirical studies compare a machine learning model with a regression model; we do the same, but on simulated data. The results suggest that it is the research procedure itself that is unreliable, and this unreliability may strongly contribute to the lack of convergence. Our findings thus cast doubt on the conclusions of any study of competing software prediction models that used this research procedure as the basis for model comparison. We therefore need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.
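A minimal sketch of the kind of instability the study points to, under stated assumptions: the data generator, the 10-fold cross-validation loop, and MAE as the accuracy indicator are all illustrative choices, not the paper's actual setup. Repeating the whole single-sample procedure and counting which model "wins" each time makes the reliability question concrete.

```python
# Illustrative sketch only: the data-generating process, models, and
# accuracy indicator below are assumptions, not the paper's actual design.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def simulate_sample(n=100):
    """Draw one noisy 'project data' sample (assumed functional form)."""
    X = rng.uniform(1, 10, size=(n, 3))
    y = X[:, 0] * X[:, 1] + rng.normal(0, 10, size=n)
    return X, y

wins = {"regression": 0, "machine learning": 0}
for trial in range(30):                          # repeat the whole procedure
    X, y = simulate_sample()                     # a fresh single data sample
    errs = {"regression": [], "machine learning": []}
    for train, test in KFold(n_splits=10, shuffle=True).split(X):
        for name, model in [("regression", LinearRegression()),
                            ("machine learning", DecisionTreeRegressor(max_depth=4))]:
            model.fit(X[train], y[train])
            errs[name].append(mean_absolute_error(y[test], model.predict(X[test])))
    winner = min(errs, key=lambda k: np.mean(errs[k]))
    wins[winner] += 1

print(wins)  # how often each model "won" the comparison across trials
```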
Using Bad Learners to find Good Configurations
Finding the optimally performing configuration of a software system for a given setting is often challenging. Recent approaches address this challenge by learning performance models from a sample set of configurations. However, building an accurate performance model can be very expensive (and is often infeasible in practice). The central insight of this paper is that exact performance values (e.g. the response time of a software system) are not required to rank configurations and to identify the optimal one. As our experiments show, models that are cheap to learn but inaccurate (with respect to the difference between actual and predicted performance) can still be used to rank configurations and hence find the optimal configuration. This novel rank-based approach allows us to significantly reduce the cost (in terms of the number of measurements of sample configurations) as well as the time required to build models. We evaluate our approach with 21 scenarios based on 9 software systems and demonstrate that it is beneficial in 16 scenarios; for the remaining 5 scenarios, an accurate model can be built from very few samples anyway, without the need for a rank-based approach.
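A minimal sketch of the rank-based idea, assuming a synthetic configuration space and a deliberately shallow decision-tree surrogate (both are illustrative stand-ins, not the paper's actual models): the cheap model is judged by rank agreement with the true measurements rather than by prediction error, and its top-ranked configuration is checked against the true ranking.

```python
# Sketch under stated assumptions: synthetic binary configuration options
# mapped to a response time; lower is better.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Hypothetical configuration space: 500 configs, 10 binary options.
configs = rng.integers(0, 2, size=(500, 10))
perf = configs @ rng.uniform(1, 20, size=10) + rng.normal(0, 2, size=500)

# Train a deliberately cheap (inaccurate) model on a tiny measured sample.
sample = rng.choice(len(configs), size=30, replace=False)
model = DecisionTreeRegressor(max_depth=3).fit(configs[sample], perf[sample])

pred = model.predict(configs)
rho, _ = spearmanr(pred, perf)        # rank agreement, not accuracy
ranks = np.argsort(np.argsort(perf))  # true rank of each config (0 = best)
print(f"Spearman rho = {rho:.2f}; "
      f"true rank of predicted-best config: {ranks[np.argmin(pred)]}")
```

Even when the absolute predictions are poor, a high rank correlation means the predicted-best configuration lands near the true optimum, which is all the search needs.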
Comparing software prediction techniques using simulation
The need for accurate software prediction systems increases as software becomes much larger and more complex. We believe that the underlying characteristics of the data set (size, number of features, type of distribution, etc.) influence the choice of prediction system. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. It would also be useful to have a large validation data set. Our solution is to simulate data, allowing both control and the possibility of large (1,000-case) validation sets. We compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set; consequently, researchers should consider the prediction context when evaluating competing prediction systems. We observed that the messier the data and the more complex the relationship with the dependent variable, the greater the variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set sampled from the underlying data set. Our most important result, however, is that it is more fruitful to ask which prediction system is best in a particular context than which is the "best" prediction system overall.
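A hedged sketch of the simulation idea: generate data with a controllable "messiness" knob (here just the noise level, an assumed simplification) and compare stand-ins for the four technique families on a large simulated validation set. The generator and the concrete model choices are illustrative, not the authors' designs.

```python
# Illustrative sketch: controlled data-set characteristics via simulation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor      # stand-in for rule induction
from sklearn.neighbors import KNeighborsRegressor   # nearest neighbor / CBR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)

def make_data(n, noise):
    """Assumed generator: nonlinear signal plus controllable noise."""
    X = rng.uniform(0, 10, size=(n, 4))
    y = X[:, 0] * X[:, 1] + 5 * np.sin(X[:, 2]) + rng.normal(0, noise, size=n)
    return X, y

models = {
    "regression": LinearRegression(),
    "rule induction": DecisionTreeRegressor(max_depth=5),
    "nearest neighbor": KNeighborsRegressor(n_neighbors=3),
    "neural net": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000),
}

for noise in (1.0, 10.0):                  # the controlled characteristic
    Xtr, ytr = make_data(100, noise)       # small training sample
    Xval, yval = make_data(1000, noise)    # large simulated validation set
    for name, m in models.items():
        m.fit(Xtr, ytr)
        print(f"noise={noise:>4}: {name:>16} "
              f"MAE={mean_absolute_error(yval, m.predict(Xval)):.2f}")
```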
New ideas and emerging research: evaluating prediction system accuracy
BACKGROUND: Prediction, e.g. of project cost, is an important concern in software engineering. PROBLEM: Although many empirical validations of software engineering prediction systems have been published, no one approach dominates, and making sense of conflicting empirical results is proving challenging. METHOD: We propose a new approach to evaluating competing prediction systems based upon an unbiased statistic (Standardised Accuracy), analysis of results relative to the baseline technique of random guessing, and calculation of effect sizes. RESULTS: Two empirical studies are revisited, and the published results are shown to be misleading when re-analysed using our new approach. CONCLUSION: Biased statistics such as MMRE are deprecated; by contrast, our approach leads to valid results. Such steps will greatly assist in performing future meta-analyses.
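A short sketch of the proposed evaluation, following the commonly cited definitions SA = (1 - MAR/MAR_P0) x 100 and effect size D = (MAR - MAR_P0)/s_P0, where MAR is the mean absolute residual and MAR_P0 and s_P0 come from repeated random guessing; the toy effort numbers below are invented for illustration.

```python
# Sketch of Standardised Accuracy (SA) and effect size vs. random guessing.
# Toy data and helper names are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(3)

def mar(actual, predicted):
    """Mean absolute residual."""
    return np.mean(np.abs(actual - predicted))

def guessing_mars(actual, runs=1000):
    """MARs under random guessing: each case is 'predicted' by the actual
    value of a uniformly chosen *other* case (the P0 baseline)."""
    n = len(actual)
    idx = np.arange(n)
    out = np.empty(runs)
    for r in range(runs):
        other = (idx + rng.integers(1, n, size=n)) % n  # random index != i
        out[r] = mar(actual, actual[other])
    return out

actual = np.array([12.0, 8.0, 30.0, 15.0, 22.0])      # e.g. project efforts
predicted = np.array([10.0, 9.0, 27.0, 18.0, 20.0])   # some model's output

g = guessing_mars(actual)
mar_p0, s_p0 = g.mean(), g.std(ddof=1)
sa = (1 - mar(actual, predicted) / mar_p0) * 100       # % better than guessing
delta = (mar(actual, predicted) - mar_p0) / s_p0       # effect size vs. P0
print(f"SA = {sa:.1f}% (0 means no better than guessing), delta = {delta:.2f}")
```

Unlike MMRE, this score is anchored to an interpretable baseline: SA near zero says the predictor is doing no better than random guessing, whatever its MMRE looks like.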