26 research outputs found

    The consistency of empirical comparisons of regression and analogy-based software project cost prediction

    Get PDF
    OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD – we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS – our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against analogy-based methods. CONCLUSIONS – we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: “What is the best prediction system?

    Comparing software prediction techniques using simulation

    Get PDF
    The need for accurate software prediction systems increases as software becomes much larger and more complex. We believe that the underlying characteristics: size, number of features, type of distribution, etc., of the data set influence the choice of the prediction system to be used. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristic. It would also be useful to have a large validation data set. Our solution is to simulate data allowing both control and the possibility of large (1000) validation cases. The authors compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the more "messy" the data and the more complex the relationship with the dependent variable, the more variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set that has been sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system

    Integrate the GM(1,1) and Verhulst models to predict software stage effort

    Get PDF
    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.Software effort prediction clearly plays a crucial role in software project management. In keeping with more dynamic approaches to software development, it is not sufficient to only predict the whole-project effort at an early stage. Rather, the project manager must also dynamically predict the effort of different stages or activities during the software development process. This can assist the project manager to reestimate effort and adjust the project plan, thus avoiding effort or schedule overruns. This paper presents a method for software physical time stage-effort prediction based on grey models GM(1,1) and Verhulst. This method establishes models dynamically according to particular types of stage-effort sequences, and can adapt to particular development methodologies automatically by using a novel grey feedback mechanism. We evaluate the proposed method with a large-scale real-world software engineering dataset, and compare it with the linear regression method and the Kalman filter method, revealing that accuracy has been improved by at least 28% and 50%, respectively. The results indicate that the method can be effective and has considerable potential. We believe that stage predictions could be a useful complement to whole-project effort prediction methods.National Natural Science Foundation of China and the Hi-Tech Research and Development Program of Chin

    Reliability and validity in comparative studies of software prediction models

    Get PDF
    Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compare a machine learning and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models

    An Empirical Evaluation of Effort Prediction Models Based on Functional Size Measures

    Get PDF
    Software development effort estimation is among the most interesting issues for project managers, since reliable estimates are at the base of good planning and project control. Several different techniques have been proposed for effort estimation, and practitioners need evidence, based on which they can choose accurate estimation methods. The work reported here aims at evaluating the accuracy of software development effort estimates that can be obtained via popular techniques, such as those using regression models and those based on analogy. The functional size and the development effort of twenty software development projects were measured, and the resulting dataset was used to derive effort estimation models and evaluate their accuracy. Our data analysis shows that estimation based on the closest analogues provides better results for most models, but very bad estimates in a few cases. To mitigate this behavior, the correction of regression toward the mean proved effective. According to the results of our analysis, it is advisable that regression to the mean correction is used when the estimates are based on closest analogues. Once corrected, the accuracy of analogy-based estimation is not substantially different from the accuracy of regression based models

    Adopting the Appropriate Performance Measures for Soft Computing-based Estimation by Analogy

    Get PDF
    Soft Computing based estimation by analogy is a lucrative research domain for the software engineering research community. There are a considerable number of models proposed in this research area. Therefore, researchers are of interest to compare the models to identify the best one for software development effort estimation. This research showed that most of the studies used mean magnitude of relative error (MMRE) and percentage of prediction (PRED) for the comparison of their estimation models. Still, it was also found in this study that there are quite a number of criticisms done on accuracy statistics like MMRE and PRED by renowned authors. It was found that MMRE is an unbalanced, biased, and inappropriate performance measure for identifying the best among competing estimation models. The accuracy statistics, e.g., MMRE and PRED, are still adopted in the evaluation criteria by the domain researchers, stating the reason for “widely used,” which is not a valid reason. This research study identified that, since there is no practical solution provided so far, which could replace MMRE and PRED, the researchers are adopting these measures. The approach of partitioning the large dataset into subsamples was tried in this paper using estimation by analogy (EBA) model. One small and one large dataset were considered for it, such as Desharnais and ISBSG release 11. The ISBSG dataset is a large dataset concerning Desharnais. The ISBSG dataset was partitioned into subsamples. The results suggested that when the large datasets are partitioned, the MMRE produces the same or nearly the same results, which it produces for the small dataset. It is observed that the MMRE can be trusted as a performance metric if the large datasets are partitioned into subsamples

    Cost estimation in agile development projects

    Get PDF
    One of the key measures of the resilience of a project is its ability to reach completion on time and on budget, regardless of the turbulent and uncertain environment it may operate within. Cost estimation and tracking are therefore paramount when developing a system. Cost estimation has long been a difficult task in systems development, and although much research has focused on traditional methods, little is known about estimation in the agile method arena. This is ironic given that the reduction of cost and development time is the driving force behind the emergence of the agile method paradigm. This study investigates the applicability of current estimation techniques to more agile development approaches by focusing on four case studies of agile method use across different organisations. The study revealed that estimation inaccuracy was a less frequent occurrence for these companies. The frequency with which estimates are required on agile projects, typically at the beginning of each iteration, meant that the companies found estimation easier than when traditional approaches were used. The main estimation techniques used were expert knowledge and analogy to past projects. A number of recommendations can be drawn from the research: estimation models are not a necessary component of the process; fixed price budgets can prove beneficial for both developers and customers; and experience and past project data should be documented and used to aid the estimation of subsequent projects
    corecore