160,704 research outputs found
Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation
Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that the k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy particularly if the missing data percentage exceeds 40%
The consistency of empirical comparisons of regression and analogy-based software project cost prediction
OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD â we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS â our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against analogy-based methods. CONCLUSIONS â we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: âWhat is the best prediction system?
Recommended from our members
A systematic review of software development cost estimation studies
This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results combined with other knowledge provide support for recommendations for future software cost estimation research, including: 1) Increase the breadth of the search for relevant studies, 2) Search manually for relevant papers within a carefully selected set of journals when completeness is essential, 3) Conduct more studies on estimation methods commonly used by the software industry, and, 4) Increase the awareness of how properties of the data sets impact the results when evaluating estimation methods
Reliability and validity in comparative studies of software prediction models
Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compare a machine learning and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models
Recommended from our members
Project Controls and Management Systems : current practice and how it has changed over the past decade
Project Controls and Management System (PCMS) refers to an ecosystem of processes, tools and personnel required for the proper planning and execution of capital projects throughout the different phases of design, procurement, construction and startup. This can be divided into different focus areas (functions) that would include Estimating, Planning, Scheduling, Cost Control, Change Management, Progressing, and Forecasting. Various trends such as globalization, contractor specialization and information technology developments have impacted the way PCMS are implemented and made it the subject of extensive research over the past years to investigate how to best utilize those trends. Replicating the research methodology used in a 2011 report published by the Construction Research Institute (CII), this work aims to investigate the current status of PCMS implementation and how it has changed over the past decade. It was concluded that while the original PCMS principles are still valid, adoption has drastically changed in terms of efficiency for the majority of the functions. The research also identifies areas of potential concerns and provides recommendations for further improvement.Civil, Architectural, and Environmental Engineerin
Modeling good research practices - overview: a report of the ISPOR-SMDM modeling good research practices task force - 1.
Modelsâmathematical frameworks that facilitate estimation of the consequences of health care decisionsâhave become essential tools for health technology assessment. Evolution of the methods since the first ISPOR modeling task force reported in 2003 has led to a new task force, jointly convened with the Society for Medical Decision Making, and this series of seven papers presents the updated recommendations for best practices in conceptualizing models; implementing stateâtransition approaches, discrete event simulations, or dynamic transmission models; dealing with uncertainty; and validating and reporting models transparently. This overview introduces the work of the task force, provides all the recommendations, and discusses some quandaries that require further elucidation. The audience for these papers includes those who build models, stakeholders who utilize their results, and, indeed, anyone concerned with the use of models to support decision making
Making Software Cost Data Available for Meta-Analysis
In this paper we consider the increasing need for meta-analysis within empirical software engineering. However, we also note that a necessary precondition to such forms of analysis is to have both the results in an appropriate format and sufficient contextual information to avoid misleading inferences. We consider the implications in the field of software project effort estimation and show that for a sample of 12 seemingly similar published studies, the results are difficult to compare let alone combine. This is due to different reporting conventions. We argue that a protocol is required and make some suggestions as to what it should contain
- âŚ