160,704 research outputs found

Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Author: Albrecht
Austin
Baird
Batista
Boehm
Breiman
Briand
Brockmeier
Cartwright
Cheung
Clark
Feelders
Finnie
Gama
Gray
Holte
Jain
Jeffery
Jun Liu
Jönsson
Kemerer
Khotanzad
Kibler
Kim
Kitchenham
Kohavi
Little
Martin Shepperd
Miranda
Myrtveit
Pickard
Putnam
Qinbao Song
Quinlan
Robins
Rubin
Samson
Selby
Shao
Shepperd
Siedelecki
Song
Srinivasan
Strike
Tabachnick
Tay
Walkerden
Walston
Xiangru Chen
Publication venue: Elsevier BV
Publication date: 01/12/2008

Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
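As a rough illustration of the imputation step this abstract refers to, the sketch below fills each missing value with the mean of that feature over the k nearest complete rows. The abstract does not specify the distance measure, the value of k, or how categorical features are handled, so the Euclidean distance, the default k, the function name `knn_impute`, and the sample project data are all assumptions made for illustration only.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean over the k nearest complete rows.

    Assumption: numeric features only, Euclidean distance computed over
    the features that the incomplete row actually has.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        missing = [i for i, v in enumerate(row) if v is None]
        if not missing:
            imputed.append(list(row))
            continue

        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            # Distance restricted to the observed features of this row.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled = list(row)
        for i in missing:
            filled[i] = sum(n[i] for n in neighbours) / len(neighbours)
        imputed.append(filled)
    return imputed


# Hypothetical project records: [size, team, effort]; None marks a missing value.
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 3.0, 120.0],
    [50.0, 9.0, 500.0],
    [11.0, None, 110.0],
]
result = knn_impute(projects, k=2)
```

The completed rows could then be passed to a decision tree learner in place of the original data, which is the comparison the abstract describes; the choice of learner implementation is left open here.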

Crossref

Brunel University Research Archive

The consistency of empirical comparisons of regression and analogy-based software project cost prediction

Author: Mair CM
Shepperd M J
Publication venue: IEEE Computer Society
Publication date: 01/01/2005

OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD - we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS - our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against, analogy-based methods. CONCLUSIONS - we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: “What is the best prediction system?”

Brunel University Research Archive

A systematic review of software development cost estimation studies

Author: Jørgensen M
Shepperd MJ
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results, combined with other knowledge, provide support for recommendations for future software cost estimation research, including: 1) increase the breadth of the search for relevant studies, 2) search manually for relevant papers within a carefully selected set of journals when completeness is essential, 3) conduct more studies on estimation methods commonly used by the software industry, and 4) increase the awareness of how properties of the data sets impact the results when evaluating estimation methods.

Brunel University Research Archive

Reliability and validity in comparative studies of software prediction models

Author: Myrtveit I
Shepperd MJ
Stensrud E
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2005

Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compare a machine learning and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.

Crossref

Brunel University Research Archive

Project Controls and Management Systems : current practice and how it has changed over the past decade

Author: Mostafa Kareem Tarek
Publication date: 17/08/2018

Project Controls and Management System (PCMS) refers to an ecosystem of processes, tools and personnel required for the proper planning and execution of capital projects throughout the different phases of design, procurement, construction and startup. It can be divided into focus areas (functions) that include Estimating, Planning, Scheduling, Cost Control, Change Management, Progressing, and Forecasting. Trends such as globalization, contractor specialization and information technology developments have affected the way PCMS are implemented and have made PCMS the subject of extensive research in recent years into how best to exploit those trends. Replicating the research methodology used in a 2011 report published by the Construction Industry Institute (CII), this work investigates the current status of PCMS implementation and how it has changed over the past decade. It concludes that while the original PCMS principles are still valid, adoption has changed drastically in terms of efficiency for the majority of the functions. The research also identifies areas of potential concern and provides recommendations for further improvement.
Civil, Architectural, and Environmental Engineering

Texas ScholarWorks

Modeling good research practices - overview: a report of the ISPOR-SMDM modeling good research practices task force - 1.

Author: Andrew H. Briggs
Bilcke
Borshchev
Brennan
Briggs
Buxton
Caro
Eddy
Hunink
J. Jaime Caro
Jacobson
Karen M. Kuntz
Karnon
Kijowski
Law
Lorscheid
Macal
Petrou
Philips
Pitman
Price
Roberts
Siebert
Stahl
Steele
Uwe Siebert
Weinstein
Publication venue: SAGE Publications
Publication date: 01/01/2012

Models, mathematical frameworks that facilitate estimation of the consequences of health care decisions, have become essential tools for health technology assessment. Evolution of the methods since the first ISPOR modeling task force reported in 2003 has led to a new task force, jointly convened with the Society for Medical Decision Making, and this series of seven papers presents the updated recommendations for best practices in conceptualizing models; implementing state-transition approaches, discrete event simulations, or dynamic transmission models; dealing with uncertainty; and validating and reporting models transparently. This overview introduces the work of the task force, provides all the recommendations, and discusses some quandaries that require further elucidation. The audience for these papers includes those who build models, stakeholders who utilize their results, and, indeed, anyone concerned with the use of models to support decision making.

Elsevier - Publisher Connector

Crossref

Enlighten

Making Software Cost Data Available for Meta-Analysis

Author: Mair Carolyn
Shepperd Martin
Publication date: 01/01/2005

In this paper we consider the increasing need for meta-analysis within empirical software engineering. However, we also note that a necessary precondition for such forms of analysis is to have both the results in an appropriate format and sufficient contextual information to avoid misleading inferences. We consider the implications in the field of software project effort estimation and show that, for a sample of 12 seemingly similar published studies, the results are difficult to compare, let alone combine. This is due to different reporting conventions. We argue that a protocol is required and make some suggestions as to what it should contain.