An Empirical investigation into software effort estimation by analogy

Author: Schofield Christopher

Most practitioners recognise the important part accurate estimates of development effort play in the successful management of major software projects. However, it is widely recognised that current estimation techniques are often very inaccurate, while studies (Heemstra 1992; Lederer and Prasad 1993) have shown that effort estimation research is not being effectively transferred from the research domain into practical application. Traditionally, research has been almost exclusively focused on the advancement of algorithmic models (e.g. COCOMO (Boehm 1981) and SLIM (Putnam 1978)), where effort is commonly expressed as a function of system size. However, in recent years there has been a discernible movement away from algorithmic models with non-algorithmic systems (often encompassing machine learning facets) being actively researched. This is potentially a very exciting and important time in this field, with new approaches regularly being proposed. One such technique, estimation by analogy, is the focus of this thesis. The principle behind estimation by analogy is that past experience can often provide insights and solutions to present problems. Software projects are characterised in terms of collectable features (such as the number of screens or the size of the functional requirements) and stored in a historical case base as they are completed. Once a case base of sufficient size has been cultivated, new projects can be estimated by finding similar historical projects and re-using the recorded effort. To make estimation by analogy feasible it became necessary to construct a software tool, dubbed ANGEL, which allowed the collection of historical project data and the generation of estimates for new software projects. A substantial empirical validation of the approach was made encompassing approximately 250 real historical software projects across eight industrial data sets, using stepwise regression as a benchmark. 
Significance tests on the results accepted the hypothesis (at the 1% significance level) that estimation by analogy is a superior prediction system to stepwise regression in terms of accuracy. A study was also made of the sensitivity of the analogy approach. By growing project data sets in a pseudo time-series fashion, it was possible to answer pertinent questions about the approach, such as what the effects of outlying projects are and what the minimum data set size is. The main conclusions of this work are that estimation by analogy is a viable estimation technique that would seem to offer some advantages over algorithmic approaches, including improved accuracy, easier use of categorical features and an ability to operate even where no statistical relationships can be found.
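
The analogy principle described in this abstract (characterise projects by features, find similar completed projects, re-use their recorded effort) can be illustrated with a minimal Python sketch. This is not ANGEL's actual algorithm: the min-max scaling, the Euclidean distance, the choice of k, the function name and the sample data are all assumptions made for illustration.

```python
import math

def estimate_by_analogy(case_base, new_features, k=2):
    """Estimate effort as the mean effort of the k most similar past projects.

    case_base: list of (features, effort) pairs; features is a list of numbers.
    """
    n = len(new_features)
    # Range of each feature over the case base, used for min-max scaling so
    # that no feature dominates the distance just because of its units.
    lows = [min(f[i] for f, _ in case_base) for i in range(n)]
    highs = [max(f[i] for f, _ in case_base) for i in range(n)]

    def scale(features):
        return [
            (features[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.0
            for i in range(n)
        ]

    target = scale(new_features)
    # Rank historical projects by Euclidean distance to the new project.
    ranked = sorted(case_base, key=lambda case: math.dist(scale(case[0]), target))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

# Hypothetical case base: ([number of screens, function points], effort in person-hours)
history = [
    ([10, 100], 400.0),
    ([12, 120], 500.0),
    ([40, 400], 2000.0),
    ([45, 380], 2200.0),
]
estimate = estimate_by_analogy(history, [11, 110], k=2)
print(estimate)
```

With this made-up case base, the two small projects are the closest analogies, so the estimate is the mean of their recorded efforts.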

Bournemouth University Research Online

Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Author: B Boehm
B Kitchenham
C Bishop
C Lokan
E Kocaguneli
J Cohen
J Demšar
J Wen
JZ Kolter
L Minku
Leandro L. Minku
LL Minku
M Auer
M Hall
M Jørgensen
M Shepperd
ML Mitchell
P Sentas
R Tibshirani
S Amasaki
S Muthukrishnan
T DeMarco
TM Gruschke
VS Cherkassky
Xin Yao
Y Kultur
Publication venue: 'Springer Science and Business Media LLC'

Crossref

An Investigation into Software Estimation Methods

Author: Hamdan Khaled

There are currently no fully validated estimation approaches that can accurately predict the effort needed for developing a software system (Kitchenham, et al, 1995). Information gathered at the early stages of system development is not enough to provide precise effort estimates, even though similar software systems may have been developed in the past. Where similar systems have been developed, there are often inherent differences in the features of these systems and in the development process used. These differences are often sufficient to significantly reduce estimation accuracy. Historically, cost estimation focuses on project effort and duration. There are many estimation techniques, but none is consistently ‘best’ (Shepperd, 2003). Software project management has become a crucial field of research due to the increasing role of software in today’s world. Improving the functions of project management is a main concern in software development organisation. The purpose of this thesis is to develop a new model which incorporates cultural and leadership factors in the cost estimation model, and is based on Case-Based Reasoning. The thesis defines a new knowledge representation “ontology” to provide a common understanding of project parameters. The associated system uses a statistically simulated bootstrap method, which helps in tuning the analogy approach before application to real projects. This research also introduces a new application of Profile Theory, which takes a formal approach to the measurement of leadership capabilities. A pilot study was performed in order to understand the approaches used for cost estimation in the Gulf region. Based on this initial study, a questionnaire was further refined and tested. Consequently, further surveys were conducted in the United Arab Emirates. It was noticed that most of the software development projects failed in terms of cost estimate. This was due to the lack of a precise software estimation model. 
These studies also highlighted the importance of leadership and culture in software cost estimation. Effort was estimated using regression and analogy. The Bootstrap method was used to refine the analogy-based estimate of effort, with correction for bias. Because the core and support systems differ greatly in nature, a separate model was developed for each of them. As a result of the study, a new model for identifying and analysing was developed. The model was then evaluated, and conclusions were drawn. These show the importance of the model and of the factors of organisational culture and leadership in software project development and in cost estimation. Potential areas for future research were identified.

Sunderland University Institutional Repository

Schätzwerterfüllung in Softwareentwicklungsprojekten

Author: Biggeleben Matthias
Publication date: 15/06/2011

Effort estimates are of utmost economic importance in software development projects. Estimates bridge the gap between managers and the invisible and almost artistic domain of developers. They give managers a means to track and control projects. Consequently, numerous estimation approaches have been developed over the past decades, starting with Allan Albrecht's Function Point Analysis in the late 1970s. However, this work neither tries to develop just another estimation approach, nor focuses on improving the accuracy of existing techniques. Instead of characterizing software development as a technological problem, this work understands software development as a sociological challenge. Consequently, this work focuses on the question: what happens when developers are confronted with estimates representing the major instrument of management control? Do estimates influence developers, or are they unaffected? Is it irrational to expect that developers start to communicate and discuss estimates, conform to them, work strategically, hide progress or delay? This study shows that it is inappropriate to assume that estimated and actual development effort are independent. A theory is developed and tested that explains how developers and managers influence the relationship between estimated and actual development effort. The theory therefore elaborates the phenomenon of estimation fulfillment.

Hochschulschriftenserver - Universität Frankfurt am Main

Evaluating subset selection methods for use case points estimation

Author: Prokopová Zdenka
Šilhavý Petr
Šilhavý Radek
Publication venue: Elsevier Science BV
Publication date: 23/04/2018

When the Use Case Points method is used for software effort estimation, users are faced with low model accuracy, which limits its practical application. This study investigates the significance of using subset selection methods for the prediction accuracy of Multiple Linear Regression models obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as appropriate subset selection techniques. The methods were evaluated according to several evaluation criteria and then statistically tested. Evaluation was performed on two independent datasets, which differ in project types and size. Both were split by the hold-out method. If clustering was used, the training sets were clustered into 3 classes and, for each class, an independent regression model was created. These were later used for the prediction of testing sets. If Moving Window was used, then windows of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly when compared to Use Case Points or moving window methods. Spectral Clustering was selected as the best-performing solution, because it achieves a Sum of Squared Errors reduction of 32% for the first dataset, and 98% for the second dataset. The Mean Absolute Percentage Error is less than 1% for the second dataset for Spectral Clustering; 9% for moving window; and 27% for Use Case Points. When the first dataset is used, prediction errors are significantly higher: 53% for Spectral Clustering, while Use Case Points produces a 165% result. It can be concluded that this study proves subset selection techniques to be a significant method for improving the prediction ability of linear regression models, which are used for software development effort prediction. It can also be concluded that the clustering method performs better than the moving window method.

Institutional repository of Tomas Bata University Library

A Principled Methodology: A Dozen Principles of Software Effort Estimation

Author: Kocaguneli Ekrem
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2012

Software effort estimation (SEE) is the activity of estimating the total effort required to complete a software project. Correctly estimating the effort required for a software project is of vital importance for the competitiveness of organizations. Both under- and over-estimation lead to undesirable consequences for organizations. Under-estimation may result in overruns in budget and schedule, which in turn may cause the cancellation of projects, thereby wasting the entire effort spent until that point. Over-estimation may cause promising projects not to be funded, hence harming organizational competitiveness. Due to the significant role of SEE for software organizations, there is a considerable research effort invested in SEE. Thanks to the accumulation of decades of prior research, today we are able to identify the core issues and search for the right principles to tackle pressing questions. For example, despite decades of work, we still lack concrete answers to important questions such as: What is the best SEE method? The introduced estimation methods make use of local data, however not all companies have their own data, so: How can we handle the lack of local data? Common SEE methods take size attributes for granted, yet size attributes are costly and practitioners place very little trust in them. Hence, we ask: How can we avoid the use of size attributes? Collection of data, particularly dependent variable information (i.e. effort values), is costly: How can we find an essential subset of the SEE data sets? Finally, studies make use of sampling methods to justify a new method's performance on SEE data sets. Yet, the trade-off among different variants is ignored: How should we choose sampling methods for SEE experiments? This thesis is a rigorous investigation towards the identification and tackling of the pressing issues in SEE. 
Our findings rely on extensive experimentation performed with a large corpus of estimation techniques on a large set of public and proprietary data sets. We summarize our findings and industrial experience in the form of 12 principles: 1) Know your domain 2) Let the Experts Talk 3) Suspect your data 4) Data Collection is Cyclic 5) Use a Ranking Stability Indicator 6) Assemble Superior Methods 7) Weighting Analogies is Over-elaboration 8) Use Easy-path Design 9) Use Relevancy Filtering 10) Use Outlier Pruning 11) Combine Outlier and Synonym Pruning 12) Be Aware of Sampling Method Trade-off

The Research Repository @ WVU (West Virginia University)

Calibration and Validation of the COCOMO II.1997.0 Cost/Schedule Estimating Model to the Space and Missile Systems Center Database

Author: Bernheisel Wayne A.
Publication venue: AFIT Scholar
Publication date: 01/09/1997

The goal of this study was to determine the accuracy of COCOMO II.1997.0, a software cost and schedule estimating model, using Magnitude of Relative Error, Mean Magnitude of Relative Error, Relative Root Mean Square, and a 25 percent Prediction Level. Effort estimates were completed using the model in default and in calibrated mode. Calibration was accomplished by dividing four stratified data sets into two random validation and calibration data sets using five times resampling. The accuracy results were poor; the best having an accuracy of only .3332 within 40 percent of the time in calibrated mode. It was found that homogeneous data is the key to producing the best results, and the model typically underestimates. The second part of this thesis was to try and improve upon the default mode estimates. This was accomplished by regressing the model estimates to the actual effort. Each original regression equation was transformed and tested for normality, equal variance, and significance. Overall, the results were promising; regression improved the accuracy in three of the four cases, the best having an accuracy of .2059 within 75 percent of the time
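
The accuracy measures named in this abstract can be made concrete with a short Python sketch using their commonly used definitions: Magnitude of Relative Error (MRE) is |actual - estimate| / actual, Mean MRE (MMRE) is its average over all projects, and the prediction level PRED(25) is the fraction of projects whose MRE is at most 0.25. The function names and sample figures below are hypothetical and are not taken from the thesis; Relative Root Mean Square is omitted.

```python
def mre(actual, estimate):
    """Magnitude of Relative Error for a single project."""
    return abs(actual - estimate) / actual

def mmre(actuals, estimates):
    """Mean Magnitude of Relative Error over a set of projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE does not exceed `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)

# Hypothetical actual and estimated efforts (person-months) for four projects
actuals = [100.0, 200.0, 400.0, 800.0]
estimates = [110.0, 150.0, 400.0, 400.0]

print(mmre(actuals, estimates))        # average relative error
print(pred(actuals, estimates, 0.25))  # share of estimates within 25 percent
```

Lower MMRE and higher PRED(25) both indicate better accuracy, so the two measures are usually reported together.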

AFIT Scholar (Air Force Institute of Technology)

An investigation into software estimation methods

Author: Hamdan Khaled
Publication date: 01/01/2009

OpenGrey Repository