Making inferences with small numbers of training sets
A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed, whereby the data set is split into a training set and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination, typically on the basis of very small numbers of sampled training sets. It is shown that such studies can lead to almost random results, particularly where relatively small effects are being studied. To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction, with results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and that ideally more than 20 sets should be deployed. Unfortunately, this casts doubt on a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency.
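The protocol the paper argues for is easy to reproduce. Below is a minimal sketch, assuming NumPy arrays and a caller-supplied fit_predict(X_train, y_train, X_val) callable (both illustrative conventions, not artefacts of the paper): it scores a prediction technique over many sampled training sets so that accuracy can be quoted with empirical confidence limits rather than from one or two splits.

    import numpy as np

    # Illustrative sketch of repeated hold-out evaluation; not the paper's code.
    def mmre(actual, predicted):
        # Mean magnitude of relative error, a common accuracy statistic
        return np.mean(np.abs(actual - predicted) / actual)

    def repeated_holdout(X, y, fit_predict, n_splits=100, train_frac=0.67, seed=0):
        # Draw many random training/validation splits and score each one,
        # so accuracy can be reported with quantified confidence limits
        rng = np.random.default_rng(seed)
        n, scores = len(y), []
        for _ in range(n_splits):
            idx = rng.permutation(n)
            cut = int(train_frac * n)
            train, val = idx[:cut], idx[cut:]
            preds = fit_predict(X[train], y[train], X[val])
            scores.append(mmre(y[val], preds))
        scores = np.asarray(scores)
        low, high = np.percentile(scores, [2.5, 97.5])  # empirical 95% limits
        return scores.mean(), (low, high)

With n_splits set to 5 the interval returned is typically far wider than with 100 splits, which is precisely the paper's point about small numbers of training sets.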
Estimating software project effort using analogies
Accurate project effort prediction is an important goal for the software engineering community. To date, most work has focused upon building algorithmic models of effort, for example COCOMO. These can be calibrated to local environments. We describe an alternative approach to estimation based upon the use of analogies. The underlying principle is to characterise projects in terms of features (for example, the number of interfaces, the development method or the size of the functional requirements document). Completed projects are stored, and the problem then becomes one of finding the most similar projects to the one for which a prediction is required. Similarity is defined as Euclidean distance in n-dimensional space, where n is the number of project features. Each dimension is standardised so that all dimensions have equal weight. The known effort values of the nearest neighbours to the new project are then used as the basis for the prediction. The process is automated using a PC-based tool known as ANGEL. The method is validated on nine different industrial data sets (a total of 275 projects), and in all cases analogy outperforms algorithmic models based upon stepwise regression. From this work we argue that estimation by analogy is a viable technique that, at the very least, can be used by project managers to complement current estimation techniques.
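The distance-based procedure described above can be sketched in a few lines. The following is an illustrative re-implementation, not the ANGEL tool itself; the choice of k = 3 analogies and the use of the mean of the neighbours' effort values are assumptions, since the abstract only says the nearest neighbours form the basis of the prediction.

    import numpy as np

    # Illustrative sketch of estimation by analogy; not the ANGEL tool.
    def estimate_by_analogy(features, efforts, new_project, k=3):
        # features: (n_projects, n_features) matrix of completed projects
        # efforts:  (n_projects,) known effort values for those projects
        mu, sigma = features.mean(axis=0), features.std(axis=0)
        sigma[sigma == 0] = 1.0                      # guard constant features
        z = (features - mu) / sigma                  # standardise each dimension
        zq = (np.asarray(new_project) - mu) / sigma  # ...so all have equal weight
        dist = np.sqrt(((z - zq) ** 2).sum(axis=1))  # Euclidean distance in n-space
        nearest = np.argsort(dist)[:k]               # k most similar projects
        return efforts[nearest].mean()               # neighbours' effort -> estimate

Standardising before computing distances is what gives each feature equal weight, as the abstract describes; without it, features measured on large scales would dominate the similarity calculation.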
Quantum Software Analytics: Opportunities and Challenges
Quantum computing systems depend on the principles of quantum mechanics to perform multiple challenging tasks more efficiently than their classical counterparts. In classical software engineering, the software life cycle is used to document and structure the processes of design, implementation, and maintenance of software applications. It helps stakeholders understand how to build an application. In this paper, we summarize a set of software analytics topics and techniques in the development life cycle that can be leveraged and integrated into quantum software application development. The results of this work can assist researchers and practitioners in better understanding the quantum-specific emerging development activities, challenges, and opportunities in the next generation of quantum software.
Software maintenance cost estimation with fourth generation languages
This thesis addresses the problem of allocating software maintenance resources in a commercial environment using fourth generation language systems. The activity of maintaining software has a poor image amongst software managers, as it often appears that there is no end product. This image will only improve when software maintenance can be discussed in business terms, not least because maintenance costs can then be compared to the costs of not maintaining the system. Software maintenance will continue to exist in the fourth generation environment, as systems will still be required to evolve. Cost estimation is an imprecise science, as there are many variables, such as human, technical, environmental and political factors, which can affect the ultimate costs of software and the resources required to maintain it. Some of the factors appear more obvious than others; for example, an experienced programmer can achieve a specific task in less time than an inexperienced one. To fully estimate software maintenance costs, these factors need to be identified and weights assigned to them. This thesis examines a means to identify these factors and their weights, and produces the first cut of an equation which will enable the software maintenance resources in a fourth generation language environment to be estimated.
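The abstract does not reproduce the equation itself, but the described approach, identified factors each carrying an assigned weight, implies a weighted combination of roughly the following shape. The factors named here are hypothetical placeholders, not the thesis's fitted values:

    Effort = w0 + w1 * (staff experience) + w2 * (system size) + ... + wn * (factor n)

where each wi is the weight assigned to the corresponding maintenance cost factor during the calibration the thesis describes.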
An empirical investigation into software effort estimation by analogy
Most practitioners recognise the important part accurate estimates of development effort play in the successful management of major software projects. However, current estimation techniques are often very inaccurate, and studies (Heemstra 1992; Lederer and Prasad 1993) have shown that effort estimation research is not being effectively transferred from the research domain into practical application. Traditionally, research has focused almost exclusively on the advancement of algorithmic models (e.g. COCOMO (Boehm 1981) and SLIM (Putnam 1978)), where effort is commonly expressed as a function of system size. In recent years, however, there has been a discernible movement away from algorithmic models, with non-algorithmic systems (often encompassing machine learning facets) being actively researched. This is potentially a very exciting and important time in this field, with new approaches regularly being proposed. One such technique, estimation by analogy, is the focus of this thesis. The principle behind estimation by analogy is that past experience can often provide insights and solutions to present problems. Software projects are characterised in terms of collectable features (such as the number of screens or the size of the functional requirements) and stored in a historical case base as they are completed. Once a case base of sufficient size has been cultivated, new projects can be estimated by finding similar historical projects and re-using the recorded effort. To make estimation by analogy feasible it was necessary to construct a software tool, dubbed ANGEL, which allowed the collection of historical project data and the generation of estimates for new software projects. A substantial empirical validation of the approach was made, encompassing approximately 250 real historical software projects across eight industrial data sets and using stepwise regression as a benchmark. Significance tests on the results accepted the hypothesis (at the 1% significance level) that estimation by analogy is a more accurate prediction system than stepwise regression. A study was also made of the sensitivity of the analogy approach. By growing project data sets in a pseudo time-series fashion it was possible to answer pertinent questions about the approach, such as what the effects of outlying projects are and what the minimum data set size is. The main conclusions of this work are that estimation by analogy is a viable estimation technique that would seem to offer some advantages over algorithmic approaches, including improved accuracy, easier use of categorical features and an ability to operate even where no statistical relationships can be found.
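The pseudo time-series sensitivity analysis mentioned above can be sketched as follows. This is an illustrative outline assuming a generic fit_predict(X_train, y_train, X_new) callable rather than the thesis's actual tooling: projects are added to the case base one at a time, and each incoming project is predicted only from those that preceded it.

    import numpy as np

    # Illustrative sketch of growing a data set in pseudo time-series fashion.
    def pseudo_time_series(X, y, fit_predict, min_size=5):
        # Predict each incoming project from the projects already completed;
        # the resulting error trajectory shows how accuracy responds to
        # case base size and to the arrival of outlying projects
        errors = []
        for i in range(min_size, len(y)):
            pred = fit_predict(X[:i], y[:i], X[i:i + 1])[0]
            errors.append((i, abs(y[i] - pred) / y[i]))
        return errors  # (case base size, magnitude of relative error) pairs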
Object-oriented software development effort prediction using design patterns from object interaction analysis
Software project management is arguably the most important activity in modern software development projects. In the absence of realistic and objective management, the software development process cannot be managed in an effective way. Software development effort estimation is one of the most challenging and researched problems in project management. With the advent of object-oriented development, there have been studies to transpose some of the existing effort estimation methodologies to the new development paradigm. However, no holistic approach to estimation exists that allows for the refinement of an initial estimate produced in the requirements gathering phase through to the design phase. A SysML point methodology is proposed that is based on a common, structured and comprehensive modeling language (OMG SysML) and that factors the models corresponding to the primary phases of object-oriented development into the production of an effort estimate. This dissertation presents a Function Point-like approach, named Pattern Point, which was conceived to estimate the size of object-oriented products using the design patterns found in object interaction modeling from the late OO analysis phase. In particular, two measures are proposed (PP1 and PP2) that are theoretically validated, showing that they satisfy well-known properties necessary for size measures.
An initial empirical validation is performed that is meant to assess the usefulness and effectiveness of the proposed measures in predicting the development effort of object-oriented systems. Moreover, a comparative analysis is carried out, taking into account several other size measures. The experimental results show that the Pattern Point measure can be effectively used during the OOA phase to predict the effort values with a high degree of confidence. The PP2 metric yielded the best results, with an aggregate PRED(0.25) = 0.874.
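For reference, PRED(l) is the proportion of projects whose magnitude of relative error is within l of the actual value, so the figure above means roughly 87% of PP2 estimates fell within 25% of actual effort. A minimal sketch of the computation, assuming NumPy arrays of actual and predicted efforts:

    import numpy as np

    # Illustrative computation of the PRED(l) accuracy measure.
    def pred(actual, predicted, level=0.25):
        # PRED(l): fraction of estimates whose magnitude of relative error,
        # |actual - predicted| / actual, does not exceed the threshold l
        mre = np.abs(actual - predicted) / actual
        return float(np.mean(mre <= level))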
Productivity prediction model based on Bayesian analysis and productivity console
Software project management is one of the most critical activities in modern software development projects. Without realistic and objective management, the software development process cannot be managed in an effective way. There are three general problems in project management: effort estimation is not accurate, actual status is difficult to understand, and projects are often geographically dispersed. Estimating software development effort is one of the most challenging problems in project management. Various attempts have been made to solve the problem; so far, however, it remains a complex problem, and the error rate of even a renowned effort estimation model can exceed 30% of the actual productivity. Inaccurate estimation therefore results in poor planning and undermines effective control of time and budgets in project management. In this research, we have built a productivity prediction model which uses productivity data from an ongoing project to re-evaluate the initial productivity estimate and provide managers with a better productivity estimate for project management. The actual status of a software project is not easy to understand due to problems inherent in software project attributes: the attributes are dispersed across various CASE (Computer-Aided Software Engineering) tools and are difficult to measure because, unlike building blocks, they are not tangible material. In this research, we have also created a productivity console which incorporates an expert system to measure project attributes objectively and provides graphical charts to visualize project status. The productivity console uses project attributes gathered in the KB (Knowledge Base) of PAMPA II (Project Attributes Monitoring and Prediction Associate), which works with CASE tools and collects project attributes from the databases of those tools. The productivity console and PAMPA II work on a network, so geographically dispersed projects can be managed via the Internet without difficulty.
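The re-evaluation step described above lends itself to a conjugate Bayesian update. The sketch below is a minimal illustration of that idea, assuming a normal prior on productivity and a known observation variance; it is not PAMPA II's actual model, whose details the abstract does not give.

    import numpy as np

    # Illustrative normal-normal conjugate update; not PAMPA II's model.
    def update_productivity(prior_mean, prior_var, observations, obs_var):
        # Combine the initial productivity estimate (the prior) with
        # productivity measurements from the ongoing project to produce
        # a re-evaluated estimate plus its remaining uncertainty
        n = len(observations)
        post_var = 1.0 / (1.0 / prior_var + n / obs_var)
        post_mean = post_var * (prior_mean / prior_var
                                + np.sum(observations) / obs_var)
        return post_mean, post_var

As more in-project observations accumulate, the posterior mean moves from the initial estimate toward the observed productivity and the posterior variance shrinks, which matches the abstract's goal of giving managers a progressively better estimate as the project runs.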