13 research outputs found

    Making inferences with small numbers of training sets

    A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction, with results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency.
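The repeated hold-out idea in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the study's procedure: the synthetic data, the toy `mean_effort_predictor`, the 67% training fraction and the normal-approximation limits are all assumptions made for the example.

```python
import random
import statistics

def repeated_holdout(projects, predict, n_sets=100, train_fraction=0.67, seed=0):
    """Return the mean absolute error measured on each of n_sets random splits."""
    rng = random.Random(seed)
    n_train = int(len(projects) * train_fraction)
    errors = []
    for _ in range(n_sets):
        shuffled = projects[:]
        rng.shuffle(shuffled)
        train, validation = shuffled[:n_train], shuffled[n_train:]
        abs_errors = [abs(predict(train, size) - effort) for size, effort in validation]
        errors.append(statistics.mean(abs_errors))
    return errors

def confidence_limits(values, z=1.96):
    """Approximate 95% limits for the mean (normal approximation, an assumption)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / len(values) ** 0.5
    return mean - half_width, mean + half_width

def mean_effort_predictor(train, size):
    """Toy predictor (hypothetical): ignores size, returns the mean training effort."""
    return statistics.mean([effort for _, effort in train])

# Synthetic (size, effort) pairs, invented purely for illustration.
data_rng = random.Random(42)
projects = [(s, 3.0 * s + data_rng.gauss(0, 20)) for s in range(10, 110, 4)]

few = repeated_holdout(projects, mean_effort_predictor, n_sets=5)
many = repeated_holdout(projects, mean_effort_predictor, n_sets=100)
low_few, high_few = confidence_limits(few)
low_many, high_many = confidence_limits(many)
print(f"5 sets:   [{low_few:.1f}, {high_few:.1f}]")
print(f"100 sets: [{low_many:.1f}, {high_many:.1f}]")
```

Comparing the two printed intervals gives a feel for how much an accuracy figure can move when it is based on only a handful of sampled training sets.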

    Quantum Software Analytics: Opportunities and Challenges

    Quantum computing systems depend on the principles of quantum mechanics to perform multiple challenging tasks more efficiently than their classical counterparts. In classical software engineering, the software life cycle is used to document and structure the processes of design, implementation, and maintenance of software applications. It helps stakeholders understand how to build an application. In this paper, we summarize a set of software analytics topics and techniques in the development life cycle that can be leveraged and integrated into quantum software application development. The results of this work can assist researchers and practitioners in better understanding the quantum-specific emerging development activities, challenges, and opportunities in the next generation of quantum software.

    Software maintenance cost estimation with fourth generation languages

    This thesis addresses the problem of allocating software maintenance resources in a commercial environment using fourth generation language systems. The activity of maintaining software has a poor image amongst software managers, as it often appears that there is no end product. This image will only improve when software maintenance can be discussed in business terms, one of the main reasons being that the maintenance costs can then be compared to the costs of not maintaining the system. Software maintenance will continue to exist in the fourth generation environment, as systems will still be required to evolve. Cost estimation is an imprecise science, as many variables (human, technical, environmental and political) can affect the ultimate costs of software and the resources required to maintain it. Some of the factors appear more obvious than others; for example, an experienced programmer can achieve a specific task in less time than an inexperienced one. To fully estimate software maintenance costs, these factors need to be identified and weights assigned to them. This thesis examines a means to identify these factors and their weights, and produces the first cut of an equation which will enable the software maintenance resources in a fourth generation language to be estimated.

    An Empirical investigation into software effort estimation by analogy

    Most practitioners recognise the important part accurate estimates of development effort play in the successful management of major software projects. However, it is widely recognised that current estimation techniques are often very inaccurate, while studies (Heemstra 1992; Lederer and Prasad 1993) have shown that effort estimation research is not being effectively transferred from the research domain into practical application. Traditionally, research has been almost exclusively focused on the advancement of algorithmic models (e.g. COCOMO (Boehm 1981) and SLIM (Putnam 1978)), where effort is commonly expressed as a function of system size. However, in recent years there has been a discernible movement away from algorithmic models, with non-algorithmic systems (often encompassing machine learning facets) being actively researched. This is potentially a very exciting and important time in this field, with new approaches regularly being proposed. One such technique, estimation by analogy, is the focus of this thesis. The principle behind estimation by analogy is that past experience can often provide insights and solutions to present problems. Software projects are characterised in terms of collectable features (such as the number of screens or the size of the functional requirements) and stored in a historical case base as they are completed. Once a case base of sufficient size has been cultivated, new projects can be estimated by finding similar historical projects and re-using the recorded effort. To make estimation by analogy feasible it became necessary to construct a software tool, dubbed ANGEL, which allowed the collection of historical project data and the generation of estimates for new software projects. A substantial empirical validation of the approach was made encompassing approximately 250 real historical software projects across eight industrial data sets, using stepwise regression as a benchmark. Significance tests on the results accepted the hypothesis (at the 1% significance level) that estimation by analogy is a superior prediction system to stepwise regression in terms of accuracy. A study was also made of the sensitivity of the analogy approach. By growing project data sets in a pseudo time-series fashion it was possible to answer pertinent questions about the approach, such as: what are the effects of outlying projects, and what is the minimum data set size? The main conclusions of this work are that estimation by analogy is a viable estimation technique that would seem to offer some advantages over algorithmic approaches, including improved accuracy, easier use of categorical features and an ability to operate even where no statistical relationships can be found.
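The analogy principle described above (find similar completed projects, then reuse their recorded effort) can be illustrated with a small nearest-neighbour sketch. This is not the ANGEL tool's implementation: the min-max scaling, the Euclidean distance, the choice of k and the example case base are all assumptions made for illustration.

```python
import math

def normalise(cases):
    """Build a function that rescales every feature to [0, 1]."""
    columns = list(zip(*cases))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    def scale(features):
        return [(f - lo) / span for f, lo, span in zip(features, lows, spans)]

    return scale

def estimate_by_analogy(case_base, new_features, k=2):
    """Average the recorded effort of the k most similar completed projects."""
    scale = normalise([features for features, _ in case_base])
    target = scale(new_features)
    ranked = sorted(case_base, key=lambda case: math.dist(scale(case[0]), target))
    analogies = ranked[:k]
    return sum(effort for _, effort in analogies) / len(analogies)

# Hypothetical case base: ((number of screens, number of reports), effort in person-hours)
case_base = [
    ((10, 4), 400.0),
    ((12, 5), 460.0),
    ((40, 20), 1900.0),
    ((35, 18), 1700.0),
]
estimate = estimate_by_analogy(case_base, (11, 4), k=2)
print(estimate)  # 430.0, the mean effort of the two closest projects
```

Scaling the features first keeps a feature with a large numeric range from dominating the distance, which is one common design choice rather than a requirement of the approach.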

    Object-oriented software development effort prediction using design patterns from object interaction analysis

    Software project management is arguably the most important activity in modern software development projects. In the absence of realistic and objective management, the software development process cannot be managed in an effective way. Software development effort estimation is one of the most challenging and researched problems in project management. With the advent of object-oriented development, there have been studies to transpose some of the existing effort estimation methodologies to the new development paradigm. However, no holistic approach to estimation exists that allows an initial estimate produced in the requirements gathering phase to be refined through to the design phase. A SysML point methodology is proposed that is based on a common, structured and comprehensive modeling language (OMG SysML) and that factors the models corresponding to the primary phases of object-oriented development into producing an effort estimate. This dissertation presents a Function Point-like approach, named Pattern Point, which was conceived to estimate the size of object-oriented products using the design patterns found in object interaction modeling from the late OO analysis phase. In particular, two measures are proposed (PP1 and PP2) that are theoretically validated, showing that they satisfy well-known properties necessary for size measures. An initial empirical validation is performed to assess the usefulness and effectiveness of the proposed measures in predicting the development effort of object-oriented systems. Moreover, a comparative analysis is carried out, taking into account several other size measures. The experimental results show that the Pattern Point measure can be effectively used during the OOA phase to predict the effort values with a high degree of confidence. The PP2 metric yielded the best results, with an aggregate PRED(0.25) = 0.874.
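For readers unfamiliar with the PRED(0.25) figure quoted above: PRED(l) is conventionally the fraction of estimates whose magnitude of relative error (MRE) is at most l. A minimal sketch, using invented effort values rather than any data from the dissertation:

```python
def magnitude_of_relative_error(actual, predicted):
    """MRE = |actual - predicted| / actual."""
    return abs(actual - predicted) / actual

def pred(actuals, predictions, level=0.25):
    """Fraction of estimates whose MRE is less than or equal to `level`."""
    mres = [magnitude_of_relative_error(a, p) for a, p in zip(actuals, predictions)]
    return sum(m <= level for m in mres) / len(mres)

# Hypothetical actual and predicted efforts (person-hours).
actuals = [100.0, 200.0, 300.0, 400.0]
predictions = [110.0, 260.0, 290.0, 390.0]

score = pred(actuals, predictions, level=0.25)
print(score)  # 0.75: three of the four estimates fall within 25% of the actual effort
```

Under this definition, a value of 0.874 would mean that roughly 87% of the estimates fell within 25% of the actual effort.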

    Productivity prediction model based on Bayesian analysis and productivity console

    Software project management is one of the most critical activities in modern software development projects. Without realistic and objective management, the software development process cannot be managed in an effective way. There are three general problems in project management: effort estimation is not accurate, actual status is difficult to understand, and projects are often geographically dispersed. Estimating software development effort is one of the most challenging problems in project management. Various attempts have been made to solve the problem; so far, however, it remains a complex problem. The error rate of a renowned effort estimation model can be higher than 30% of the actual productivity. Inaccurate estimation therefore results in poor planning and defies effective control of time and budgets in project management. In this research, we have built a productivity prediction model which uses productivity data from an ongoing project to reevaluate the initial productivity estimate and provides managers with a better productivity estimate for project management. The actual status of the software project is not easy to understand due to problems inherent in software project attributes. The project attributes are dispersed across the various CASE (Computer-Aided Software Engineering) tools and are difficult to measure because they are not hard material like building blocks. In this research, we have created a productivity console which incorporates an expert system to measure project attributes objectively and provides graphical charts to visualize project status. The productivity console uses project attributes gathered in the KB (Knowledge Base) of PAMPA II (Project Attributes Monitoring and Prediction Associate), which works with CASE tools and collects project attributes from the databases of the tools. The productivity console and PAMPA II work on a network, so geographically dispersed projects can be managed via the Internet without difficulty.
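The abstract does not spell out the Bayesian model, so the following is only one plausible reading of "reevaluating an initial productivity estimate with data from an ongoing project": a textbook normal-normal conjugate update with a known observation variance. The function name, the units and every number below are hypothetical.

```python
def update_productivity(prior_mean, prior_var, observations, obs_var):
    """Normal-normal conjugate update (assumed model, not taken from the thesis).

    Returns the posterior mean and variance of productivity after combining
    the initial estimate (the prior) with productivity observed so far.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    sample_mean = sum(observations) / n
    posterior_precision = 1.0 / prior_var + n / obs_var
    posterior_mean = (prior_mean / prior_var + n * sample_mean / obs_var) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Hypothetical figures: the initial estimate is 20 units of output per person-day,
# while the first three reporting periods of the project show about 15.
post_mean, post_var = update_productivity(
    prior_mean=20.0, prior_var=16.0, observations=[14.0, 16.0, 15.0], obs_var=4.0
)
print(round(post_mean, 2), round(post_var, 2))  # 15.38 1.23
```

In this sketch the revised estimate moves most of the way toward the observed productivity because the prior was given a larger variance than the observations; a more confident initial estimate would move less.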