Comparing software prediction techniques using simulation
The need for accurate software prediction systems increases as software becomes larger and more complex. We believe that the underlying characteristics of the data set (size, number of features, type of distribution, and so on) influence the choice of prediction system. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. It would also be useful to have a large validation data set. Our solution is to simulate data, allowing both control and the possibility of large (1,000-case) validation sets. The authors compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the "messier" the data and the more complex the relationship with the dependent variable, the greater the variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system overall.
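The abstract above summarises rather than specifies the simulation; purely as an illustration of the general idea, a minimal sketch might generate data with a known relationship and controllable noise, train on a small sample, and score two of the four techniques (least squares and nearest neighbor) on a 1,000-case validation set. All function names and parameters here are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, noise):
    """Generate a data set with a known relationship plus controllable noise."""
    X = rng.uniform(0, 10, size=(n, 3))
    y = 2.0 * X[:, 0] + np.sin(X[:, 1]) * X[:, 2] + rng.normal(0, noise, n)
    return X, y

def ols_predict(Xtr, ytr, Xval):
    """Least-squares regression via numpy's linear least-squares solver."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones(len(Xval)), Xval]) @ coef

def knn_predict(Xtr, ytr, Xval, k=3):
    """Nearest-neighbor prediction (a simple form of case-based reasoning)."""
    preds = []
    for x in Xval:
        d = np.linalg.norm(Xtr - x, axis=1)
        preds.append(ytr[np.argsort(d)[:k]].mean())
    return np.array(preds)

# Small training set, large (1,000-case) validation set, as in the paper's design.
Xtr, ytr = simulate(30, noise=2.0)
Xval, yval = simulate(1000, noise=2.0)

for name, pred in [("OLS", ols_predict(Xtr, ytr, Xval)),
                   ("3-NN", knn_predict(Xtr, ytr, Xval))]:
    print(f"{name}: MAE = {np.mean(np.abs(yval - pred)):.2f}")
```

Re-running this with different noise levels and training samples reproduces the qualitative point: the messier the data, the more the ranking of techniques varies with the sampled training set.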
An investigation of machine learning based prediction systems
Traditionally, researchers have used either off-the-shelf models such as COCOMO, or local models developed using statistical techniques such as stepwise regression, to obtain software effort estimates. More recently, attention has turned to a variety of machine learning methods such as artificial neural networks (ANNs), case-based reasoning (CBR) and rule induction (RI). This paper outlines some comparative research into the use of these three machine learning methods to build software effort prediction systems. We briefly describe each method and then apply the techniques to a dataset of 81 software projects derived from a Canadian software house in the late 1980s. We compare the prediction systems in terms of three factors: accuracy, explanatory value and configurability. We show that ANN methods have superior accuracy and that RI methods are least accurate. However, this view is somewhat counteracted by problems with explanatory value and configurability. For example, we found that considerable effort was required to configure the ANN, and that this compared very unfavourably with the other techniques, particularly CBR and least squares regression (LSR). We suggest that further work be carried out, both to further explore the interaction between the end-user and the prediction system, and also to facilitate configuration, particularly of ANNs.
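As an illustration of the CBR (analogy) approach referred to above, the sketch below estimates effort for a new project as the mean effort of its most similar past projects, with features scaled to comparable ranges first. The feature set and project values are invented; the paper's actual dataset and similarity measure may differ.

```python
import numpy as np

# Hypothetical historical projects: (size_kloc, team_size, duration_months).
features = np.array([
    [12.0, 4, 6],
    [45.0, 9, 14],
    [ 8.0, 3, 5],
    [30.0, 7, 10],
])
effort = np.array([55.0, 240.0, 32.0, 150.0])  # person-months (invented)

def estimate_by_analogy(project, k=2):
    """Predict effort as the mean effort of the k most similar past projects."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    norm = (features - lo) / (hi - lo)          # scale each feature to [0, 1]
    p = (np.asarray(project) - lo) / (hi - lo)
    dist = np.linalg.norm(norm - p, axis=1)     # Euclidean distance as (dis)similarity
    return effort[np.argsort(dist)[:k]].mean()

print(estimate_by_analogy([20.0, 5, 8]))  # ~102 person-months on this toy data
```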
Making inferences with small numbers of training sets
A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed, so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction, with results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and that ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency.
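To make the methodological point concrete, the following sketch repeats a hold-out split many times and reports the mean accuracy with an approximate 95% confidence interval; with only a handful of training sets the interval is wide, and it narrows as more sets are sampled. The data, prediction technique and sample sizes are illustrative only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of projects: one noisy predictor of effort.
X = rng.uniform(1, 50, 200)
y = 3.0 * X + rng.normal(0, 25, 200)

def holdout_mae(n_train):
    """One hold-out split: fit a line on n_train cases, score on the rest."""
    idx = rng.permutation(len(X))
    tr, val = idx[:n_train], idx[n_train:]
    slope, intercept = np.polyfit(X[tr], y[tr], 1)
    return np.mean(np.abs(y[val] - (slope * X[val] + intercept)))

for n_sets in (3, 20, 100):
    maes = [holdout_mae(30) for _ in range(n_sets)]
    mean = np.mean(maes)
    half = 1.96 * np.std(maes, ddof=1) / np.sqrt(n_sets)  # ~95% CI half-width
    print(f"{n_sets:3d} training sets: MAE = {mean:.1f} +/- {half:.1f}")
```

With three sampled training sets the confidence limits on accuracy are so wide that a ranking of competing techniques could easily be an artefact of the particular split, which is precisely the problem the paper raises.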
Predicting software project effort: A grey relational analysis based method
The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection and the continuous evolution of predictions as the project unfolds, all in the context of data starvation and noisy data. In this paper, however, we focus particularly on outlier detection, feature subset selection, and effort prediction at an early stage of a project. We propose a novel approach using grey relational analysis (GRA) from grey system theory (GST), a recently developed systems engineering theory based on the uncertainty of small samples. In this work we address some of the theoretical challenges in applying GRA to outlier detection, feature subset selection, and effort prediction, and then evaluate our approach on five publicly available industrial data sets using both stepwise regression and Analogy as benchmarks. The results are very encouraging in the sense of being comparable or better than other machine learning techniques, and thus indicate that the method has considerable potential.
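The abstract leaves the GRA mechanics to the paper itself; in standard grey relational analysis, each candidate sequence is scored against a reference via grey relational coefficients, which are then averaged into a grey relational grade. A minimal sketch follows, with invented sequences and the conventional distinguishing coefficient ζ = 0.5 (not necessarily the paper's settings).

```python
import numpy as np

def grey_relational_grade(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate sequence against a reference.

    zeta is the distinguishing coefficient (0.5 is the conventional default).
    Sequences are assumed to be already normalised to comparable scales.
    """
    delta = np.abs(candidates - reference)        # pointwise deviations
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + zeta * dmax) / (delta + zeta * dmax)
    return coeff.mean(axis=1)                     # average coefficient per candidate

# Invented example: which candidate is most closely related to the reference?
reference = np.array([0.5, 0.6, 0.7])
candidates = np.array([
    [0.4, 0.6, 0.8],   # close
    [0.1, 0.9, 0.2],   # far
])
print(grey_relational_grade(reference, candidates))  # higher grade = more related
```

Ranking past projects by grey relational grade is one natural way such a measure can drive analogy-style effort prediction, which is consistent with the comparison against Analogy described in the abstract.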
Formal software development tools: an investigation into usability
Formal methods are techniques that are firmly based in mathematics; they can be used to specify and verify computer systems. Formal techniques offer many advantages over less formal ones, including correctness and productivity. Wide acceptance of these methods is hindered by their relatively difficult notations and theories. This thesis takes the view that the availability of usable tools that support formal techniques plays an important role in promoting their use by a wider community of software engineers. [Continues.]
Best practice lessons learnt through the exit interview and the Oral History Project at United Nations Mission in Sudan
The United Nations Mission in Sudan (UNMIS) operated in a vast, remote and difficult environment in the Republic of the Sudan from 2005 to 2011. The mandate of the mission was to ensure that the Comprehensive Peace Agreement (CPA) signed in 2005 was adhered to and that the best outcomes were achieved by both sides to the Sudanese conflict. The issues affecting the western Darfur region are a separate matter and are dealt with by the UNAMID mission. As far as information and knowledge management is concerned, there were many initiatives to address the potential risks that operating in the Sudan context presented. DPKO and the UN in general have embraced the concepts of Web 2.0 technology, and social networking and file sharing sites have become de facto systems for many UN bodies. The Public Information Office of UNMIS routinely used YouTube, Twitter and Facebook to spread its information to the wider world. UNMIS had a high turnover of staff, and for long periods there was no Best Practice officer based in the mission. To counter that trend, it was decided to implement an oral history project consisting of videotaped exit interviews with departing staff, which provided a wide range of best practice and lessons learnt material that would add value to the ongoing operations of the successor mission in South Sudan, UNMISS. It was necessary to develop audiovisual metadata, a keyword thesaurus, and video recording standards and guidelines, as none previously existed that could be applied in the field.