    Making inferences with small numbers of training sets

    A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed, so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done with very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction, with results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and that ideally more than 20 sets should be deployed. Unfortunately, this casts doubt on a number of empirical validations of prediction techniques, so it is suggested that further research is needed as a matter of urgency.
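
    The following Python sketch (with an invented effect size and noise level, not the paper's data or code) illustrates the core point: with only a handful of sampled training sets, even the sign of a small accuracy difference between two techniques is close to random.

        import numpy as np

        rng = np.random.default_rng(0)

        def mean_accuracy_gap(n_training_sets, true_effect=0.02, noise_sd=0.15):
            """Mean observed accuracy gap between two prediction techniques,
            averaged over n_training_sets random hold-out samples. Each
            sample yields a noisy estimate of the (small) true effect."""
            gaps = true_effect + rng.normal(0.0, noise_sd, size=n_training_sets)
            return gaps.mean()

        for n in (1, 5, 20, 100):
            estimates = np.array([mean_accuracy_gap(n) for _ in range(1000)])
            # Probability of concluding the *worse* technique is better.
            print(f"{n:>3} training sets: P(wrong sign) ~ {(estimates < 0).mean():.2f}")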

    Measuring the impact of computer resource quality on the software development process and product

    The availability and quality of computer resources during the software development process were speculated to have a measurable, significant impact on the efficiency of the development process and the quality of the resulting product. Environment components such as the types of tools, machine responsiveness, and quantity of direct-access storage may play a major role in the effort to produce the product and in its subsequent quality as measured by factors such as reliability and ease of maintenance. During the past six years, the NASA Goddard Space Flight Center has conducted experiments with software projects in an attempt to better understand the impact of software development methodologies, environments, and general technologies on the software process and product. Data were extracted and examined from nearly 50 software development projects, all related to support of satellite flight dynamics ground-based computations. The relationship between computer resources and the software development process and product, as exemplified by the subject NASA data, was examined. Based upon the results, a number of computer resource-related implications are provided.
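
    As a purely illustrative sketch (the NASA/SEL project data are not reproduced here), the kind of relationship the study examines can be expressed as a least-squares fit of development effort against resource measures such as machine responsiveness and direct-access storage; all numbers below are invented.

        import numpy as np

        # Hypothetical per-project measures: [mean response time (s),
        # direct-access storage (MB)] and the resulting effort (staff-months).
        X = np.array([[2.1, 40.0], [0.8, 120.0], [1.5, 80.0], [3.0, 30.0]])
        effort = np.array([55.0, 32.0, 41.0, 63.0])

        # Fit effort ~ b0 + b1 * response_time + b2 * storage.
        A = np.column_stack([np.ones(len(X)), X])
        coef, *_ = np.linalg.lstsq(A, effort, rcond=None)
        print("intercept, response-time and storage coefficients:", coef)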

    Calculation and use of an environment's characteristic software metric set

    Since both cost/quality goals and production environments differ, this study presents an approach for customizing a characteristic set of software metrics to an environment. The approach is applied in the Software Engineering Laboratory (SEL), a NASA Goddard production environment, to 49 candidate process and product metrics of 652 modules from six projects (51,000 to 112,000 lines each). For this particular environment, the method yielded the characteristic metric set (source lines, fault correction effort per executable statement, design effort, code effort, number of I/O parameters, number of versions). The uses examined for a characteristic metric set include forecasting the effort for development, modification, and fault correction of modules based on historical data.
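
    A minimal sketch of the underlying idea (not the SEL's actual selection procedure): group strongly correlated candidate metrics and keep one representative per group, so the environment is described by a small characteristic set. The metric names and module measurements below are hypothetical.

        import numpy as np

        def characteristic_set(data, names, threshold=0.8):
            """data: (modules x metrics) array. Keep the first metric of each
            group whose pairwise |correlation| exceeds threshold."""
            corr = np.corrcoef(data, rowvar=False)
            kept = []
            for j in range(len(names)):
                if all(abs(corr[j, k]) < threshold for k in kept):
                    kept.append(j)
            return [names[k] for k in kept]

        rng = np.random.default_rng(1)
        lines = rng.normal(500, 100, 652)
        data = np.column_stack([
            lines,
            lines * 0.9 + rng.normal(0, 10, 652),  # redundant with size
            rng.normal(30, 5, 652),                # design effort
            rng.normal(4, 1, 652),                 # I/O parameters
        ])
        print(characteristic_set(data, ["source lines", "code effort",
                                        "design effort", "I/O parameters"]))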

    Monitoring software development through dynamic variables

    Research conducted by the Software Engineering Laboratory (SEL) on the use of dynamic variables as a tool to monitor software development is described. Project-independent measures that may be used in a management tool for monitoring software development are identified. Several FORTRAN projects with similar profiles are examined: the staff was experienced in developing these types of projects, and the projects developed serve similar functions. Because these projects are similar, some underlying relationships exist that are invariant between projects. These relationships, once well defined, may be used to compare the development of different projects to determine whether they are evolving the same way previous projects in this environment evolved.
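
    The sketch below (with invented baseline curves) shows one way such an invariant relationship could serve as a monitoring tool: a current project's dynamic variable, here a normalized code growth curve, is checked against the envelope observed on similar completed projects.

        import numpy as np

        # Hypothetical normalized growth (fraction of final size) from five
        # completed, similar projects, sampled at ten checkpoints.
        baseline = np.array([
            [.05, .12, .22, .35, .48, .60, .72, .83, .93, 1.0],
            [.04, .10, .20, .33, .50, .63, .75, .85, .94, 1.0],
            [.06, .14, .25, .38, .52, .65, .76, .86, .95, 1.0],
            [.05, .11, .21, .34, .49, .62, .74, .84, .93, 1.0],
            [.07, .15, .26, .40, .53, .66, .77, .87, .95, 1.0],
        ])
        lo, hi = baseline.min(axis=0), baseline.max(axis=0)

        current = np.array([.03, .07, .12, .18, .26, .35, .45, .55, .66, .78])
        for i, value in enumerate(current):
            if not lo[i] <= value <= hi[i]:
                print(f"checkpoint {i + 1}: {value:.2f} outside "
                      f"[{lo[i]:.2f}, {hi[i]:.2f}] -- flag for management")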

    Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

    Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
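
    A minimal sketch of this comparison, assuming scikit-learn and synthetic data: C4.5 itself is not available in scikit-learn, so a CART decision tree stands in for it, and C4.5's native tolerance of missing values is approximated crudely by median filling.

        import numpy as np
        from sklearn.impute import KNNImputer
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(42)
        X = rng.normal(size=(100, 5))
        y = X @ np.array([3.0, 1.0, 0.0, 2.0, 1.0]) + rng.normal(0, 0.5, 100)

        # Inject ~20% missing values completely at random (MCAR).
        X_missing = np.where(rng.random(X.shape) < 0.20, np.nan, X)

        # Strategy 1: k-NN imputation, then fit the tree.
        X_knn = KNNImputer(n_neighbors=5).fit_transform(X_missing)
        tree = DecisionTreeRegressor(random_state=0)
        print("k-NN imputation R^2:", cross_val_score(tree, X_knn, y, cv=5).mean())

        # Strategy 2: "tolerate" the gaps with a simple median fill (a crude
        # proxy for C4.5's built-in fractional-instance handling).
        X_med = np.where(np.isnan(X_missing),
                         np.nanmedian(X_missing, axis=0), X_missing)
        print("median fill R^2:   ", cross_val_score(tree, X_med, y, cv=5).mean())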

    Software development: A paradigm for the future

    A new paradigm for software development that treats software development as an experimental activity is presented. It provides built-in mechanisms for learning how to develop software better and for reusing previous experience in the forms of knowledge, processes, and products. It uses models and measures to aid in the tasks of characterization, evaluation, and motivation. An organizational scheme is proposed for separating the project-specific focus from the organization's learning and reuse focuses of software development. The implications of this approach for corporations, research, and education are discussed, and some research activities currently underway at the University of Maryland that support this approach are presented.

    PROPERTY DEVELOPMENT: AN EVALUATION OF DECISION SUPPORT SYSTEMS

    Although there are a number of development approaches proposed in the decision support systems (DSS) literature, there appears to be a preference for prototyping over structured approaches. This paper describes the analysis stage of a DSS for the Property Development Department (PDS) at the Palmerston North City Council, New Zealand. The PDS role involves many ill-structured decisions with a large number of stakeholders. The paper describes the selection process for the methodology, analysing the criteria for selection, and proposes a structured process for this analysis. The paper provides insights into when structured approaches are more appropriate for the development of DSS, based on the type and complexity of the decisions supported.

REBEE: Reusability-Based Effort Estimation Technique using Dynamic Neural Network

    Software effort estimation has been researched for over 25 years, but to date no truly effective model has been designed that can efficiently gauge the effort required for heterogeneous project data. Reusability factors of software development have been used to design a new effort estimation model called REBEE, which encompasses the use of fuzzy logic and dynamic neural networks. The experimental evaluation of the model demonstrates efficient effort estimation over varied project types.

    An Approach for Effort Estimation having Reusable Components in Software Development

    Estimation of the effort required for software development has been researched for over 25 years, yet there still exists no concrete solution for estimating development effort. Prior experience with similar types of projects is key for business today. This paper proposes an effort estimation model named REBEE, based on reusability metrics, to effectively estimate the effort involved in development. A project is assumed to consist of multiple modules, and the reusability factor of each module is considered in the technique described here. REBEE utilizes fuzzy logic and dynamic neural networks to achieve its goal. Based on the experimental evaluation discussed in this paper, it is evident that this model accurately predicts the effort involved for heterogeneous project types.
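
    The following is a purely illustrative sketch, not the authors' REBEE implementation: each module's reusability factor is fuzzified into low/medium/high memberships, which together with module size feed a small feed-forward network whose per-module estimates are summed. The weights here are untrained placeholders and all numbers are invented.

        import numpy as np

        def fuzzify(r):
            """Triangular low/medium/high memberships for reusability r in [0, 1]."""
            low = max(0.0, 1.0 - 2.0 * r)
            medium = max(0.0, 1.0 - 2.0 * abs(r - 0.5))
            high = max(0.0, 2.0 * r - 1.0)
            return np.array([low, medium, high])

        def module_effort(size_kloc, reusability, W1, b1, w2, b2):
            """One tanh hidden layer mapping (size, fuzzy reusability) to effort."""
            x = np.concatenate([[size_kloc], fuzzify(reusability)])
            h = np.tanh(W1 @ x + b1)
            return float(w2 @ h + b2)

        rng = np.random.default_rng(7)
        W1, b1 = rng.normal(0, 0.3, (4, 4)), np.zeros(4)
        w2, b2 = rng.normal(0, 0.3, 4), 5.0  # placeholder, untrained weights

        modules = [(12.0, 0.8), (30.0, 0.1), (8.0, 0.5)]  # (KLOC, reusability)
        total = sum(module_effort(s, r, W1, b1, w2, b2) for s, r in modules)
        print(f"estimated project effort: {total:.1f} person-months")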

    A Theory Of Small Program Complexity

    Small programs are those which are written and understood by one person. Large software systems usually consist of many small programs. The complexity of a small program is a prediction of how difficult it would be for someone to understand the program. This complexity depends on three factors: (1) the size and interrelationships of the program itself; (2) the size and interrelationships of the internal model of the program's purpose held by the person trying to understand the program; and (3) the complexity of the mapping between the model and the program. A theory of small program complexity based on these three factors is presented. The theory leads to several testable predictions. Experiments are described which test these predictions and whose results could verify or destroy the theory. © 1982, ACM. All rights reserved.
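
    To make the three-factor structure concrete, here is a toy formalization (not the paper's actual model): each factor is given a unit-free score and the scores are combined, so that two functionally identical programs can differ in complexity purely through the cost of mapping the reader's model onto the code.

        def small_program_complexity(program_size, program_links,
                                     model_size, model_links, mapping_cost):
            """Toy score. Factor 1: the program's size and interrelationships.
            Factor 2: the reader's internal model of the program's purpose.
            Factor 3: the difficulty of mapping that model onto the program."""
            return (program_size + program_links) + (model_size + model_links) + mapping_cost

        # A clear sort routine vs. an obfuscated equivalent: only the
        # model-to-program mapping cost differs.
        print(small_program_complexity(20, 5, 10, 3, 2))   # clear version
        print(small_program_complexity(20, 5, 10, 3, 15))  # obfuscated version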