Fuzzy C-mean missing data imputation for analogy-based effort estimation
The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model because it follows the human reasoning process of selecting analogies that are similar in nature to the target project. Because ABE relies on previously completed projects for estimation, its prediction accuracy is strongly associated with the quality of the dataset. Missing Data (MD) is one of the major challenges in software engineering datasets, and researchers have investigated several missing data imputation techniques for the ABE model. Identifying the most similar donor values in the dataset of completed software projects is a challenging issue for the existing imputation techniques adopted for ABE. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI) and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model, and an FCMI-ABE technique is proposed. MI, KNNI and FCMI-ABE are then compared within the ABE model to identify a suitable MD imputation method. The results suggest that FCMI-ABE, rather than MI or KNNI, imputes more reliable values for the incomplete software projects in the missing datasets. The proposed imputation method was also found to significantly improve the software development effort prediction of the ABE model.
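To make the imputation step concrete, here is a minimal sketch of one common FCMI formulation. This is an assumption: the abstract does not give the study's exact variant. Complete projects are clustered with fuzzy c-means. Each missing attribute is then filled with the cluster centroids, weighted by the incomplete project's memberships, where those memberships are computed only on the attributes it actually has. The function names (`memberships`, `fuzzy_c_means`, `fcm_impute`) are hypothetical. Features are assumed to be already on comparable scales.

```python
import math
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def memberships(dists: List[float], m: float) -> List[float]:
    """Standard fuzzy c-means membership of one point given its centroid distances."""
    for j, d in enumerate(dists):
        if d == 0.0:  # point coincides with a centroid: crisp membership
            return [1.0 if k == j else 0.0 for k in range(len(dists))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def fuzzy_c_means(data: List[List[float]], c: int, m: float = 2.0,
                  iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Fuzzy c-means on complete rows only; returns the c cluster centroids."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(data, c)]
    for _ in range(iters):
        # Membership of every complete row in every cluster.
        u = [memberships([math.dist(x, cj) for cj in centroids], m) for x in data]
        # Centroid update: mean of the rows weighted by membership ** m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(data))]
            total = sum(w)
            centroids[j] = [sum(wi * x[f] for wi, x in zip(w, data)) / total
                            for f in range(len(data[0]))]
    return centroids


def fcm_impute(rows: List[Row], c: int = 2, m: float = 2.0) -> List[List[float]]:
    """Fill each missing value with the membership-weighted average of centroids."""
    complete = [r for r in rows if None not in r]
    centroids = fuzzy_c_means(complete, c, m)
    filled = []
    for r in rows:
        observed = [f for f, v in enumerate(r) if v is not None]
        # Memberships use only the features this project actually has.
        dists = [math.sqrt(sum((r[f] - cj[f]) ** 2 for f in observed))
                 for cj in centroids]
        u = memberships(dists, m)
        filled.append([v if v is not None
                       else sum(u[j] * centroids[j][f] for j in range(c))
                       for f, v in enumerate(r)])
    return filled
```

On a toy dataset with two well-separated groups, for example small projects with effort near 10 and large ones with effort near 50, a row such as `[5.1, None]` receives a value close to the large group's centroid. Mean imputation, by contrast, would give every missing cell the same global average. Because each project's donor information comes from its fuzzy neighbourhood rather than from that single average, FCMI can produce more plausible donor values.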
Optimization of fuzzy analogy in software cost estimation using linguistic variables
One of the most important objectives of the software engineering community has been to develop useful models that explain the development life cycle and accurately estimate software cost and effort. Although there are innumerable methods for estimating cost, the analogy approach is deficient in handling datasets that contain categorical variables. Because of the nature of the software engineering domain, project attributes are often measured in terms of linguistic values such as very low, low, high and very high. The imprecise nature of such values reflects the uncertainty and vagueness in their interpretation. However, there is no efficient method that can deal directly with categorical variables and tolerate such imprecision and uncertainty without resorting to classical interval and numeric-value approaches. In this paper, a new optimization approach based on fuzzy logic, linguistic quantifiers and analogy-based reasoning is proposed to improve effort estimation for software projects described by either numerical or categorical data. The proposed method is validated pragmatically on the historical NASA dataset. The results were analyzed using the prediction criterion and indicate that the proposed method can produce more explainable results than other machine learning methods.
Comment: 14 pages, 8 figures; Journal of Systems and Software, 2011. arXiv admin note: text overlap with arXiv:1112.3877 by other author
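The key idea is that linguistic values such as "low" and "high" can be compared by degree rather than by exact match. The sketch below illustrates this under several assumptions. Each label is modelled as a triangular fuzzy set on a normalised [0, 1] scale, with hypothetical breakpoints chosen only for illustration. Two labels are compared with the possibility measure sup min(mu_p, mu_q), approximated on a grid. Project similarity is a plain mean over attributes, where the paper's linguistic-quantifier aggregation would go. None of these choices is taken from the paper. All names (`LABELS`, `label_similarity`, `estimate`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical triangular fuzzy sets (a, b, c) on a normalised [0, 1] scale.
LABELS: Dict[str, Tuple[float, float, float]] = {
    "very low": (0.0, 0.0, 1 / 3),
    "low": (0.0, 1 / 3, 2 / 3),
    "high": (1 / 3, 2 / 3, 1.0),
    "very high": (2 / 3, 1.0, 1.0),
}
GRID = [i / 1000 for i in range(1001)]  # sample points for the sup-min search


def membership(x: float, abc: Tuple[float, float, float]) -> float:
    """Triangular membership; a == b or b == c gives a shoulder set."""
    a, b, c = abc
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def label_similarity(p: str, q: str) -> float:
    """Possibility measure sup_x min(mu_p(x), mu_q(x)), approximated on GRID."""
    return max(min(membership(x, LABELS[p]), membership(x, LABELS[q])) for x in GRID)


def project_similarity(s: Dict[str, str], t: Dict[str, str]) -> float:
    """Mean over attributes; a quantifier-guided aggregation could replace this."""
    return sum(label_similarity(s[k], t[k]) for k in s) / len(s)


def estimate(target: Dict[str, str], history: List[Tuple[Dict[str, str], float]]) -> float:
    """Similarity-weighted mean effort of the historical projects."""
    weights = [(project_similarity(target, attrs), effort) for attrs, effort in history]
    total = sum(w for w, _ in weights)
    if total == 0.0:
        raise ValueError("no historical project is similar to the target")
    return sum(w * e for w, e in weights) / total
```

With these sets, "low" and "high" overlap to degree 0.5, while "very low" and "very high" do not overlap at all. A categorical attribute therefore contributes a graded similarity instead of a 0/1 match, and no attribute has to be forced into a crisp numeric interval.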
The consistency of empirical comparisons of regression and analogy-based software project cost prediction
OBJECTIVE – to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD – we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS – our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against, analogy-based methods. CONCLUSIONS – we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: “What is the best prediction system?”
Estimating software project effort using analogies
Accurate project effort prediction is an important goal for the software engineering community. To date most work has focused upon building algorithmic models of effort, for example COCOMO. These can be calibrated to local environments. We describe an alternative approach to estimation based upon the use of analogies. The underlying principle is to characterise projects in terms of features (for example, the number of interfaces, the development method or the size of the functional requirements document). Completed projects are stored and then the problem becomes one of finding the most similar projects to the one for which a prediction is required. Similarity is defined as Euclidean distance in n-dimensional space where n is the number of project features. Each dimension is standardised so all dimensions have equal weight. The known effort values of the nearest neighbours to the new project are then used as the basis for the prediction. The process is automated using a PC-based tool known as ANGEL. The method is validated on nine different industrial datasets (a total of 275 projects) and in all cases analogy outperforms algorithmic models based upon stepwise regression. From this work we argue that estimation by analogy is a viable technique that, at the very least, can be used by project managers to complement current estimation techniques.
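The procedure described above (standardise each feature, measure Euclidean distance, average the efforts of the nearest neighbours) can be sketched in a few lines. The abstract does not say which standardisation ANGEL applies. Min-max scaling to [0, 1] is assumed here. The target is scaled together with the history so that all share the same ranges. The function names and the unweighted mean of the k analogies are illustrative choices, not ANGEL's API.

```python
import math
from typing import List, Sequence


def min_max_scale(rows: List[Sequence[float]]) -> List[List[float]]:
    """Rescale every feature to [0, 1] so each dimension carries equal weight."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def estimate_by_analogy(features: List[Sequence[float]], efforts: List[float],
                        target: Sequence[float], k: int = 1) -> float:
    """Mean effort of the k completed projects closest to the target."""
    scaled = min_max_scale(list(features) + [target])
    new, past = scaled[-1], scaled[:-1]
    # Rank completed projects by Euclidean distance in the scaled feature space.
    order = sorted(range(len(past)), key=lambda i: math.dist(new, past[i]))
    return sum(efforts[i] for i in order[:k]) / k
```

For hypothetical projects with features `[[10, 2], [20, 4], [30, 6], [100, 20]]` and efforts `[100, 200, 300, 1000]`, a target of `[21, 4]` gets 200.0 with k = 1 and 250.0 with k = 2. Without scaling, the first feature's larger range would dominate the distance, which is the effect the standardisation step is meant to prevent.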
Investigating effort prediction of web-based applications using CBR on the ISBSG dataset
As web-based applications become more popular and more sophisticated, so does the requirement for early accurate estimates of the effort required to build such systems. Case-based reasoning (CBR) has been shown to be a reasonably effective estimation strategy, although it has not been widely explored in the context of web applications. This paper reports on a study carried out on a subset of the ISBSG dataset to examine the optimal number of analogies that should be used in making a prediction. The results show that it is not possible to select such a value with confidence, and that, in common with other findings in different domains, the effectiveness of CBR is hampered by other factors including the characteristics of the underlying dataset (such as the spread of data and presence of outliers) and the calculation employed to evaluate the distance function (in particular, the treatment of numeric and categorical data)
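The two factors named in the abstract can be explored together: how numeric and categorical attributes enter the distance, and how many analogies to use. The sketch below assumes a heterogeneous distance in the style of HEOM. Numeric attributes contribute a range-scaled difference and categorical ones a 0/1 mismatch. This is one common choice, not necessarily the paper's. The sketch then compares values of k by leave-one-out MMRE. No ISBSG data is included. The names `heom` and `loo_mmre` and any toy data are hypothetical. For simplicity, ranges are computed over the whole dataset rather than re-computed per fold.

```python
import math
from typing import List, Optional, Sequence


def heom(a: Sequence, b: Sequence, kinds: Sequence[str],
         ranges: Sequence[Optional[float]]) -> float:
    """Range-scaled difference for numeric attributes, 0/1 overlap for categorical ones."""
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "cat":
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / r if r else 0.0
        total += d * d
    return math.sqrt(total)


def loo_mmre(projects: List[Sequence], efforts: List[float],
             kinds: Sequence[str], k: int) -> float:
    """Leave-one-out MMRE of a k-analogy estimator (mean effort of the k nearest)."""
    ranges = [max(col) - min(col) if kind == "num" else None
              for col, kind in zip(zip(*projects), kinds)]
    mres = []
    for i, target in enumerate(projects):
        others = [j for j in range(len(projects)) if j != i]
        others.sort(key=lambda j: heom(target, projects[j], kinds, ranges))
        pred = sum(efforts[j] for j in others[:k]) / k
        mres.append(abs(efforts[i] - pred) / efforts[i])
    return sum(mres) / len(mres)
```

Running `loo_mmre` for k = 1, 2, 3 on a dataset shows how sensitive accuracy is to the number of analogies. Changing the categorical term from 0/1 overlap to some other scheme shows how much the result depends on the treatment of mixed data. The study reports exactly these two sensitivities.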
Predicting software project effort: A grey relational analysis based method
This is the post-print version of the final paper published in Expert Systems with Applications. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document, and changes may have been made to this work since it was submitted for publication. Copyright © 2011 Elsevier B.V.
The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection and the continuous evolution of predictions as the project unfolds, all in the context of data starvation and noisy data. In this paper, we focus in particular on outlier detection, feature subset selection, and effort prediction at an early stage of a project. We propose a novel approach that uses grey relational analysis (GRA) from grey system theory (GST), a recently developed systems engineering theory based on the uncertainty of small samples. We address some of the theoretical challenges in applying GRA to outlier detection, feature subset selection, and effort prediction, and then evaluate our approach on five publicly available industrial data sets using both stepwise regression and analogy as benchmarks. The results are very encouraging: they are comparable to or better than those of other machine learning techniques, indicating that the method has considerable potential.
Funding: National Natural Science Foundation of China
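The core GRA computation for effort prediction can be sketched using the textbook grey relational coefficient, with the target project as the reference series. For each feature k, the coefficient of project i is (Δmin + ρΔmax) / (Δi(k) + ρΔmax). The grey relational grade is the mean of these coefficients. The min-max normalisation, the default ρ = 0.5, and predicting effort as a grade-weighted mean of the k most related projects are assumptions for illustration. They are not necessarily the paper's exact settings, and outlier detection and feature selection are not shown. Function names are hypothetical.

```python
from typing import List, Sequence


def grey_relational_grades(history: List[Sequence[float]], target: Sequence[float],
                           rho: float = 0.5) -> List[float]:
    """Grey relational grade of each historical project with respect to the target.

    Features are min-max normalised first; rho is the distinguishing coefficient.
    """
    rows = [list(r) for r in history] + [list(target)]
    lo = [min(c) for c in zip(*rows)]
    hi = [max(c) for c in zip(*rows)]
    norm = [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]
    ref, cmp_rows = norm[-1], norm[:-1]
    # Absolute differences Delta_i(k) between the reference and each comparison series.
    deltas = [[abs(a - b) for a, b in zip(ref, r)] for r in cmp_rows]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0.0:  # every project is identical to the target
        return [1.0] * len(history)
    grades = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def gra_predict(history: List[Sequence[float]], efforts: List[float],
                target: Sequence[float], k: int = 2, rho: float = 0.5) -> float:
    """Grade-weighted mean effort of the k most related projects."""
    grades = grey_relational_grades(history, target, rho)
    top = sorted(range(len(history)), key=lambda i: grades[i], reverse=True)[:k]
    total = sum(grades[i] for i in top)
    return sum(grades[i] * efforts[i] for i in top) / total
```

A project identical to the target gets a grade of exactly 1.0, and grades fall towards (but never reach) 0 as projects diverge. Unlike Euclidean distance, GRA scores each feature separately against the range of differences seen in the data before averaging.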
Reliability and validity in comparative studies of software prediction models
Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation to compare a machine learning model with a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.
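The procedure under scrutiny has three ingredients: a single data sample, an accuracy indicator such as MMRE, and cross validation. A toy simulation shows how it can behave. This is not the paper's simulation design. The population model (effort proportional to size with log-normal noise), the two competitors (one-feature least-squares regression vs. 1-nearest-neighbour analogy), and all function names are hypothetical. The sketch repeatedly draws samples from one fixed population, ranks the two models by leave-one-out MMRE on each sample, and counts how often each one "wins".

```python
import random
from typing import List, Tuple


def mmre(actual: List[float], predicted: List[float]) -> float:
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def loo_predictions(sizes: List[float], efforts: List[float]) -> Tuple[List[float], List[float]]:
    """Leave-one-out predictions from simple OLS regression and 1-nearest-neighbour analogy."""
    reg, ana = [], []
    n = len(sizes)
    for i in range(n):
        xs = [sizes[j] for j in range(n) if j != i]
        ys = [efforts[j] for j in range(n) if j != i]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        reg.append(my + slope * (sizes[i] - mx))
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - sizes[i]))
        ana.append(ys[nearest])
    return reg, ana


def simulate(trials: int = 50, n: int = 20, seed: int = 1) -> dict:
    """Draw repeated samples from one population and count which model wins on MMRE."""
    rng = random.Random(seed)
    wins = {"regression": 0, "analogy": 0}
    for _ in range(trials):
        sizes = [rng.uniform(10, 100) for _ in range(n)]
        # Hypothetical population: effort proportional to size, multiplicative noise.
        efforts = [10 * s * rng.lognormvariate(0, 0.5) for s in sizes]
        reg, ana = loo_predictions(sizes, efforts)
        winner = "regression" if mmre(efforts, reg) < mmre(efforts, ana) else "analogy"
        wins[winner] += 1
    return wins
```

If both counts returned by `simulate()` are non-zero, then the winner a single-sample study reports depends partly on which sample it happened to draw. This is the kind of unreliability in the research procedure that the abstract describes.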
- …