14 research outputs found

    Investigating effort prediction of web-based applications using CBR on the ISBSG dataset

    Get PDF
    As web-based applications become more popular and more sophisticated, so does the requirement for early accurate estimates of the effort required to build such systems. Case-based reasoning (CBR) has been shown to be a reasonably effective estimation strategy, although it has not been widely explored in the context of web applications. This paper reports on a study carried out on a subset of the ISBSG dataset to examine the optimal number of analogies that should be used in making a prediction. The results show that it is not possible to select such a value with confidence, and that, in common with other findings in different domains, the effectiveness of CBR is hampered by other factors including the characteristics of the underlying dataset (such as the spread of data and presence of outliers) and the calculation employed to evaluate the distance function (in particular, the treatment of numeric and categorical data)

    Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

    Get PDF
    Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that the k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy particularly if the missing data percentage exceeds 40%

    Ensemble missing data techniques for software effort prediction

    Get PDF
    Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models that are used for prediction tasks require good quality data. Unfortunately, software engineering datasets tend to suffer from the incompleteness which could result to inaccurate decision making and project management and implementation. Recently, the use of machine learning algorithms has proven to be of great practical value in solving a variety of software engineering problems including software prediction, including the use of ensemble (combining) classifiers. Research indicates that ensemble individual classifiers lead to a significant improvement in classification performance by having them vote for the most popular class. This paper proposes a method for improving software effort prediction accuracy produced by a decision tree learning algorithm and by generating the ensemble using two imputation methods as elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble

    Dataset Quality Assessment: An extension for analogy based effort estimation

    Get PDF
    Abstract Estimation by Analogy (EBA

    Customer Rating Reactions Can Be Predicted Purely Using App Features

    Get PDF
    In this paper we provide empirical evidence that the rating that an app attracts can be accurately predicted from the features it offers. Our results, based on an analysis of 11,537 apps from the Samsung Android and BlackBerry World app stores, indicate that the rating of 89% of these apps can be predicted with 100% accuracy. Our prediction model is built by using feature and rating information from the existing apps offered in the App Store and it yields highly accurate rating predictions, using only a few (11-12) existing apps for case-based prediction. These findings may have important implications for require- ments engineering in app stores: They indicate that app devel- opers may be able to obtain (very accurate) assessments of the customer reaction to their proposed feature sets (requirements), thereby providing new opportunities to support the requirements elicitation process for app developers

    Linear programming as a baseline for software effort estimation

    Get PDF
    Software effort estimation studies still suffer from discordant empirical results (i.e., conclusion instability) mainly due to the lack of rigorous benchmarking methods. So far only one baseline model, namely, Automatically Transformed Linear Model (ATLM), has been proposed yet it has not been extensively assessed. In this article, we propose a novel method based on Linear Programming (dubbed as Linear Programming for Effort Estimation, LP4EE) and carry out a thorough empirical study to evaluate the effectiveness of both LP4EE and ATLM for benchmarking widely used effort estimation techniques. The results of our study confirm the need to benchmark every other proposal against accurate and robust baselines. They also reveal that LP4EE is more accurate than ATLM for 17% of the experiments and more robust than ATLM against different data splits and cross-validation methods for 44% of the cases. These results suggest that using LP4EE as a baseline can help reduce conclusion instability. We make publicly available an open-source implementation of LP4EE in order to facilitate its adoption in future studies

    Linear Programming as a Baseline for Software Effort Estimation

    Get PDF
    Software effort estimation studies still suffer from discordant empirical results (i.e., conclusion instability) mainly due to the lack of rigorous benchmarking methods. So far only one baseline model, namely, Automatically Transformed Linear Model (ATLM), has been proposed yet it has not been extensively assessed. In this article, we propose a novel method based on Linear Programming (dubbed as Linear Programming for Effort Estimation, LP4EE) and carry out a thorough empirical study to evaluate the effectiveness of both LP4EE and ATLM for benchmarking widely used effort estimation techniques. The results of our study confirm the need to benchmark every other proposal against accurate and robust baselines. They also reveal that LP4EE is more accurate than ATLM for 17% of the experiments and more robust than ATLM against different data splits and cross-validation methods for 44% of the cases. These results suggest that using LP4EE as a baseline can help reduce conclusion instability. We make publicly available an open-source implementation of LP4EE in order to facilitate its adoption in future studies

    Pragmatic cost estimation for web applications

    Get PDF
    Cost estimation for web applications is an interesting and difficult challenge for researchers and industrial practitioners. It is a particularly valuable area of ongoing commercial research. Attaining on accurate cost estimation for web applications is an essential element in being able to provide competitive bids and remaining successful in the market. The development of prediction techniques over thirty years ago has contributed to several different strategies. Unfortunately there is no collective evidence to give substantial advice or guidance for industrial practitioners. Therefore to address this problem, this thesis shows the way by investigating the characteristics of the dataset by combining the literature review and industrial survey findings. The results of the systematic literature review, industrial survey and an initial investigation, have led to an understanding that dataset characteristics may influence the cost estimation prediction techniques. From this, an investigation was carried out on dataset characteristics. However, in the attempt to structure the characteristics of dataset it was found not to be practical or easy to get a defined structure of dataset characteristics to use as a basis for prediction model selection. Therefore the thesis develops a pragmatic cost estimation strategy based on collected advice and general sound practice in cost estimation. The strategy is composed of the following five steps: test whether the predictions are better than the means of the dataset; test the predictions using accuracy measures such as MMRE, Pred and MAE knowing their strengths and weaknesses; investigate the prediction models formed to see if they are sensible and reasonable model; perform significance testing on the predictions; and get the effect size to establish preference relations of prediction models. The results from this pragmatic cost estimation strategy give not only advice on several techniques to choose from, but also give reliable results. Practitioners can be more confident about the estimation that is given by following this pragmatic cost estimation strategy. It can be concluded that the practitioners should focus on the best strategy to apply in cost estimation rather than focusing on the best techniques. Therefore, this pragmatic cost estimation strategy could help researchers and practitioners to get reliable results. The improvement and replication of this strategy over time will produce much more useful and trusted results.Cost estimation for web applications is an interesting and difficult challenge for researchers and industrial practitioners. It is a particularly valuable area of ongoing commercial research. Attaining on accurate cost estimation for web applications is an essential element in being able to provide competitive bids and remaining successful in the market. The development of prediction techniques over thirty years ago has contributed to several different strategies. Unfortunately there is no collective evidence to give substantial advice or guidance for industrial practitioners. Therefore to address this problem, this thesis shows the way by investigating the characteristics of the dataset by combining the literature review and industrial survey findings. The results of the systematic literature review, industrial survey and an initial investigation, have led to an understanding that dataset characteristics may influence the cost estimation prediction techniques. From this, an investigation was carried out on dataset characteristics. However, in the attempt to structure the characteristics of dataset it was found not to be practical or easy to get a defined structure of dataset characteristics to use as a basis for prediction model selection. Therefore the thesis develops a pragmatic cost estimation strategy based on collected advice and general sound practice in cost estimation. The strategy is composed of the following five steps: test whether the predictions are better than the means of the dataset; test the predictions using accuracy measures such as MMRE, Pred and MAE knowing their strengths and weaknesses; investigate the prediction models formed to see if they are sensible and reasonable model; perform significance testing on the predictions; and get the effect size to establish preference relations of prediction models. The results from this pragmatic cost estimation strategy give not only advice on several techniques to choose from, but also give reliable results. Practitioners can be more confident about the estimation that is given by following this pragmatic cost estimation strategy. It can be concluded that the practitioners should focus on the best strategy to apply in cost estimation rather than focusing on the best techniques. Therefore, this pragmatic cost estimation strategy could help researchers and practitioners to get reliable results. The improvement and replication of this strategy over time will produce much more useful and trusted results

    Semantic Web Enabled Software Engineering

    Get PDF
    Ontologies allow the capture and sharing of domain knowledge by formalizing information and making it machine understandable. As part of an information system, ontologies can capture and carry the reasoning knowledge needed to fulfill different application goals. Although many ontologies have been developed over recent years, few include such reasoning information. As a result, many ontologies are not used in real-life applications, do not get reused or only act as a taxonomy of a domain. This work is an investigation into the practical use of ontologies as a driving factor in the development of applications and the incorporation of Knowledge Engineering as a meaningful activity into modern agile software development. This thesis contributes a novel methodology that supports an incremental requirement analysis and an iterative formalization of ontology design through the use of ontology reasoning patterns. It also provides an application model for ontology-driven applications that can deal with nonontological data sources. A set of case studies with various application specific goals helps to elucidate whether ontologies are in fact suitable for more than simple knowledge formalization and sharing, and can act as the underlying structure for developing largescale information systems. Tasks from the area of bug-tracker quality mining and clone detection are evaluated for this purpose
    corecore