Search CORE

117,706 research outputs found

Optimization of fuzzy analogy in software cost estimation using linguistic variables

Author: Malathi S.
Sridhar S.
Publication venue
Publication date: 12/09/2012
Field of study

One of the most important objectives of software engineering community has been the increase of useful models that beneficially explain the development of life cycle and precisely calculate the effort of software cost estimation. In analogy concept, there is deficiency in handling the datasets containing categorical variables though there are innumerable methods to estimate the cost. Due to the nature of software engineering domain, generally project attributes are often measured in terms of linguistic values such as very low, low, high and very high. The imprecise nature of such value represents the uncertainty and vagueness in their elucidation. However, there is no efficient method that can directly deal with the categorical variables and tolerate such imprecision and uncertainty without taking the classical intervals and numeric value approaches. In this paper, a new approach for optimization based on fuzzy logic, linguistic quantifiers and analogy based reasoning is proposed to improve the performance of the effort in software project when they are described in either numerical or categorical data. The performance of this proposed method exemplifies a pragmatic validation based on the historical NASA dataset. The results were analyzed using the prediction criterion and indicates that the proposed method can produce more explainable results than other machine learning methods.Comment: 14 pages, 8 figures; Journal of Systems and Software, 2011. arXiv admin note: text overlap with arXiv:1112.3877 by other author

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Author: Albrecht
Austin
Baird
Batista
Boehm
Boehm
Breiman
Briand
Briand
Briand
Brockmeier
Cartwright
Cheung
Clark
Feelders
Finnie
Gama
Gray
Holte
Jain
Jeffery
Jun Liu
Jönsson
Kemerer
Khotanzad
Kibler
Kim
Kitchenham
Kohavi
Little
Little
Little
Little
Little
Martin Shepperd
Miranda
Myrtveit
Pickard
Putnam
Qinbao Song
Quinlan
Robins
Rubin
Rubin
Rubin
Rubin
Samson
Selby
Shao
Shepperd
Shepperd
Siedelecki
Song
Song
Srinivasan
Strike
Tabachnick
Tay
Walkerden
Walston
Xiangru Chen
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008
Field of study

Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that the k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy particularly if the missing data percentage exceeds 40%

Crossref

Brunel University Research Archive

Transfer Learning for Improving Model Predictions in Highly Configurable Software

Author: Jamshidi Pooyan
Kawthekar Prasad
Kästner Christian
Siegmund Norbert
Velez Miguel
Publication venue
Publication date: 20/04/2017
Field of study

Modern software systems are built to be used in dynamic environments using configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost. We define a cost model that transform the traditional view of model learning into a multi-objective problem that not only takes into account model accuracy but also measurements effort as well. We evaluate our cost-aware transfer learning solution using real-world configurable software including (i) a robotic system, (ii) 3 different stream processing applications, and (iii) a NoSQL database system. The experimental results demonstrate that our approach can achieve (a) a high prediction accuracy, as well as (b) a high model reliability.Comment: To be published in the proceedings of the 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS'17

arXiv.org e-Print Archive

Crossref