4 research outputs found

    Do Base Functional Component types affect the relationship between software functional size and effort?

    No full text
    One of the most debated issues in Software Engineering is effort estimation and one of the main points is about which could be (and how many) the right data from an historical database to use in order to obtain reliable estimates. In many of these studies, software size (measured in either lines of code or functional size units) is the primary input. However, the relationship between effort and the components of functional size (BFC - Base Functional Components) has not yet been fully analyzed. This study explores whether effort estimation models based on BFCs types, rather than those based oil a single total value, would improve estimation models. For this empirical study, the project data in the International Software Benchmarking Standards Group (ISBSG) Release 10 dataset, which were functionally sized by the COSMIC FFP method, are used

    Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review

    Full text link
    Context The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project s size, effort, duration, and cost. Objective The aim of this study was to determine how and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June of 2012. Method A systematic mapping review was used as the research method, which was applied to over 129 papers obtained after the filtering process. Results The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between years 2000 and 2011 have received at least one citation in journals and only five papers have received six or more citations. Effort variable is the focus of 70.5% of the papers, 22.5% center their research in a variable different from effort and 7% do not consider any target variable. Additionally, in as many as 70.5% of papers, effort estimation is the research topic, followed by dataset properties (36.4%). The more frequent methods are Regression (61.2%), Machine Learning (35.7%), and Estimation by Analogy (22.5%). ISBSG is used as the only support in 55% of the papers while the remaining papers use complementary datasets. The ISBSG release 10 is used most frequently with 32 references. Finally, some benefits and drawbacks of the usage of ISBSG have been highlighted. Conclusion This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG to develop new research is also outlined.Fernández Diego, M.; González-Ladrón-De-Guevara, F. (2014). Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review. Information and Software Technology. 56(6):527-544. doi:10.1016/j.infsof.2014.01.003S52754456

    The usage of ISBSG data fields in software effort estimation: A systematic mapping study

    Full text link
    [EN] The International Software Benchmarking Standards Group (ISBSG) maintains a repository of data about completed software projects. A common use of the ISBSG dataset is to investigate models to estimate a software project's size, effort, duration, and cost. The aim of this paper is to determine which and to what extent variables in the ISBSG dataset have been used in software engineering to build effort estimation models. For that purpose a systematic mapping study was applied to 107 research papers, obtained after a filtering process, that were published from 2000 until the end of 2013, and which listed the independent variables used in the effort estimation models. The usage of ISBSG variables for filtering, as dependent variables, and as independent variables is described. The 20 variables (out of 71) mostly used as independent variables for effort estimation are identified and analysed in detail, with reference to the papers and types of estimation methods that used them. We propose guidelines that can help researchers make informed decisions about which ISBSG variables to select for their effort estimation models.González-Ladrón-De-Guevara, F.; Fernández-Diego, M.; Lokan, C. (2016). The usage of ISBSG data fields in software effort estimation: A systematic mapping study. Journal of Systems and Software. 113:188-215. doi:10.1016/j.jss.2015.11.040S18821511

    Data cleaning techniques for software engineering data sets

    Get PDF
    Data quality is an important issue which has been addressed and recognised in research communities such as data warehousing, data mining and information systems. It has been agreed that poor data quality will impact the quality of results of analyses and that it will therefore impact on decisions made on the basis of these results. Empirical software engineering has neglected the issue of data quality to some extent. This fact poses the question of how researchers in empirical software engineering can trust their results without addressing the quality of the analysed data. One widely accepted definition for data quality describes it as `fitness for purpose', and the issue of poor data quality can be addressed by either introducing preventative measures or by applying means to cope with data quality issues. The research presented in this thesis addresses the latter with the special focus on noise handling. Three noise handling techniques, which utilise decision trees, are proposed for application to software engineering data sets. Each technique represents a noise handling approach: robust filtering, where training and test sets are the same; predictive filtering, where training and test sets are different; and filtering and polish, where noisy instances are corrected. The techniques were first evaluated in two different investigations by applying them to a large real world software engineering data set. In the first investigation the techniques' ability to improve predictive accuracy in differing noise levels was tested. All three techniques improved predictive accuracy in comparison to the do-nothing approach. The filtering and polish was the most successful technique in improving predictive accuracy. The second investigation utilising the large real world software engineering data set tested the techniques' ability to identify instances with implausible values. These instances were flagged for the purpose of evaluation before applying the three techniques. Robust filtering and predictive filtering decreased the number of instances with implausible values, but substantially decreased the size of the data set too. The filtering and polish technique actually increased the number of implausible values, but it did not reduce the size of the data set. Since the data set contained historical software project data, it was not possible to know the real extent of noise detected. This led to the production of simulated software engineering data sets, which were modelled on the real data set used in the previous evaluations to ensure domain specific characteristics. These simulated versions of the data set were then injected with noise, such that the real extent of the noise was known. After the noise injection the three noise handling techniques were applied to allow evaluation. This procedure of simulating software engineering data sets combined the incorporation of domain specific characteristics of the real world with the control over the simulated data. This is seen as a special strength of this evaluation approach. The results of the evaluation of the simulation showed that none of the techniques performed well. Robust filtering and filtering and polish performed very poorly, and based on the results of this evaluation they would not be recommended for the task of noise reduction. The predictive filtering technique was the best performing technique in this evaluation, but it did not perform significantly well either. An exhaustive systematic literature review has been carried out investigating to what extent the empirical software engineering community has considered data quality. The findings showed that the issue of data quality has been largely neglected by the empirical software engineering community. The work in this thesis highlights an important gap in empirical software engineering. It provided clarification and distinctions of the terms noise and outliers. Noise and outliers are overlapping, but they are fundamentally different. Since noise and outliers are often treated the same in noise handling techniques, a clarification of the two terms was necessary. To investigate the capabilities of noise handling techniques a single investigation was deemed as insufficient. The reasons for this are that the distinction between noise and outliers is not trivial, and that the investigated noise cleaning techniques are derived from traditional noise handling techniques where noise and outliers are combined. Therefore three investigations were undertaken to assess the effectiveness of the three presented noise handling techniques. Each investigation should be seen as a part of a multi-pronged approach. This thesis also highlights possible shortcomings of current automated noise handling techniques. The poor performance of the three techniques led to the conclusion that noise handling should be integrated into a data cleaning process where the input of domain knowledge and the replicability of the data cleaning process are ensured.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore