
    Predicting and Evaluating Software Model Growth in the Automotive Industry

    The size of a software artifact influences software quality and impacts the development process. In industry, when software size exceeds certain thresholds, memory errors accumulate and development tools may no longer cope, resulting in lengthy program start-up times, failing builds, or memory problems at unpredictable times. Foreseeing critical growth in software modules is therefore in high demand in industrial practice. Predicting when the size will grow to the level where maintenance is needed prevents unexpected effort and helps to spot problematic artifacts before they become critical. Although the number of prediction approaches in the literature is vast, it is unclear how well they fit the prerequisites and expectations of practice. In this paper, we perform an industrial case study at an automotive manufacturer to explore the applicability and usability of prediction approaches in practice. As a first step, we collect the most relevant prediction approaches from the literature, including both statistical and machine learning approaches. Furthermore, we elicit practitioners' expectations towards predictions using a survey and stakeholder workshops. At the same time, we measure the software size of 48 software artifacts by mining four years of revision history, resulting in 4,547 data points. In the last step, we assess the applicability of state-of-the-art prediction approaches using the collected data by systematically analyzing how well they fulfill the practitioners' expectations. Our main contribution is a comparison of commonly used prediction approaches in a real-world industrial setting that takes stakeholder expectations into account. We show that the approaches provide significantly different results regarding prediction accuracy and that the statistical approaches fit our data best.
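
As a rough illustration of the statistical route the study favours, one can fit a linear trend to a module's size history and extrapolate when it will cross a maintenance threshold. The sketch below uses hypothetical weekly size measurements and a hypothetical threshold of 10,000 elements; it is not the paper's actual model.

```python
# Minimal sketch of threshold-crossing prediction with a linear trend.
# The data points and the 10,000-element threshold are hypothetical.

def fit_linear(xs, ys):
    """Ordinary least squares fit y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

def weeks_until(threshold, xs, ys):
    """Predict when the fitted size trend crosses the threshold."""
    a, b = fit_linear(xs, ys)
    return (threshold - b) / a  # x value at which a*x + b == threshold

weeks = list(range(8))  # revision-history sampling points
sizes = [1200, 1350, 1480, 1610, 1770, 1900, 2050, 2180]  # model size per week
print(round(weeks_until(10000, weeks, sizes), 1))
```

In practice one would compare such a fit against the machine learning alternatives the study collects, since the paper's point is precisely that the approaches differ significantly in accuracy.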

    Experience: Quality benchmarking of datasets used in software effort estimation

    Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpins numerous process and project management activities, including the estimation of development effort and the prediction of the likely location and severity of defects in code. Serious questions have been raised, however, over the quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have been noted as being especially prevalent. Other quality issues, although also potentially important, have received less attention. In this study, we assess the quality of 13 datasets that have been used extensively in research on software effort estimation. The quality issues considered in this article draw on a taxonomy that we published previously based on a systematic mapping of data quality issues in ESE. Our contributions are as follows: (1) an evaluation of the “fitness for purpose” of these commonly used datasets and (2) an assessment of the utility of the taxonomy in terms of dataset benchmarking. We also propose a template that could be used both to improve the ESE data collection/submission process and to evaluate other such datasets, contributing to enhanced awareness of data quality issues in the ESE community and, in time, the availability and use of higher-quality datasets.
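
Two of the quality problems the abstract names, incompleteness and outliers, can be screened for mechanically. The sketch below checks a hypothetical effort dataset for missing values and flags outliers with Tukey's IQR rule; the data, field name, and thresholds are illustrative and not drawn from the benchmarked datasets.

```python
# Minimal sketch of two data-quality screens: incompleteness
# (missing-value rate) and outliers (Tukey's 1.5*IQR rule).
# The tiny effort dataset below is hypothetical.

def missing_rate(rows, field):
    """Fraction of records with no value for the given field."""
    missing = sum(1 for r in rows if r.get(field) is None)
    return missing / len(rows)

def iqr_outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    s = sorted(values)
    def quartile(q):
        pos = q * (len(s) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)
    q1, q3 = quartile(0.25), quartile(0.75)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [v for v in values if v < lo or v > hi]

projects = [
    {"effort": 120}, {"effort": 95}, {"effort": 110},
    {"effort": None}, {"effort": 900}, {"effort": 105},
]
print(missing_rate(projects, "effort"))
known = [p["effort"] for p in projects if p["effort"] is not None]
print(iqr_outliers(known))  # flags the 900 person-hour entry
```

Noise (mislabeled or corrupted values), the third issue the abstract highlights, cannot be detected this mechanically and is one reason a taxonomy-driven assessment is needed.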

    Integrate the GM(1,1) and Verhulst models to predict software stage effort

    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

    Software effort prediction clearly plays a crucial role in software project management. In keeping with more dynamic approaches to software development, it is not sufficient to predict only the whole-project effort at an early stage. Rather, the project manager must also dynamically predict the effort of different stages or activities during the software development process. This can help the project manager to re-estimate effort and adjust the project plan, thus avoiding effort or schedule overruns. This paper presents a method for software physical-time stage-effort prediction based on the grey models GM(1,1) and Verhulst. This method establishes models dynamically according to particular types of stage-effort sequences, and can adapt to particular development methodologies automatically by using a novel grey feedback mechanism. We evaluate the proposed method on a large-scale real-world software engineering dataset and compare it with the linear regression method and the Kalman filter method, revealing that accuracy has been improved by at least 28% and 50%, respectively. The results indicate that the method can be effective and has considerable potential. We believe that stage predictions could be a useful complement to whole-project effort prediction methods.

    This work was supported by the National Natural Science Foundation of China and the Hi-Tech Research and Development Program of China.
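
For readers unfamiliar with grey models, the sketch below implements the plain GM(1,1) fitting and forecasting steps on a hypothetical stage-effort sequence. It covers only the textbook GM(1,1); the Verhulst variant and the paper's grey feedback mechanism are not reproduced here.

```python
# Minimal sketch of GM(1,1): accumulate the series, fit the grey
# differential equation by least squares, then forecast and de-accumulate.
# The stage-effort sequence is hypothetical.
import math

def gm11_forecast(x0, steps=1):
    """Fit GM(1,1) to sequence x0 and forecast the next `steps` values."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]                # accumulated series
    z1 = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]   # background values
    # Least squares for a, b in x0(k) + a*z1(k) = b, k = 2..n
    y = x0[1:]
    m = n - 1
    sz, szz = sum(z1), sum(z * z for z in z1)
    sy, szy = sum(y), sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det

    def x1_hat(k):  # k is a 0-based index into the accumulated series
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # De-accumulate: forecast value = difference of consecutive x1_hat terms
    return [x1_hat(n + s) - x1_hat(n + s - 1) for s in range(steps)]

stage_effort = [50.0, 62.0, 74.0, 91.0, 108.0]  # effort per stage (person-days)
print([round(v, 1) for v in gm11_forecast(stage_effort, steps=2)])
```

GM(1,1) needs only a handful of recent data points, which is what makes this family of models attractive for dynamic, stage-by-stage re-estimation.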

    Insights on Research Techniques towards Cost Estimation in Software Design

    Software cost estimation is one of the most challenging tasks in project management, needed to ensure smooth development operations and target achievement. Various standard tools and techniques for cost estimation have evolved and are practiced in the industry at present. However, the overall effectiveness of such techniques has never been investigated to date. This paper begins its contribution by presenting a taxonomy of conventional cost-estimation techniques and then investigates the research trends around the most frequently addressed problems. The paper also reviews the existing techniques in a well-structured manner in order to highlight the problems addressed, the techniques used, the associated advantages, and the limitations explored in the literature. Finally, we briefly discuss the open research issues identified, as an added contribution of this manuscript.

    Strategic and Operational Management of Supplier Involvement in New Product Development: a Contingency Perspective

    This paper examines how firms succeed in leveraging supplier involvement in product development. It extends earlier work on managing supplier involvement by providing an integrated analysis of results, processes, and conditions, both at the level of individual development projects and at the level of the overall firm. Following a multiple-case study approach with theoretical sampling, the study examines eight projects in which four manufacturers from different industries involve multiple suppliers. The findings suggest that successful supplier involvement depends on the coordinated design, execution, and evaluation of strategic, long-term processes and operational, short-term management processes, and on the presence of enabling factors such as a cross-functionally oriented organization. The required intensity of these processes and enablers depends on contingencies such as firm size and environmental uncertainty. In contrast with previous research, we find no indications that managing supplier involvement requires a different approach in highly innovative projects compared to less innovative projects.

    Keywords: innovation; new product development; purchasing; supplier relations; R&D management

    Opinion mining with the SentiWordNet lexical resource

    Sentiment classification concerns the application of automatic methods for predicting the orientation of sentiment in text documents. It is an important subject in opinion mining research, with applications in a number of areas including recommender and advertising systems, customer intelligence, and information retrieval. SentiWordNet is a lexical resource of sentiment information for terms in the English language designed to assist in opinion mining tasks, where each term is associated with numerical scores for positive and negative sentiment. A resource that makes term-level sentiment information readily available could be of use in building more effective sentiment classification methods. This research presents the results of an experiment that applied the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. First, a data set of relevant features extracted from text documents using SentiWordNet was designed and implemented. The resulting feature set was then used as input for training a support vector machine classifier to predict the sentiment orientation of the underlying film review. Several scenarios exploring variations in the parameters that generate the data set, outlier removal, and feature selection were executed. The results obtained were compared to other methods documented in the literature and found to be in line with other experiments that propose similar approaches and use the same data set of film reviews, indicating that SentiWordNet could become an important resource for the task of sentiment classification. Considerations on future improvements are also presented, based on a detailed analysis of the classification results.
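
The term-scoring step can be illustrated without the full resource: the sketch below aggregates per-term positive and negative scores into document-level features, using a made-up three-word lexicon as a stand-in for SentiWordNet's real scores. The paper then trains an SVM on features of this kind; the scores and the review text here are purely illustrative.

```python
# Minimal sketch of lexicon-based feature extraction: sum each term's
# positive and negative scores over a document. The three-entry lexicon
# stands in for SentiWordNet; its scores are made up.

LEXICON = {  # term -> (positive score, negative score), hypothetical values
    "brilliant": (0.875, 0.0),
    "dull": (0.0, 0.75),
    "predictable": (0.125, 0.5),
}

def sentiment_features(tokens):
    """Summed positive/negative scores; terms not in the lexicon score zero."""
    pos = sum(LEXICON.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(LEXICON.get(t, (0.0, 0.0))[1] for t in tokens)
    return pos, neg

review = "a brilliant cast stuck in a dull and predictable plot".split()
print(sentiment_features(review))  # (1.0, 1.25) -> leans negative
```

In the real pipeline these raw sums would typically be normalized by document length before being fed to the classifier.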

    Evaluation bias in effort estimation

    A large number of software effort estimation methods exist in the literature, and the space of possibilities [54] is yet to be fully explored. There is little conclusive evidence about the relative performance of such methods, and many studies suffer from instability in their conclusions. As a result, the effort estimation literature lacks a stable ranking of such methods.

    This research aims at providing a stable ranking of a large number of methods using data sets based on COCOMO features. For this task, the COSEEKMO tool [46] was further developed into a benchmarking tool, and several well-known effort estimation methods, including model trees, linear regression methods, local calibration, and several newly developed methods, were compared thoroughly in COSEEKMO. The problem of instability was further explored, and the evaluation method used was identified as the cause of instability. The existing evaluation bias was therefore corrected through a new, non-parametric evaluation approach. The Mann-Whitney U test [42] is the non-parametric test used in this study, and it introduced a great amount of stability into the results. Several evaluation criteria were tested in order to analyze their possible effects on the observed stability.

    The conclusions made in this study were stable across different evaluation criteria, different data sets, and different random runs. As a result, a group of four methods was selected as the best effort estimation methods among the 312 explored combinations of methods. These four methods were all based on the local calibration procedure proposed by Boehm [4]. Furthermore, these methods were simpler and more effective than many other complex methods, including the Wrapper [37] and model trees [60], which are well-known methods in the literature.

    Therefore, while there exists no single universal best method for effort estimation, this study suggests applying the four methods reported here to the historical data and using the best-performing method among these four to estimate the effort for future projects. In addition, this study provides a path for comparing other existing or new effort estimation methods with the currently explored methods: a systematic comparison of the performance of each method against all other methods, including the methods studied in this work, through a benchmarking tool such as COSEEKMO, using the non-parametric Mann-Whitney U test.
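
The ranking machinery rests on the Mann-Whitney U test. A minimal pure-Python version of the U statistic, applied to two hypothetical samples of estimation errors, might look like the sketch below; a full significance test would add the normal approximation (or exact tables) on top of this statistic.

```python
# Minimal sketch of the Mann-Whitney U statistic over two methods'
# absolute estimation errors. The error samples are hypothetical.

def mann_whitney_u(xs, ys):
    """U statistic for sample xs versus ys, using average ranks for ties."""
    combined = sorted((v, i) for i, v in enumerate(xs + ys))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg = (i + 1 + j) / 2.0           # average rank of the tied block
        for k in range(i, j):
            ranks[combined[k][1]] = avg
        i = j
    r1 = sum(ranks[:len(xs)])             # rank sum of the first sample
    return r1 - len(xs) * (len(xs) + 1) / 2.0

errors_a = [0.10, 0.12, 0.15, 0.20, 0.22]   # method A relative errors
errors_b = [0.25, 0.30, 0.31, 0.35, 0.40]   # method B relative errors
print(mann_whitney_u(errors_a, errors_b))   # 0.0: every A error beats every B
```

Because the test compares ranks rather than raw error magnitudes, it is insensitive to the skewed, outlier-heavy error distributions that destabilize parametric comparisons, which is what makes it a natural fit for the stability problem described above.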