5 research outputs found
Efficacy of Reported Issue Times as a Means for Effort Estimation
Software effort is a measure of the manpower dedicated to developing and maintaining software. Effort estimation can help project managers monitor their software, teams, and timelines. Conversely, improper effort estimation can result in budget overruns, delays, lost contracts, and accumulated Technical Debt (TD). Issue Tracking Systems (ITS) have become mainstream project management tools, with over 65,000 companies using Jira alone. ITS are an untapped resource for issue resolution effort research. Related work investigates issue effort for specific issue types, usually Bugs or similar, modeling developer-documented issue resolution times using features from the issues themselves. This thesis explores a novel issue effort estimation and prediction approach using developer-documented ITS effort in tandem with implementation metrics (commit metrics, package metrics, refactoring metrics, and smell metrics). We find consistent correlations between ITS effort and implementation metrics, ranging from weak to moderate strength. We also construct and evaluate several exploratory models to predict future package effort using our novel effort estimation, with inconclusive results.
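The correlation analysis this abstract describes can be sketched with a rank correlation between reported effort and a per-package metric. This is an illustrative sketch only: the data values are invented, and the thesis does not specify which correlation coefficient it uses; Spearman's rho is a common choice for such skewed software metrics.

```python
# Illustrative sketch: correlating developer-reported ITS effort (e.g. hours
# logged in Jira) with a per-package implementation metric such as commit
# churn. All data values below are invented for demonstration.

def rank(values):
    """Assign 1-based average ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation on ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: hours logged per package vs. commit churn per package.
hours = [3, 10, 2, 8, 15, 6]
churn = [120, 300, 90, 400, 500, 150]
rho = spearman(hours, churn)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.94"
```

A rank-based coefficient is preferable here because effort and churn distributions are typically heavy-tailed, so a linear (Pearson) correlation would be dominated by a few large packages.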
Analyzing and Predicting Effort Associated with Finding and Fixing Software Faults
Context: Software developers spend a significant amount of time fixing faults. However, not many papers have addressed the actual effort needed to fix software faults. Objective: The objective of this paper is twofold: (1) analysis of the effort needed to fix software faults and how it was affected by several factors and (2) prediction of the level of fix implementation effort based on the information provided in software change requests. Method: The work is based on data related to 1200 failures, extracted from the change tracking system of a large NASA mission. The analysis includes descriptive and inferential statistics. Predictions are made using three supervised machine learning algorithms and three sampling techniques aimed at addressing the imbalanced data problem. Results: Our results show that (1) 83% of the total fix implementation effort was associated with only 20% of failures. (2) Both safety-critical failures and post-release failures required three times more effort to fix than their non-critical and pre-release counterparts, respectively. (3) Failures with fixes spread across multiple components or across multiple types of software artifacts required more effort; the spread across artifacts was more costly than the spread across components. (4) Surprisingly, some types of faults associated with later life-cycle activities did not require significant effort. (5) The level of fix implementation effort was predicted with 73% overall accuracy using the original, imbalanced data. Using oversampling techniques improved the overall accuracy up to 77%. More importantly, oversampling significantly improved the prediction of the high effort level, from 31% to around 85%.
Conclusions: This paper shows the importance of tying software failures to the changes made to fix all associated faults, in one or more software components and/or in one or more software artifacts, and the benefit of studying how the spread of faults and other factors affect the fix implementation effort.
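The oversampling step the paper credits with the jump in high-effort prediction accuracy can be sketched as simple random oversampling: minority classes are duplicated until the training set is balanced. This is an illustrative sketch under that assumption; the paper's actual sampling techniques, features, and labels are not reproduced here.

```python
import random

# Illustrative sketch of random oversampling to balance effort-level classes
# before training a classifier. Records and labels are invented toy data,
# not taken from the paper.

def random_oversample(records, labels, seed=0):
    """Duplicate minority-class samples until every class matches the
    majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    target = max(len(v) for v in by_class.values())
    out_recs, out_labs = [], []
    for lab, recs in by_class.items():
        extra = [rng.choice(recs) for _ in range(target - len(recs))]
        for rec in recs + extra:
            out_recs.append(rec)
            out_labs.append(lab)
    return out_recs, out_labs

# Invented toy data: 8 low-effort vs. 2 high-effort failure reports.
X = [[i] for i in range(10)]
y = ["low"] * 8 + ["high"] * 2
Xb, yb = random_oversample(X, y)
print(yb.count("low"), yb.count("high"))  # prints "8 8"
```

Balancing the classes this way keeps the classifier from defaulting to the majority ("low effort") label, which is consistent with the reported improvement on the rare high-effort class.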
Identifying Common Patterns and Unusual Dependencies in Faults, Failures and Fixes for Large-scale Safety-critical Software
As software evolves, becoming a more integral part of complex systems, modern society becomes more reliant on the proper functioning of such systems. However, the field of software quality assurance lacks detailed empirical studies from which best practices can be determined. The fundamental factors that contribute to software quality are faults, failures, and fixes, and although some studies have considered specific aspects of each, comprehensive studies have been quite rare. Thus, the fact that we establish the cause-effect relationship between the fault(s) that caused individual failures, as well as the link to the fixes made to prevent the failures from (re)occurring, appears to be a unique characteristic of our work. In particular, we analyze fault types, verification activities, severity levels, investigation effort, artifacts fixed, components fixed, and the effort required to implement fixes for a large industrial case study. The analysis includes descriptive statistics, statistical inference through formal hypothesis testing, and data mining. Some of the most interesting empirical results include: (1) Contrary to popular belief, later life-cycle faults dominate as causes of failures. Furthermore, over 50% of high-priority failures (e.g., post-release failures and safety-critical failures) were caused by coding faults. (2) 15% of failures led to fixes spread across multiple components, and the spread was largely affected by the software architecture. (3) The amount of effort spent fixing faults associated with each failure was not uniformly distributed across failures; fixes with a greater spread across components and artifacts required more effort. Overall, the work indicates that fault prevention and elimination efforts focused on later life-cycle faults are essential, as coding faults were the dominating cause of safety-critical failures and post-release failures.
Further, statistical correlation and/or traditional data mining techniques show potential for assessment and prediction of the locations of fixes and the associated effort. By providing quantitative results and including statistical hypothesis testing, which is not yet a standard practice in software engineering, our work enriches the empirical knowledge needed to improve the state of the art and practice in software quality assurance.
Empirical analysis of software reliability
This thesis presents an empirical study of architecture-based software reliability based on large real case studies. It undoubtedly demonstrates the value of using open source software to empirically study software reliability. The major goal is to empirically analyze the applicability, adequacy, and accuracy of architecture-based software reliability models. In both our studies we found evidence that the number of failures due to faults in more than one component is not insignificant. Consequently, existing models that make such simplifying assumptions must be improved to account for this phenomenon. This thesis' contributions include developing automatic methods for efficient extraction of necessary data from the available repositories, and using this data to test how and when architecture-based software reliability models work. We study their limitations and ways to improve them. Our results show the importance of knowledge gained from the interaction between theoretical and empirical research.
An Effort Prediction Framework for Software Defect Correction
Developers apply changes and updates to software systems to adapt to emerging environments and address new requirements. In turn, these changes introduce additional software defects, usually caused by our inability to comprehend the full scope of the modified code. As a result, software practitioners have developed tools to aid in the detection and prediction of imminent software defects, in addition to the effort required to correct them. Although software development effort prediction has been in use for many years, research into defect-correction effort prediction is relatively new. The increasing complexity, integration, and ubiquitous nature of current software systems has sparked renewed interest in this field. Effort prediction now plays a critical role in the planning activities of managers. Accurate predictions help corporations budget, plan, and distribute available resources effectively and efficiently. In particular, early defect-correction effort predictions could be used by testers to set schedules, and by managers to plan costs and provide earlier feedback to customers about future releases.
In this work, we address the problem of predicting the effort needed to resolve a software defect. More specifically, our study is concerned with defects or issues that are reported on an Issue Tracking System or any other defect repository. Current approaches use one prediction method or technique to produce effort predictions. Such an approach usually suffers from the weaknesses of the chosen prediction method, and consequently the accuracy of the predictions is affected. To address this problem, we present a composite prediction framework. Rather than using one prediction approach for all defects, we propose the use of multiple integrated methods which complement the weaknesses of one another. Our framework is divided into two sub-categories: Similarity-Score Dependent and Similarity-Score Independent. The Similarity-Score Dependent method utilizes the power of Case-Based Reasoning, also known as Instance-Based Reasoning, to compute predictions. It relies on matching target issues to similar historical cases, then combines their known effort for an informed estimate. The Similarity-Score Independent method, on the other hand, makes use of other defect-related information, with some statistical manipulation, to produce the required estimate. To measure similarity between defects, some method of distance calculation must be used. In some cases, this method might produce misleading results, due to observed inconsistencies in history and the fact that current similarity-scoring techniques cannot account for all the variability in the data. In such cases, the Similarity-Score Independent method can be used to estimate the effort, reducing the effect of such inconsistencies.
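The Case-Based Reasoning idea described above, matching a target issue to similar historical cases and combining their known effort, can be sketched as a k-nearest-neighbor estimate. This is an illustrative sketch only: the feature set, distance function, and effort values below are invented, and the thesis's actual similarity-scoring scheme may differ.

```python
# Illustrative sketch of the Similarity-Score Dependent idea: estimate a new
# defect's fix effort by averaging the known effort of its k most similar
# historical defects. Features and data are invented for demonstration.

def distance(a, b):
    """Euclidean distance over numeric defect features."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cbr_effort(history, target, k=3):
    """history: list of (feature_vector, effort_hours) pairs.
    Returns the mean effort of the k nearest historical cases."""
    nearest = sorted(history, key=lambda h: distance(h[0], target))[:k]
    return sum(effort for _, effort in nearest) / k

# Invented features: (priority, components touched, description length / 100).
history = [
    ((1, 1, 2.0), 4.0),
    ((3, 4, 5.5), 40.0),
    ((2, 2, 3.0), 10.0),
    ((1, 1, 1.5), 3.0),
    ((3, 3, 6.0), 32.0),
]
estimate = cbr_effort(history, (2, 2, 2.5), k=3)
print(f"Estimated effort: {estimate:.1f} hours")
```

A weighted average (weighting each neighbor by inverse distance) is a common refinement when the nearest cases vary widely in similarity.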
We have performed a number of experimental studies on the proposed framework to assess the effectiveness of the presented techniques. We extracted the data sets from an operational Issue Tracking System in order to test the validity of the model on real project data. These studies involved the development of multiple tools, in both the Java programming language and PHP, each for a certain stage of data analysis and manipulation. The results show that our proposed approach produces significant improvements when compared to current methods.