Evidence-based defect assessment and prediction for software product lines
The systematic reuse provided by software product lines offers
opportunities to achieve increased quality and reliability as a product line
matures. This has led to a widely accepted assumption that as a product line evolves, its reliability improves. However, empirical evidence on the relationship among change, reuse and reliability in evolving software product lines is lacking.
To address this problem, this work investigates: 1) whether reliability as measured by post-deployment failures improves as the products and components in a software product line change over time, and 2)
whether the stabilizing effect of shared artifacts enables accurate prediction of failure-prone files in the product line.
The first part of this work performs defect assessment and investigates defect trends in Eclipse, an open-source software product line. It analyzes the evolution of the product line over time in terms of the total number of defects, the percentage of severe defects and the relationship between defects and changes.
The second part of this work explores prediction of failure-prone files in the Eclipse product line to determine whether prediction improves as the
product line evolves over time. In addition, this part investigates the effect
of defect and data collection periods on the prediction performance.
The main contributions of this work include findings that the majority of files with severe defects are reused files rather than new files, but that
common components experience less change than variation components. The work also found that there is a consistent set of metrics which serve as prominent predictors across multiple products and reuse categories over time. Classification of post-release, failure-prone files using change data for the Eclipse product line gives better recall and false positive rates as compared to classification using static code metrics. The work also found that on-going change in product lines hinders the ability to predict failure-prone files, and that predicting post-release defects using pre-release change data for the Eclipse case study is difficult. For example, using more data from the past to predict future failure-prone files does not necessarily give better results than
using data only from the recent past. The empirical investigation of product line change and defect data leads to an improved understanding of the interplay among change, reuse and reliability as a product line evolves.
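The change-based classification and the metrics reported above (recall and false positive rate) can be illustrated with a minimal sketch. All file names, change counts, and the cutoff below are hypothetical; the idea is simply that files with many pre-release changes are flagged as failure-prone and the prediction is then scored.

```python
def recall_and_fpr(actual, predicted):
    """Recall and false positive rate for a binary failure-prone prediction."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr

# Hypothetical pre-release change counts per file, and which files actually
# turned out to be failure-prone after release.
pre_release_changes = {"editor.java": 14, "parser.java": 2,
                      "ui.java": 9, "util.java": 1, "core.java": 6}
failure_prone = {"editor.java", "ui.java"}

files = sorted(pre_release_changes)
actual = [f in failure_prone for f in files]
# Simple change-based classifier: many pre-release changes -> predicted failure-prone.
predicted = [pre_release_changes[f] > 5 for f in files]

recall, fpr = recall_and_fpr(actual, predicted)
```

A real study would replace the threshold rule with a trained classifier over many change and code metrics, but the evaluation step is the same.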
Automatic bug triaging techniques using machine learning and stack traces
When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces, memory dumps, etc.) from end-users. These data are sent in the form of bug (crash) reports to the software development teams to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually in a semi-automatic way) by a group of software analysts, known as triagers. Triagers assign priority to the bugs and redirect them to the software development teams in order to provide fixes.
The triaging process, however, is usually very challenging. The problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to automatically detect duplicate (or similar) reports. This way, triagers would not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by the customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route the bug report to the responsible development team based on the subsystem that caused the crash. Since having knowledge of all the subsystems of an ever-evolving industrial system is impractical, having a tool to automatically identify defective subsystems can significantly reduce the manual bug triaging effort.
The main goal of this research is to investigate techniques and tools to help triagers process bug reports. We start by studying the effect of the presence of stack traces in analyzing bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features. We then propose a novel approach for predicting bug severity using stack traces and categorical features, and finally, we discuss a new method for predicting faulty product and component fields of bug reports.
We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, and hence increase the productivity of bug triagers, while reducing costs and efforts associated with manual triaging of bug reports.
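A minimal sketch of the duplicate-detection step, assuming stack traces have already been parsed into lists of frame identifiers. The report ids, frame names, similarity measure (plain Jaccard over frame sets) and threshold are all illustrative; the actual approach also combines trace similarity with categorical bug-report features.

```python
def trace_similarity(trace_a, trace_b):
    """Jaccard similarity between the sets of frames in two stack traces."""
    set_a, set_b = set(trace_a), set(trace_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def find_duplicates(new_trace, known_reports, threshold=0.5):
    """Ids of known reports whose stored trace is similar enough to the new crash."""
    return [rid for rid, trace in sorted(known_reports.items())
            if trace_similarity(new_trace, trace) >= threshold]

# Hypothetical already-triaged reports and a newly submitted crash trace.
known_reports = {
    "BUG-1": ["main", "load_file", "parse_header"],
    "BUG-2": ["main", "render", "draw_rect"],
}
duplicates = find_duplicates(["main", "load_file", "parse_chunk"], known_reports)
```

In practice the frame sets would be weighted (e.g., by frame position or rarity) so that common entry-point frames like `main` contribute less to the score.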
Analyzing the Influence of Processor Speed and Clock Speed on Remaining Useful Life Estimation of Software Systems
Prognostics and Health Management (PHM) is a discipline focused on predicting
the point at which systems or components will cease to perform as intended,
typically measured as Remaining Useful Life (RUL). RUL serves as a vital
decision-making tool for contingency planning, guiding the timing and nature of
system maintenance. Historically, PHM has primarily been applied to hardware
systems, with its application to software only recently explored. In a recent
study we introduced a methodology and demonstrated how changes in software can
impact the RUL of software. However, in practical software development,
real-time performance is also influenced by various environmental attributes,
including operating systems, clock speed, processor performance, RAM, machine
core count and others. This research extends the analysis to assess how changes
in environmental attributes, such as operating system and clock speed, affect
RUL estimation in software. Findings are rigorously validated using real
performance data from controlled test beds and compared with predictive
model-generated data. Statistical validation, including regression analysis,
supports the credibility of the results. The controlled test bed environment
replicates and validates faults from real applications, ensuring a standardized
assessment platform. This exploration yields actionable knowledge for software
maintenance and optimization strategies, addressing a significant gap in the
field of software health management.
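As a toy illustration of RUL estimation (not the paper's model; a linear-degradation assumption with made-up health values), one can fit a trend line to a performance indicator by least squares and extrapolate when it crosses a failure threshold:

```python
def estimate_rul(times, health, threshold):
    """Fit a least-squares line through (time, health) points; RUL is the
    time remaining until the fitted trend crosses the failure threshold."""
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    slope = (sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_h - slope * mean_t
    t_fail = (threshold - intercept) / slope
    return t_fail - times[-1]

# Hypothetical health indicator degrading 2 units per time step:
# trend hits the threshold of 80 at t = 10, so 6 units of life remain at t = 4.
rul = estimate_rul([0, 1, 2, 3, 4], [100, 98, 96, 94, 92], threshold=80)
```

Changing environmental attributes such as clock speed would, in this picture, change the slope of the degradation trend and hence the estimated RUL.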
Software Reliability models for the first stage of Software Projects
A software reliability analysis for the first stage of software projects is presented. At this very first stage of testing we expect an increasing failure rate, for which the usual software reliability growth models based on non-homogeneous Poisson processes, such as Goel-Okumoto or Musa-Okumoto, cannot be applied. Our analysis therefore considers models that combine reliability growth with increasing failure rates, such as the logistic and delayed S-shaped models, and also includes a new model based on contagion covering both the increasing-failure-rate and the reliability-growth stages. We point out that increasing-failure-rate stages are important to model, since corrective actions can then be taken early, and that this characteristic is accentuated under modern development methodologies in which development proceeds simultaneously with testing, such as Agile and TDD (Test-Driven Development). Results of applying these models to real datasets are shown.
Sociedad Argentina de Informática e Investigación Operativa
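The mean value functions of the models named above are standard; a short sketch (parameter values illustrative) shows why the S-shaped models can fit an early increasing failure rate while Goel-Okumoto cannot:

```python
import math

def goel_okumoto(t, a, b):
    # m(t) = a(1 - e^{-bt}); failure intensity a*b*e^{-bt} is highest at t = 0.
    return a * (1 - math.exp(-b * t))

def delayed_s_shaped(t, a, b):
    # m(t) = a(1 - (1 + bt)e^{-bt}); intensity a*b^2*t*e^{-bt} rises before it falls.
    return a * (1 - (1 + b * t) * math.exp(-b * t))

def logistic(t, a, b, c):
    # m(t) = a / (1 + c*e^{-bt}); S-shaped growth toward the asymptote a.
    return a / (1 + c * math.exp(-b * t))

# Early in testing, the delayed S-shaped model predicts far fewer detected
# failures than Goel-Okumoto with the same parameters, because its failure
# rate is still increasing at small t.
early_go = goel_okumoto(1.0, 100.0, 0.5)
early_dss = delayed_s_shaped(1.0, 100.0, 0.5)
```

All three curves share the asymptote `a` (the expected total number of faults); they differ only in the shape of the intensity near `t = 0`, which is exactly what matters in the first stage of testing.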
Data Mining and Machine Learning for Software Engineering
Software engineering is one of the research areas where data mining is most readily applied. Developers have attempted to improve software quality by mining and analyzing software data. Every phase of the software development life cycle (SDLC) produces large amounts of data, and design, security, or other software problems may arise along the way. Analyzing software data in the early phases of development helps to handle these problems and leads to more accurate and timely delivery of software projects. Various data mining and machine learning studies have addressed software engineering tasks such as defect prediction and effort estimation. This study presents the open issues, along with related solutions and recommendations, in applying data mining and machine learning techniques to software engineering.
SZZ in the time of Pull Requests
In the multi-commit development model, programmers complete tasks (e.g.,
implementing a feature) by organizing their work in several commits and
packaging them into a commit-set. Analyzing data from developers using this
model can be useful to tackle challenging developers' needs, such as knowing
which features introduce a bug as well as assessing the risk of integrating
certain features in a release. However, to do so one first needs to identify
fix-inducing commit-sets. For such an identification, the SZZ algorithm is the
most natural candidate, but its performance has not been evaluated in the
multi-commit context yet. In this study, we conduct an in-depth investigation
on the reliability and performance of SZZ in the multi-commit model. To obtain
a reliable ground truth, we consider an already existing SZZ dataset and adapt
it to the multi-commit context. Moreover, we devise a second dataset that is
more extensive and directly created by developers as well as Quality Assurance
(QA) engineers of Mozilla. Based on these datasets, we (1) test the performance
of B-SZZ and its non-language-specific SZZ variations in the context of the
multi-commit model, (2) investigate the reasons behind their specific behavior,
and (3) analyze the impact of non-relevant commits in a commit-set and
automatically detect them before using SZZ.
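The core of B-SZZ can be sketched in a few lines (data structures here are hypothetical; real implementations walk the version history with `git blame` and filter out cosmetic changes): the commits that last touched the lines a bug-fixing commit deletes or modifies are the candidate bug inducers.

```python
def szz_candidates(fix_deleted_lines, blame):
    """Map each (file, line) removed by the fix to the commit that last
    touched it; the resulting set of commits are the candidate inducers."""
    return {blame[line] for line in fix_deleted_lines if line in blame}

# Hypothetical blame information: (file, line number) -> last commit id.
blame = {("app.c", 10): "c1", ("app.c", 11): "c2", ("util.c", 3): "c1"}

# A fix that removes two lines of app.c implicates commits c1 and c2.
candidates = szz_candidates([("app.c", 10), ("app.c", 11)], blame)
```

In the multi-commit model studied above, the same mapping would be applied to every commit in a fix-inducing commit-set, and the candidates aggregated per set.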
Learning to classify software defects from crowds: a novel approach
In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, where training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and, dealing with this subjective class information, classifiers are efficiently learnt. To illustrate our proposal, we present two real applications of IBM's orthogonal defect classification working on the issue tracking systems from two different real domains. Bayesian network classifiers learnt using two state-of-the-art methodologies from data labeled by a crowd of annotators are used to predict the category (impact) of reported software defects. The considered methodologies show improved performance over the straightforward solution (majority voting) according to different metrics. This shows the potential of non-expert knowledge aggregation techniques when expert knowledge is unavailable.
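The majority-voting baseline that the crowd-learning methodologies are compared against is straightforward (labels below are toy examples; the state-of-the-art methods additionally model each annotator's reliability rather than counting votes equally):

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate one defect's impact labels from several non-expert
    annotators by taking the most frequent label."""
    return Counter(labels).most_common(1)[0][0]

# Three annotators assign an impact category to the same reported defect.
consensus = majority_vote(["reliability", "reliability", "performance"])
```

Majority voting discards who said what; the learning-from-crowds methods keep the per-annotator labels so that a consistently accurate annotator outweighs a noisy one.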