EnHMM: On the Use of Ensemble HMMs and Stack Traces to Predict the Reassignment of Bug Report Fields
Bug reports (BR) contain vital information that can help triaging teams
prioritize and assign bugs to developers who will provide the fixes. However,
studies have shown that BR fields often contain incorrect information that needs
to be reassigned, which delays the bug fixing process. There exist approaches
for predicting whether a BR field should be reassigned or not. These studies
use mainly BR descriptions and traditional machine learning algorithms (SVM,
KNN, etc.). As such, they do not fully benefit from the sequential order of
information in BR data, such as function call sequences in BR stack traces,
which may be valuable for improving the prediction accuracy. In this paper, we
propose a novel approach, called EnHMM, for predicting the reassignment of BR
fields using ensemble Hidden Markov Models (HMMs), trained on stack traces.
EnHMM leverages the natural ability of HMMs to represent sequential data to
model the temporal order of function calls in BR stack traces. When applied to
Eclipse and Gnome BR repositories, EnHMM achieves an average precision, recall,
and F-measure of 54%, 76%, and 60% on the Eclipse dataset and 41%, 69%, and 51% on
the Gnome dataset. We also found that EnHMM improves over the best single HMM by
36% for Eclipse and 76% for Gnome. Finally, when comparing EnHMM to Im.ML.KNN,
a recent approach in the field, we found that the average F-measure score of
EnHMM improves the average F-measure of Im.ML.KNN by 6.80% and improves the
average recall of Im.ML.KNN by 36.09%. However, the average precision of EnHMM
is lower than that of Im.ML.KNN (53.93% as opposed to 56.71%).
Comment: Published in Proceedings of the 28th IEEE International Conference on
Software Analysis, Evolution and Reengineering (SANER 2021), 11 pages, 7
figures
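As an illustration of how an HMM can score the ordered function calls of a stack trace, the following is a minimal sketch, not the authors' implementation. It computes the log-likelihood of a call sequence with the scaled forward algorithm and then takes a simple majority vote over several HMMs. The majority vote, the per-call normalisation, the thresholds, and the names `log_likelihood` and `ensemble_votes_reassign` are all assumptions made for illustration; the abstract does not specify how the ensemble members are combined.

```python
import math

def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: returns log P(seq | HMM).

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i]     -- dict mapping a symbol (function name) to its
                   emission probability in state i
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i].get(seq[0], 0.0) for i in range(n)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * emit[j].get(seq[t], 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this HMM
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

def ensemble_votes_reassign(seq, members):
    """Hypothetical combination rule: simple majority vote.

    members is a list of ((start, trans, emit), threshold) pairs. A member
    votes "reassign" when the per-call log-likelihood reaches its threshold,
    assuming each HMM was trained on traces of reassigned reports.
    """
    votes = 0
    for (start, trans, emit), threshold in members:
        if log_likelihood(seq, start, trans, emit) / len(seq) >= threshold:
            votes += 1
    return votes * 2 > len(members)
```

Working in log space with per-step rescaling keeps long traces from underflowing to zero, which matters because stack traces can contain many frames.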
An Industrial Study on Predicting Crash Report Log Types Using Large Language Models
Software crashes and failures take a fair amount of effort and time to resolve. Software developers
use information submitted in crash reports (CRs) to conduct root cause analysis of faults. The
problem is that CRs often lack all the information required. Automatic prediction of CR fields can
therefore reduce the crash resolution process time. In this thesis, we use CR headings and
descriptions to predict the type of log files that should be attached to a CR. Our approach is to use
multilabel learning algorithms to train a machine learning model using a dataset from Ericsson’s
CR database to predict the type of log files based on CR headings and descriptions. We use three
different pre-trained language models (Bert, Telecom Bert, and Word2Vector) to extract feature
vectors from CR headings and descriptions and then feed these vectors to three different multilabel
learning algorithms, namely Binary Relevance (BR), Classifier Chain (CC), and Neural Network
(NN). Then, we compare the performance of different feature sets. We found that the use of
headings alone with pre-trained language models Bert and Telecom Bert results in the best average
AUC (0.70). Using descriptions alone, or headings and descriptions together, as features resulted in
an average AUC varying from 0.65 to 0.70. In general, the algorithms showed no significant
difference in their performances, but the choice of features impacts the performance. Also, the
performance of predicting each type of log is influenced by the use of keywords in headings and
descriptions that describe these files. We found that log types with a clear definition such as Key
Performance Indicators (KPI) Logs, Post-mortem Dumps (PMD), and execution traces can be
predicted with higher accuracy.
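The Binary Relevance idea mentioned above (one independent binary classifier per log type) can be sketched as follows. This is only an illustration under stated assumptions: the thesis feeds language-model feature vectors to its learners, whereas this sketch swaps in a toy keyword-based base learner so that it stays self-contained. The class names `KeywordClassifier` and `BinaryRelevance` and the example texts are hypothetical.

```python
class KeywordClassifier:
    """Toy base learner (an assumption, not the thesis's model):
    remembers tokens that occur only in positive training examples."""

    def fit(self, texts, targets):
        pos, neg = set(), set()
        for text, is_positive in zip(texts, targets):
            (pos if is_positive else neg).update(text.lower().split())
        self.keywords = pos - neg
        return self

    def predict(self, text):
        return int(any(tok in self.keywords for tok in text.lower().split()))


class BinaryRelevance:
    """Multilabel learning by training one binary model per label."""

    def __init__(self, labels, make_base=KeywordClassifier):
        self.labels = list(labels)
        self.make_base = make_base

    def fit(self, texts, label_sets):
        # Each label gets its own binary problem: "does this report need it?"
        self.models = {
            label: self.make_base().fit(texts, [label in s for s in label_sets])
            for label in self.labels
        }
        return self

    def predict(self, text):
        return {label for label, model in self.models.items() if model.predict(text)}


# Hypothetical crash-report headings, for illustration only.
texts = ["kpi counters dropped", "node restart dump", "kpi degraded after restart dump"]
label_sets = [{"KPI"}, {"PMD"}, {"KPI", "PMD"}]
model = BinaryRelevance(["KPI", "PMD"]).fit(texts, label_sets)
```

Because each label is learned independently, Binary Relevance ignores correlations between log types; Classifier Chain, also named in the abstract, is the usual way to bring those correlations in.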
Automatic bug triaging techniques using machine learning and stack traces
When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces, memory dumps, etc.) from end-users. These data are sent in the form of bug (crash) reports to the software development teams to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually in a semi-automatic way) by a group of software analysts, known as triagers. Triagers assign priority to the bugs and redirect them to the software development teams in order to provide fixes.
The triaging process, however, is usually very challenging. The problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to automatically detect duplicate (or similar) reports. This way, triagers would not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by the customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route the bug report to the responsible development team based on the subsystem that caused the crash. Since having knowledge of all the subsystems of an ever-evolving industrial system is impractical, having a tool to automatically identify defective subsystems can significantly reduce the manual bug triaging effort.
The main goal of this research is to investigate techniques and tools to help triagers process bug reports. We start by studying the effect of the presence of stack traces in analyzing bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features. We then propose a novel approach for predicting bug severity using stack traces and categorical features, and finally, we discuss a new method for predicting faulty product and component fields of bug reports.
We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, and hence increase the productivity of bug triagers, while reducing costs and efforts associated with manual triaging of bug reports.
On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks
In this thesis, we propose an approach that relies on Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using a weighted Kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination), works in three phases. The first phase selects a subset of the available base diverse soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen’s kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase 1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: the detection of system anomalies and the prediction of the reassignment of bug report fields.
Detecting system anomalies at run-time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches, that is, the ability to detect anomalies with high accuracy. Less attention has been given to false alarms and efficiency. Although ensemble approaches for the detection of anomalies that use Boolean combination of classifier decisions have been shown to be useful in reducing the false alarm rate over that of a single classifier, existing methods rely on an exponential number of combinations, making them impractical even for a small number of classifiers. Our approach is not only able to maintain and even improve the accuracy of existing Boolean combination techniques, but also to significantly reduce the combination time and the number of classifiers selected for combination.
The second application domain of our approach is the prediction of the reassignment of bug report fields. Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, it is common to have bug reports with missing or incorrect information, hindering the bug resolution process. To address this problem, researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied upon classifiers that make use of natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information that is present in stack traces included in bug reports. To address this, we propose an approach called EnHMM, which uses WPIBC and stack traces to predict the reassignment of bug report fields.
Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogeneous classifiers.
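To make the kappa-based pruning and Boolean combination described in this abstract more concrete, here is a simplified sketch and not the WPIBC algorithm itself. It computes unweighted Cohen's kappa between the crisp (0/1) decisions of two classifiers, greedily drops classifiers that agree too strongly with one already kept, and combines two decision vectors with AND or OR. The greedy rule, the `max_kappa` cut-off of 0.8, and all function names are assumptions for illustration; the weighted kappa and the iterative combination of the thesis are not reproduced.

```python
def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two binary (0/1) decision vectors."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n  # fraction of positive decisions by classifier a
    pb1 = sum(b) / n  # fraction of positive decisions by classifier b
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)  # agreement by chance
    if expected == 1.0:
        return 1.0  # both classifiers are constant and identical
    return (observed - expected) / (1 - expected)


def prune_redundant(decisions, max_kappa=0.8):
    """Greedy pruning (an assumed rule): keep a classifier only if its
    agreement with every classifier already kept is below max_kappa."""
    kept = []
    for name, d in decisions.items():
        if all(cohen_kappa(d, decisions[k]) < max_kappa for k in kept):
            kept.append(name)
    return kept


def combine(a, b, op):
    """Boolean combination of two crisp decision vectors."""
    if op == "and":
        return [x & y for x, y in zip(a, b)]
    if op == "or":
        return [x | y for x, y in zip(a, b)]
    raise ValueError("op must be 'and' or 'or'")
```

A kappa close to 1 means two classifiers make nearly the same decisions, so combining them adds little; pruning on agreement is one way to keep the number of Boolean combinations small.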
Effective bug detection and localization using information retrieval
Software bugs pose a fundamental threat to the reliability of software systems, even in systems designed with the best software engineering (SE) teams using the best SE practices. Detecting bugs early and fixing them quickly are extremely important. However, they are very expensive and challenging, especially at scale. While the sciences of bug detection (e.g., software testing) and localization via static and dynamic program analyses have been explored considerably, text-based Information Retrieval (IR) techniques for bug detection and localization are interesting and promising new approaches for these problems. One advantage of text-based approaches is that they can utilize a lot of (implicit) semantic information about a program’s functionality from the program text, which is almost impossible to extract using program analysis based techniques. This dissertation builds a deeper understanding of current bug triaging and fixing processes via mining software repositories, and introduces new techniques for effective bug detection and localization. The dissertation has three main parts. First, we perform a number of empirical studies to investigate the extent of and reasons for long lived bugs, their severities, and time spent in different phases of the bug fixing process. We demonstrate that many bugs remain unfixed for an inordinate period of time due to numerous reasons, including difficulties in detecting, localizing, and fixing them. Second, we demonstrate that developers use very similar program text in source code and their corresponding test cases, which could be utilized to implement powerful test prioritization techniques. We introduce a novel IR based regression test prioritization technique called REPiR that embodies our insight, and show that REPiR is more efficient than program analysis based or dynamic coverage based techniques.
Third, we demonstrate that fine grained program text such as class names, method names, variable names, and comments carry different levels of information, and that it can be utilized to improve IR based bug localization. We introduce a structured retrieval technique called BLUiR that embodies our insights and show that BLUiR outperforms the existing state-of-the-art IR-based bug localization approaches. Finally, we further improve BLUiR by natural language processing. We make four contributions in this dissertation. One, we provide empirical evidence that there are considerable numbers of non-trivial bugs in software projects that survive for a long time. We describe the reasons for delay in fixing, the nature of fixes, and the overall fixing process of these long lived bugs in great detail. Two, we introduce the notion of IR-based regression test prioritization based on program changes. Three, we introduce the notion of structured retrieval for bug localization. Four, we provide an in-depth analysis of the extent to which natural language processing can play an important role in improving IR-based bug localization further. The central ideas are embodied in a suite of prototype tools. Rigorous empirical evaluation is performed to validate the efficacy of the proposed techniques using datasets containing a variety of real-world Java and C programs.
Electrical and Computer Engineering
Adopting Automated Bug Assignment in Practice: A Longitudinal Case Study at Ericsson
The continuous inflow of bug reports is a considerable challenge in large
development projects. Inspired by contemporary work on mining software
repositories, we designed a prototype bug assignment solution based on machine
learning in 2011-2016. The prototype evolved into an internal Ericsson product,
TRR, in 2017-2018. TRR's first bug assignment without human intervention
happened in April 2019. Our study evaluates the adoption of TRR within its
industrial context at Ericsson. Moreover, we investigate 1) how TRR performs in
the field, 2) what value TRR provides to Ericsson, and 3) how TRR has
influenced the ways of working. We conduct an industrial case study combining
interviews with TRR stakeholders, minutes from sprint planning meetings, and
bug tracking data. The data analysis includes thematic analysis, descriptive
statistics, and Bayesian causal analysis. TRR is now an incorporated part of
the bug assignment process. Considering the abstraction levels of the
telecommunications stack, high-level modules are more positive while low-level
modules experienced some drawbacks. On average, TRR automatically assigns 30%
of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are
resolved around 21% faster within Ericsson, and TRR has saved highly seasoned
engineers many hours of work. Indirect effects of adopting TRR include process
improvements, process awareness, increased communication, and higher job
satisfaction. TRR has saved time at Ericsson, but the adoption of automated bug
assignment was more intricate compared to similar endeavors reported from other
companies. We primarily attribute the difference to the very large size of the
organization and the complex products. Key facilitators in the successful
adoption include a gradual introduction, product champions, and careful
stakeholder analysis.
Comment: Under review
Using Screenshot Attachments in Issue Reports for Triaging
In previous work, we deployed IssueTAG, which uses the texts present in the
one-line summary and the description fields of the issue reports to
automatically assign them to the stakeholders, who are responsible for
resolving the reported issues. Since its deployment on January 12, 2018 at
Softtech, i.e., the software subsidiary of the largest private bank in Turkey,
IssueTAG has made a total of 301,752 assignments (as of November 2021). One
observation we make is that a large fraction of the issue reports submitted to
Softtech has screenshot attachments and, in the presence of such attachments,
the reports often convey less information in their one-line summary and the
description fields, which tends to reduce the assignment accuracy. In this
work, we use the screenshot attachments as an additional source of information
to further improve the assignment accuracy, which (to the best of our
knowledge) has not been studied before in this context. In particular, we
develop a number of multi-source (using both the issue reports and the
screenshot attachments) and single-source assignment models (using either the
issue reports or the screenshot attachments) and empirically evaluate them on
real issue reports. In the experiments, compared to the currently deployed
single-source model in the field, the best multi-source model developed in this
work significantly (both in the practical and statistical sense) improved the
assignment accuracy for the issue reports with screenshot attachments from
0.843 to 0.858 at acceptable overhead costs, a result strongly supporting our
basic hypothesis.
Comment: Preprint for EMSE journal
A Longitudinal Analysis of Bug Handling Across Eclipse Releases
Large open source software projects, like Eclipse, follow a continuous software development process, with a regular release cycle. During each release, new bugs are reported, triaged and resolved. Previous studies have focused on various aspects of bug fixing, such as bug triaging, bug prediction, and bug process analysis. Most studies, however, do not distinguish between what happens before and after each scheduled release. We are also unaware of studies that compare bug fixing activities across different project releases. This paper presents an empirical analysis of the bug handling process of Eclipse over a 15-year period, considering 138K bug reports from Bugzilla, including 16 annual Eclipse releases and two quarterly releases in 2018. We compare the bug resolution rate, the fixing rate, the bug triaging time and the fixing time before and after each release date, and we study the possible impact of "release pressure". Among others, our results reveal that Eclipse bug handling activity is improving over time, with an important decrease in the number of reported bugs before releases, an increase in the bug fixing rate and an increasingly balanced bug handling workload before and after releases. The recent transition from an annual to a quarterly release cycle continued to improve the bug handling process.
Determinants of Success of the Open Source Selective Revealing Strategy: Solution Knowledge Emergence
Recent research suggests that firms may be able to create a competitive advantage by deliberately revealing specific problem knowledge beyond firm boundaries to open source meta-organisations such that new solution knowledge is created that benefits the focal firm more than its competitors (Alexy, George, & Salter, 2013). Yet, not all firms that use knowledge revealing strategies are successful in inducing the emergence of solution knowledge. The extant literature has not yet explained this heterogeneity in the success of knowledge revealing strategies. Using a longitudinal database spanning the period from 1998 to the end of 2012 with more than 2 billion data points that was obtained from the Mozilla Foundation, one of the top open source meta-organisations, this dissertation identifies and measures the antecedent factors affecting successful solution knowledge emergence. The results reveal 35 antecedent factors that affect solution knowledge emergence in different ways across three levels of analysis. The numerous contributions to theory and practice that follow from the results are discussed.