    EnHMM: On the Use of Ensemble HMMs and Stack Traces to Predict the Reassignment of Bug Report Fields

    Bug reports (BRs) contain vital information that can help triaging teams prioritize bugs and assign them to the developers who will provide the fixes. However, studies have shown that BR fields often contain incorrect information that needs to be reassigned, which delays the bug fixing process. Existing approaches for predicting whether a BR field should be reassigned mainly use BR descriptions and traditional machine learning algorithms (SVM, KNN, etc.). As such, they do not fully benefit from the sequential order of information in BR data, such as function call sequences in BR stack traces, which may be valuable for improving prediction accuracy. In this paper, we propose a novel approach, called EnHMM, for predicting the reassignment of BR fields using ensemble Hidden Markov Models (HMMs) trained on stack traces. EnHMM leverages the natural ability of HMMs to represent sequential data to model the temporal order of function calls in BR stack traces. When applied to the Eclipse and Gnome BR repositories, EnHMM achieves an average precision, recall, and F-measure of 54%, 76%, and 60% on the Eclipse dataset and 41%, 69%, and 51% on the Gnome dataset. We also found that EnHMM improves over the best single HMM by 36% for Eclipse and 76% for Gnome. Finally, comparing EnHMM to Im.ML.KNN, a recent approach in the field, we found that EnHMM improves the average F-measure of Im.ML.KNN by 6.80% and its average recall by 36.09%. However, the average precision of EnHMM is lower than that of Im.ML.KNN (53.93% as opposed to 56.71%). Comment: Published in Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021), 11 pages, 7 figures
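The core idea of EnHMM, scoring the function-call sequence of a stack trace with HMMs and combining several HMMs into one decision, can be illustrated with the minimal sketch below. It assumes discrete HMMs whose parameters are already known (training, e.g., with Baum-Welch, is omitted), hypothetical function names and thresholds, and a simple majority vote as a stand-in for the combination scheme EnHMM actually uses. A real ensemble would use differently configured HMMs; reusing one model three times in the usage lines is only for brevity.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-emission HMM over function names (parameters given, not trained)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]], unk: float = 1e-6) -> None:
        self.start = start  # initial state probabilities
        self.trans = trans  # trans[i][j] = P(state j at t+1 | state i at t)
        self.emit = emit    # emit[i][f] = P(function f | state i)
        self.unk = unk      # assumed probability for unseen function names

    def log_likelihood(self, seq: Sequence[str]) -> float:
        """Scaled forward algorithm: log P(seq | model) for a non-empty sequence."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i].get(seq[0], self.unk) for i in range(n)]
        log_p = 0.0
        for t in range(len(seq)):
            if t > 0:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j].get(seq[t], self.unk)
                    for j in range(n)
                ]
            scale = sum(alpha)                   # P(o_t | o_1..o_{t-1})
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        return log_p


def ensemble_predict(models: List[DiscreteHMM], thresholds: List[float],
                     seq: Sequence[str]) -> bool:
    """Flag a report for reassignment if a majority of HMMs accept its trace.

    Each HMM is assumed to be trained on traces of reassigned reports, so a
    per-call log-likelihood at or above its (hypothetical) threshold is a vote.
    """
    votes = sum(
        1 for m, th in zip(models, thresholds)
        if m.log_likelihood(seq) / len(seq) >= th
    )
    return votes > len(models) // 2


# Hypothetical usage: function names and thresholds are illustrative only.
hmm = DiscreteHMM(start=[0.6, 0.4], trans=[[0.7, 0.3], [0.4, 0.6]],
                  emit=[{"Display.readAndDispatch": 0.5, "Widget.sendEvent": 0.5},
                        {"Table.paint": 0.9, "Widget.sendEvent": 0.1}])
trace = ["Display.readAndDispatch", "Widget.sendEvent", "Table.paint"]
flag = ensemble_predict([hmm, hmm, hmm], [-3.0, -3.0, -3.0], trace)
```

Normalizing the log-likelihood by the trace length keeps long and short stack traces comparable against a single threshold per model.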

    An Empirical Study of Bug Report Field Reassignment

    An Industrial Study on Predicting Crash Report Log Types Using Large Language Models

    Software crashes and failures take a considerable amount of effort and time to resolve. Software developers use the information submitted in crash reports (CRs) to conduct root cause analysis of faults. The problem is that CRs often lack some of the required information. Automatic prediction of CR fields can therefore reduce the time needed to resolve crashes. In this thesis, we use CR headings and descriptions to predict the type of log files that should be attached to a CR. Our approach uses multilabel learning algorithms to train a machine learning model on a dataset from Ericsson's CR database to predict the type of log files from CR headings and descriptions. We use three different pre-trained language models, Bert, Telecom Bert, and Word2Vector, to extract feature vectors from CR headings and descriptions, and then feed these vectors to three different multilabel learning algorithms, namely Binary Relevance (BR), Classifier Chain (CC), and Neural Network (NN). We then compare the performance of different feature sets. We found that using headings alone with the pre-trained language models Bert and Telecom Bert results in the best average AUC (0.70). Using descriptions alone, or headings and descriptions together, as features resulted in an average AUC varying from 0.65 to 0.70. In general, the algorithms showed no significant difference in their performance, but the choice of features affects performance. Also, the performance of predicting each log type is influenced by the use of keywords in headings and descriptions that describe these files. We found that log types with a clear definition, such as Key Performance Indicators (KPI) logs, Post-mortem Dumps (PMD), and execution traces, can be predicted with higher accuracy.
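To make the difference between the Binary Relevance and Classifier Chain strategies concrete, the sketch below implements both over plain feature vectors, which stand in for the Bert, Telecom Bert, or Word2Vector embeddings. The nearest-centroid base learner (`CentroidClassifier`) is a hypothetical stand-in chosen only to keep the example dependency-free; it is not the learner used in the thesis.

```python
from typing import List, Sequence

Vector = List[float]


def _mean(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


class CentroidClassifier:
    """Hypothetical base learner: predicts 1 if x is closer to the positive centroid."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.const = None
        if not pos or not neg:
            self.const = 1 if pos else 0  # only one class seen in training
        else:
            self.pos, self.neg = _mean(pos), _mean(neg)
        return self

    def predict(self, x: Vector) -> int:
        if self.const is not None:
            return self.const
        return int(_dist2(x, self.pos) < _dist2(x, self.neg))


class BinaryRelevance:
    """One independent binary classifier per label (log type)."""

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "BinaryRelevance":
        self.models = [CentroidClassifier().fit(X, [row[k] for row in Y])
                       for k in range(len(Y[0]))]
        return self

    def predict(self, x: Vector) -> List[int]:
        return [m.predict(x) for m in self.models]


class ClassifierChain:
    """Classifier k also sees labels 0..k-1 as extra features."""

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "ClassifierChain":
        self.models = []
        for k in range(len(Y[0])):
            # Training uses the true values of the earlier labels.
            Xk = [x + [float(v) for v in row[:k]] for x, row in zip(X, Y)]
            self.models.append(CentroidClassifier().fit(Xk, [row[k] for row in Y]))
        return self

    def predict(self, x: Vector) -> List[int]:
        preds: List[int] = []
        for m in self.models:
            # Prediction feeds the chain's own earlier predictions forward.
            preds.append(m.predict(x + [float(p) for p in preds]))
        return preds
```

Binary Relevance ignores correlations between log types, whereas the chain lets earlier predictions inform later ones, at the cost of sensitivity to label order.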

    Automatic bug triaging techniques using machine learning and stack traces

    When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces, memory dumps) from end users. These data are sent in the form of bug (crash) reports to the software development teams, who use them to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually semi-automatically) by a group of software analysts known as triagers. Triagers assign priorities to the bugs and redirect them to the software development teams, which provide the fixes. The triaging process, however, is usually very challenging. One problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to automatically detect duplicate (or similar) reports. This way, triagers do not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route each bug report to the responsible development team based on the subsystem that caused the crash. Since knowing all the subsystems of an ever-evolving industrial system is impractical, a tool that automatically identifies defective subsystems can significantly reduce the manual bug triaging effort. The main goal of this research is to investigate techniques and tools that help triagers process bug reports. We start by studying the effect of the presence of stack traces on the analysis of bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features.
We then propose a novel approach for predicting bug severity using stack traces and categorical features, and, finally, we discuss a new method for predicting the faulty product and component fields of bug reports. We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, thereby increasing the productivity of bug triagers while reducing the costs and effort associated with manual triaging of bug reports.
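The abstract does not detail how duplicates are detected from stack traces and categorical features. As a generic illustration only, the sketch below ranks historical reports by the Jaccard similarity of call-order bigrams from their stack traces, blended with exact matches on two categorical fields. The report layout, the bigram choice, and the 0.8 weight (`w_trace`) are hypothetical assumptions, not the thesis's method.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

Report = Dict[str, Any]  # hypothetical: {"id", "frames", "product", "component"}


def ngrams(frames: Sequence[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """Contiguous n-grams of function names, which preserve local call order."""
    if len(frames) < n:
        return {tuple(frames)} if frames else set()
    return {tuple(frames[i:i + n]) for i in range(len(frames) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similarity(r1: Report, r2: Report, w_trace: float = 0.8) -> float:
    """Weighted blend of stack-trace similarity and categorical-field agreement."""
    trace_sim = jaccard(ngrams(r1["frames"]), ngrams(r2["frames"]))
    fields = ("product", "component")
    cat_sim = sum(r1[f] == r2[f] for f in fields) / len(fields)
    return w_trace * trace_sim + (1 - w_trace) * cat_sim


def top_k_duplicates(query: Report, history: List[Report], k: int = 3) -> List[Any]:
    """IDs of the k historical reports most similar to the query."""
    ranked = sorted(history, key=lambda r: similarity(query, r), reverse=True)
    return [r["id"] for r in ranked[:k]]
```

A production system would index the n-grams (e.g., with an inverted index) instead of scoring every historical report, which is where scalability concerns arise.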

    On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks

    In this thesis, we propose an approach that relies on the Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using a weighted Kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination), works in three phases. The first phase selects a subset of the available base diverse soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen's kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase 1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: the detection of system anomalies and the prediction of the reassignment of bug report fields. Detecting system anomalies at run time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches, i.e., their ability to detect anomalies with high accuracy. Less attention has been given to false alarms and efficiency. Although ensemble approaches for anomaly detection that use Boolean combination of classifier decisions have been shown to reduce the false alarm rate compared to a single classifier, existing methods rely on an exponential number of combinations, making them impractical even for a small number of classifiers. Our approach not only maintains and even improves the accuracy of existing Boolean combination techniques, but also significantly reduces the combination time and the number of classifiers selected for combination. The second application domain of our approach is the prediction of the reassignment of bug report fields.
Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, bug reports commonly have missing or incorrect information, which hinders the bug resolution process. To address this problem, researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied on classifiers that use the natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information present in the stack traces included in bug reports. To address this, we propose an approach called EnHMM, which uses WPIBC and stack traces to predict the reassignment of bug report fields. Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogeneous classifiers.
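The pruning-then-combination idea behind WPIBC can be illustrated with a simplified sketch: unweighted Cohen's kappa is used to keep only mutually diverse crisp classifiers, and then, for a pair of them, the Boolean function with the best F-measure on validation labels is selected. This is not WPIBC itself. The weighted kappa over soft classifiers, the iteration over decision thresholds, and the greedy selection order are omitted, and the cut-off `max_kappa` is a hypothetical parameter.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Decisions = Sequence[int]  # crisp 0/1 decisions of one classifier on a validation set


def cohen_kappa(a: Decisions, b: Decisions) -> float:
    """Unweighted Cohen's kappa between two crisp classifiers."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)           # agreement expected by chance
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


def select_diverse(preds: List[Decisions], max_kappa: float) -> List[int]:
    """Greedily keep classifiers whose kappa with every kept one is below max_kappa."""
    kept: List[int] = []
    for i, p in enumerate(preds):
        if all(cohen_kappa(p, preds[j]) < max_kappa for j in kept):
            kept.append(i)
    return kept


BOOLEAN_FUNCS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "A_AND_NOT_B": lambda x, y: x & (1 - y),
    "NOT_A_AND_B": lambda x, y: (1 - x) & y,
}


def f_measure(pred: Decisions, truth: Decisions) -> float:
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_boolean_combination(a: Decisions, b: Decisions,
                             truth: Decisions) -> Tuple[str, float]:
    """Apply each Boolean function to two classifiers; return the best by F-measure."""
    scores = {
        name: f_measure([f(x, y) for x, y in zip(a, b)], truth)
        for name, f in BOOLEAN_FUNCS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Pruning by kappa first is what keeps the number of Boolean combinations small: only pairs of classifiers that disagree often enough to be complementary are ever combined.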

    Adopting Automated Bug Assignment in Practice: A Longitudinal Case Study at Ericsson

    The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning between 2011 and 2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR's first bug assignment without human intervention happened in April 2019. Our study evaluates the adoption of TRR within its industrial context at Ericsson. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. We conduct an industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. TRR is now an incorporated part of the bug assignment process. Across the abstraction levels of the telecommunications stack, stakeholders of high-level modules are more positive, while those of low-level modules experienced some drawbacks. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed trouble reports (TRs) are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate than similar endeavors reported by other companies. We primarily attribute the difference to the very large size of the organization and the complexity of its products. Key facilitators of the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis. Comment: Under review
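Reading the two headline figures together, and assuming that the 75% accuracy refers to the 30% of reports that TRR assigns automatically (the abstract does not state this explicitly), the shares of all incoming bug reports that are routed correctly and incorrectly without human intervention work out as follows:

```latex
\[
\underbrace{0.30}_{\text{auto-assigned share}} \times \underbrace{0.75}_{\text{accuracy}} = 0.225 \approx 22.5\%,
\qquad
0.30 \times (1 - 0.75) = 0.075 = 7.5\%
\]
```

Under this reading, the remaining 70% of reports still follow the manual assignment path.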

    Using Screenshot Attachments in Issue Reports for Triaging

    In previous work, we deployed IssueTAG, which uses the texts in the one-line summary and description fields of issue reports to automatically assign them to the stakeholders responsible for resolving the reported issues. Since its deployment on January 12, 2018 at Softtech, the software subsidiary of the largest private bank in Turkey, IssueTAG has made a total of 301,752 assignments (as of November 2021). One observation we make is that a large fraction of the issue reports submitted to Softtech have screenshot attachments and, in the presence of such attachments, the reports often convey less information in their one-line summary and description fields, which tends to reduce the assignment accuracy. In this work, we use the screenshot attachments as an additional source of information to further improve the assignment accuracy, which (to the best of our knowledge) has not been studied before in this context. In particular, we develop a number of multi-source assignment models (using both the issue reports and the screenshot attachments) and single-source assignment models (using either the issue reports or the screenshot attachments) and empirically evaluate them on real issue reports. In the experiments, compared to the single-source model currently deployed in the field, the best multi-source model developed in this work significantly improved (in both the practical and statistical sense) the assignment accuracy for issue reports with screenshot attachments, from 0.843 to 0.858, at acceptable overhead costs, a result that strongly supports our basic hypothesis. Comment: Preprint for EMSE journal
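The reported accuracy figures can also be read as a reduction in misassignments. The arithmetic below is derived directly from 0.843 and 0.858 and is not a figure reported by the authors:

```latex
\[
\Delta_{\text{acc}} = 0.858 - 0.843 = 0.015,
\qquad
\frac{(1 - 0.843) - (1 - 0.858)}{1 - 0.843} = \frac{0.157 - 0.142}{0.157} = \frac{0.015}{0.157} \approx 9.6\%
\]
```

That is, a 1.5-point accuracy gain corresponds to roughly one in ten fewer misassigned reports among those with screenshot attachments.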

    A Longitudinal Analysis of Bug Handling Across Eclipse Releases

    Large open source software projects, like Eclipse, follow a continuous software development process with a regular release cycle. During each release, new bugs are reported, triaged, and resolved. Previous studies have focused on various aspects of bug fixing, such as bug triaging, bug prediction, and bug process analysis. Most studies, however, do not distinguish between what happens before and after each scheduled release. We are also unaware of studies that compare bug fixing activities across different project releases. This paper presents an empirical analysis of the bug handling process of Eclipse over a 15-year period, considering 138K bug reports from Bugzilla, including 16 annual Eclipse releases and two quarterly releases in 2018. We compare the bug resolution rate, the fixing rate, the bug triaging time, and the fixing time before and after each release date, and we study the possible impact of "release pressure". Among other findings, our results reveal that Eclipse bug handling activity is improving over time, with an important decrease in the number of bugs reported before releases, an increase in the bug fixing rate, and an increasingly balanced bug handling workload before and after releases. The recent transition from an annual to a quarterly release cycle continued to improve the bug handling process.

    Determinants of Success of the Open Source Selective Revealing Strategy: Solution Knowledge Emergence

    Recent research suggests that firms may be able to create a competitive advantage by deliberately revealing specific problem knowledge beyond firm boundaries to open source meta-organisations, such that new solution knowledge is created that benefits the focal firm more than its competitors (Alexy, George, & Salter, 2013). Yet, not all firms that use knowledge revealing strategies succeed in inducing the emergence of solution knowledge. The extant literature has not yet explained this heterogeneity in the success of knowledge revealing strategies. Using a longitudinal database spanning the period from 1998 to the end of 2012, with more than 2 billion data points obtained from the Mozilla Foundation, one of the top open source meta-organisations, this dissertation identifies and measures the antecedent factors affecting successful solution knowledge emergence. The results reveal 35 antecedent factors that affect solution knowledge emergence in different ways across three levels of analysis. The numerous contributions to theory and practice that follow from the results are discussed.