An Industrial Study on Predicting Crash Report Log Types Using Large Language Models
Software crashes and failures take a fair amount of effort and time to resolve. Software developers
use information submitted in crash reports (CRs) to conduct root cause analysis of faults. The
problem is that CRs often lack all the information required. Automatic prediction of CR fields can
therefore reduce the crash resolution process time. In this thesis, we use CR headings and
descriptions to predict the type of log files that should be attached to a CR. Our approach is to use
multilabel learning algorithms to train a machine learning model using a dataset from Ericsson’s
CR database to predict the type of log files based on CR headings and descriptions. We use three
different pre-trained language models (Bert, Telecom Bert, and Word2Vector) to extract feature
vectors from CR headings and descriptions and then feed these vectors to three different multilabel
learning algorithms, namely Binary Relevance (BR), Classifier Chain (CC), and Neural Network
(NN). Then, we compare the performance of different feature sets. We found that the use of
headings alone with pre-trained language models Bert and Telecom Bert results in the best average
AUC (0.70). Using descriptions alone, or headings and descriptions together, as features resulted in
an average AUC ranging from 0.65 to 0.70. In general, the algorithms showed no significant
difference in their performances, but the choice of features impacts the performance. Also, the
performance of predicting each type of log is influenced by the use of keywords in headings and
descriptions that describe these files. We found that log types with a clear definition such as Key
Performance Indicators (KPI) Logs, Post-mortem Dumps (PMD), and execution traces can be
predicted with higher accuracy.
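The two problem-transformation strategies named in the abstract, Binary Relevance and Classifier Chain, can be illustrated with a minimal sketch. This is not the thesis implementation: the base learner (`CentroidClassifier`), the helper names, and the toy two-dimensional feature vectors are all assumptions made for illustration, standing in for the language-model embeddings and the learners actually used. The point is only the structural difference: Binary Relevance trains one independent classifier per log type, while a Classifier Chain feeds the labels of earlier log types into later classifiers.

```python
from typing import List, Sequence

Vector = Sequence[float]


class CentroidClassifier:
    """Hypothetical toy base learner: picks the class whose mean vector is closer."""

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "CentroidClassifier":
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, label in zip(X, y) if label == cls]
            # Assumes both classes occur in the training data.
            self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Vector) -> int:
        def sq_dist(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return 1 if sq_dist(self.centroids[1]) < sq_dist(self.centroids[0]) else 0


def fit_binary_relevance(X, Y) -> List[CentroidClassifier]:
    """One independent binary classifier per label (log type)."""
    return [CentroidClassifier().fit(X, [row[j] for row in Y])
            for j in range(len(Y[0]))]


def predict_binary_relevance(models, x) -> List[int]:
    return [m.predict_one(x) for m in models]


def fit_classifier_chain(X, Y) -> List[CentroidClassifier]:
    """Classifier j also sees the true values of labels 0..j-1 as extra features."""
    models = []
    for j in range(len(Y[0])):
        X_aug = [list(x) + list(row[:j]) for x, row in zip(X, Y)]
        models.append(CentroidClassifier().fit(X_aug, [row[j] for row in Y]))
    return models


def predict_classifier_chain(models, x) -> List[int]:
    """At prediction time, earlier predictions replace the true labels."""
    preds: List[int] = []
    for m in models:
        preds.append(m.predict_one(list(x) + preds))
    return preds


# Toy data: 2-d "feature vectors" and two labels (two hypothetical log types).
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Y_train = [[1, 0], [1, 0], [0, 1], [0, 1]]

br_models = fit_binary_relevance(X_train, Y_train)
cc_models = fit_classifier_chain(X_train, Y_train)
br_pred = predict_binary_relevance(br_models, [0.8, 0.2])
cc_pred = predict_classifier_chain(cc_models, [0.8, 0.2])
```

On data this small both strategies agree; they can diverge when labels are correlated, because only the chain can use one predicted log type as evidence for another.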
Automatic bug triaging techniques using machine learning and stack traces
When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces and memory dumps) from end-users. These data are sent in the form of bug (crash) reports to the software development teams to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually in a semi-automatic way) by a group of software analysts, known as triagers. Triagers assign priority to the bugs and redirect them to the software development teams in order to provide fixes.
The triaging process, however, is usually very challenging. The problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to automatically detect duplicate (or similar) reports. This way, triagers would not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by the customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route the bug report to the responsible development team based on the subsystem that caused the crash. Since having knowledge of all the subsystems of an ever-evolving industrial system is impractical, having a tool to automatically identify defective subsystems can significantly reduce the manual bug triaging effort.
The main goal of this research is to investigate techniques and tools to help triagers process bug reports. We start by studying the effect of the presence of stack traces in analyzing bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features. We then propose a novel approach for predicting bug severity using stack traces and categorical features, and finally, we discuss a new method for predicting faulty product and component fields of bug reports.
We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, and hence increase the productivity of bug triagers, while reducing the cost and effort associated with manual triaging of bug reports.
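The abstract does not describe how the duplicate detection method compares stack traces, so the following is only a generic sketch of the idea of flagging likely duplicates by stack-trace similarity, not the method proposed in this work. It assumes one frame per line and uses Jaccard overlap of frame sets; the function names, report identifiers, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def frame_set(stack_trace: str) -> Set[str]:
    """Split a trace into a set of frames, assuming one frame per non-empty line."""
    return {line.strip() for line in stack_trace.splitlines() if line.strip()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(new_trace: str,
                    known_traces: Dict[str, str],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (report_id, similarity) pairs at or above the threshold, best first."""
    new_frames = frame_set(new_trace)
    scored = [(rid, jaccard(new_frames, frame_set(trace)))
              for rid, trace in known_traces.items()]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)


# Hypothetical report identifiers and frames, for illustration only.
known = {
    "BUG-1": "app.load\nparser.read\nio.open",
    "BUG-2": "ui.render\nui.paint",
}
matches = find_duplicates("app.load\nparser.read\nnet.fetch", known)
```

A set-based comparison ignores frame order and the categorical report fields mentioned in the abstract; it is the simplest baseline against which a more refined approach could be compared.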