An Industrial Study on Predicting Crash Report Log Types Using Large Language Models

Author: Heba Aburish
Publication date: 25/08/2023

Software crashes and failures take considerable effort and time to resolve. Software developers use information submitted in crash reports (CRs) to conduct root cause analysis of faults. The problem is that CRs often do not contain all the required information. Automatic prediction of CR fields can therefore shorten the crash resolution process. In this thesis, we use CR headings and descriptions to predict the types of log files that should be attached to a CR. Our approach uses multilabel learning algorithms to train a machine learning model on a dataset from Ericsson’s CR database. We use three pre-trained language models (BERT, Telecom BERT, and Word2Vec) to extract feature vectors from CR headings and descriptions, and then feed these vectors to three multilabel learning algorithms: Binary Relevance (BR), Classifier Chain (CC), and Neural Network (NN). We then compare the performance of the different feature sets. We found that using headings alone with the pre-trained language models BERT and Telecom BERT gives the best average AUC (0.70). Using descriptions alone, or headings and descriptions together, as features gave an average AUC between 0.65 and 0.70. In general, the algorithms showed no significant difference in performance, but the choice of features did affect it. The performance of predicting each log type is also influenced by whether the headings and descriptions contain keywords that describe these files. We found that log types with a clear definition, such as Key Performance Indicator (KPI) logs, Post-mortem Dumps (PMD), and execution traces, can be predicted with higher accuracy.
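The "average AUC" reported above can be read in more than one way. The minimal sketch below assumes it is a macro-average: one AUC per log type, then the mean over log types. The abstract does not state this, and the three labels and all scores in the example are hypothetical, not taken from the thesis.

```python
from typing import Sequence


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one label: probability that a random positive example
    is scored above a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(
    Y_true: Sequence[Sequence[int]], Y_score: Sequence[Sequence[float]]
) -> float:
    """Mean of the per-label AUCs (assumed averaging scheme)."""
    n_labels = len(Y_true[0])
    per_label = [
        binary_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Hypothetical data: 4 crash reports, 3 log types (columns: KPI, PMD, trace).
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_score = [
    [0.9, 0.2, 0.4],
    [0.3, 0.8, 0.5],
    [0.6, 0.4, 0.1],
    [0.2, 0.1, 0.7],
]

print(round(macro_auc(Y_true, Y_score), 4))  # prints 0.9167
```

In this toy data the first two labels are ranked perfectly (AUC 1.0) and the third has one misordered pair (AUC 0.75), so the macro-average is about 0.92. A macro-average weights every log type equally, regardless of how often it occurs.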

Concordia University Research Repository

Automatic bug triaging techniques using machine learning and stack traces

Author: Koochekian Sabor Korosh
Publication date: 02/07/2019

When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces and memory dumps) from end-users. These data are sent in the form of bug (crash) reports to the software development teams to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually in a semi-automatic way) by a group of software analysts, known as triagers. Triagers assign priority to the bugs and redirect them to the software development teams in order to provide fixes. The triaging process, however, is usually very challenging. The problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to automatically detect duplicate (or similar) reports. This way, triagers would not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by the customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route the bug report to the responsible development team based on the subsystem that caused the crash. Since having knowledge of all the subsystems of an ever-evolving industrial system is impractical, having a tool to automatically identify defective subsystems can significantly reduce the manual bug triaging effort. The main goal of this research is to investigate techniques and tools to help triagers process bug reports. We start by studying the effect of the presence of stack traces in analyzing bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features. We then propose a novel approach for predicting bug severity using stack traces and categorical features, and finally, we discuss a new method for predicting faulty product and component fields of bug reports. We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, and hence increase the productivity of bug triagers, while reducing the cost and effort associated with manual triaging of bug reports.
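As a rough illustration of how stack traces and categorical fields can be combined to flag likely duplicates, the sketch below scores a pair of reports with a weighted mix of stack-frame bigram overlap (Jaccard similarity) and categorical field agreement. This is a generic baseline chosen for illustration, not the method proposed in the thesis. The `Report` type, its field names, the weight `trace_weight`, and the sample reports are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Report:
    """Hypothetical bug report: a stack trace plus two categorical fields."""
    stack: List[str]   # function names, outermost frame first
    product: str
    component: str


def frame_ngrams(frames: List[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """Set of consecutive n-frame sequences from a stack trace."""
    return {tuple(frames[i:i + n]) for i in range(len(frames) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection over size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(r1: Report, r2: Report, trace_weight: float = 0.7) -> float:
    """Weighted mix of stack-trace overlap and categorical agreement.
    The 0.7 / 0.3 split is an arbitrary illustrative choice."""
    trace_sim = jaccard(frame_ngrams(r1.stack), frame_ngrams(r2.stack))
    matches = [r1.product == r2.product, r1.component == r2.component]
    cat_sim = sum(matches) / len(matches)
    return trace_weight * trace_sim + (1 - trace_weight) * cat_sim


a = Report(["main", "parse", "read_token", "malloc"], "editor", "parser")
b = Report(["main", "parse", "read_token", "free"], "editor", "parser")
c = Report(["init", "render", "draw"], "viewer", "graphics")

print(round(similarity(a, b), 2))  # prints 0.65
print(round(similarity(a, c), 2))  # prints 0.0
```

Reports `a` and `b` share two of four distinct frame bigrams and both categorical fields, so they score well above the unrelated report `c`. In practice a threshold on this score, or a learned classifier over such features, would decide which pairs to show a triager.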

Concordia University Research Repository