Proposed Framework for Quality Assurance System with Duplicate Bug Detection
Software projects operate under significant cost pressure, and bugs occur frequently, so a proper quality assurance system (QAS) becomes very important. A poorly designed quality assurance system may exchange wrong information between developers. The purpose of this paper is to explain and build an understanding of different quality assurance systems, to find the problems present in them, and to give proper direction for improvement so as to attract customers, raise customer satisfaction, and reduce downtime. This paper proposes a framework to detect duplicate bugs. Keywords: duplicate bug detection, QAS, bugs
Performance of IR Models on Duplicate Bug Report Detection: A Comparative Study
Open source projects incorporate bug triagers to help with the task of bug report
assignment to developers. One of the tasks of a triager is to identify whether an incoming
bug report is a duplicate of a pre-existing report. In order to detect duplicate bug reports,
a triager either relies on his memory and experience or on the search capabilities of the bug
repository. Both these approaches can be time consuming for the triager and may also
lead to the misidentification of duplicates. It has also been suggested that duplicate bug
reports are not necessarily harmful, instead they can complement each other to provide
additional information for developers to investigate the defect at hand. This motivates the
need for automated or semi-automated techniques for duplicate bug detection.
In the literature, two main approaches have been proposed to solve this problem. The
first approach is to prevent duplicate reports from reaching developers by automatically
filtering them while the second approach deals with providing the triager a list of top-N
similar bug reports, allowing the triager to compare the incoming bug report with the ones
provided in the list. Previous works have tried to enhance the quality of the suggested
lists, but these approaches either suffered from a poor Recall Rate or incurred additional
runtime overhead, making the deployment of a retrieval system impractical. To the best
of our knowledge, there has been little work on an exhaustive comparison of
the performance of different Information Retrieval Models (especially using more recent
techniques such as topic modeling) on this problem and understanding the effectiveness of
different heuristics across various application domains.
In this thesis, we compare the performance of word based models (derivatives of the
Vector Space Model) such as TF-IDF, Log-Entropy with that of topic based models such as
Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) and Random Indexing
(RI). We leverage heuristics that incorporate exception stack frames, surface features,
summary and long description from the free-form text in the bug report. We perform
experiments on subsets of bug reports from Eclipse and Firefox and achieve a recall rate of
60% and 58% respectively. We find that word based models, in particular a Log-Entropy
based weighting scheme, outperform topic based ones such as LSI and LDA.
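The word-based retrieval idea compared above can be sketched in a few lines of stdlib-only Python. This is an illustrative toy, not the thesis's actual pipeline: the TF-IDF formula is the textbook variant, and the bug report texts and corpus are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a small corpus of tokenized bug reports."""
    n = len(docs)
    df = Counter()                         # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_n(query_idx, vecs, n=2):
    """Rank all other reports by similarity to the query report (top-N list)."""
    scores = [(cosine(vecs[query_idx], v), i)
              for i, v in enumerate(vecs) if i != query_idx]
    return [i for _, i in sorted(scores, reverse=True)[:n]]

reports = [
    "npe when saving editor file".split(),
    "null pointer exception on file save in editor".split(),
    "toolbar icons render blurry on hidpi displays".split(),
]
print(top_n(0, tfidf_vectors(reports)))    # → [1, 2]
```

Report 1 shares the terms "file" and "editor" with the query, so it ranks first; a Log-Entropy scheme would differ only in how the term weights are computed.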
Using historical bug data from Eclipse and NetBeans, we determine the optimal time
frame for a desired level of duplicate bug report coverage. We realize an Online Duplicate
Detection Framework that uses a sliding window of a constant time frame as a first step
towards simulating incoming bug reports and recommending duplicates to the end user.
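The sliding-window mechanism can be illustrated with a short sketch. The window length, the threshold, and the Jaccard token-overlap similarity are all assumptions made for this example; the framework described above would plug in a real IR model and tuned parameters.

```python
from collections import deque

def jaccard(a, b):
    """Toy token-overlap similarity standing in for a real IR model."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def online_duplicate_candidates(stream, window_days, similarity, threshold=0.3):
    """For each incoming report, suggest earlier reports that fall inside
    a constant time window and exceed the similarity threshold."""
    window = deque()          # (day, report_id, text), oldest first
    suggestions = {}
    for day, rid, text in stream:
        while window and day - window[0][0] > window_days:
            window.popleft()  # expire reports older than the window
        suggestions[rid] = [old_id for _, old_id, old_text in window
                            if similarity(text, old_text) >= threshold]
        window.append((day, rid, text))
    return suggestions

stream = [
    (1, "B1", "crash on save"),
    (2, "B2", "app crash when save pressed"),  # near-duplicate of B1
    (40, "B3", "crash on save"),               # identical text, but outside window
]
print(online_duplicate_candidates(stream, window_days=30, similarity=jaccard))
# → {'B1': [], 'B2': ['B1'], 'B3': []}
```

B3 gets no suggestion despite its identical text because B1 has aged out of the 30-day window, which is exactly the coverage-versus-cost trade-off the time-frame analysis above is about.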
Machine Learning And Deep Learning Based Approaches For Detecting Duplicate Bug Reports With Stack Traces
Many large software systems rely on bug tracking systems to record the submitted bug reports and to track and manage bugs. Handling bug reports is known to be a challenging task, especially in software organizations with a large client base, which tend to receive a considerably large number of bug reports a day. Fortunately, not all reported bugs are new; many are similar or identical to previously reported bugs, also called duplicate bug reports.
Automatic detection of duplicate bug reports is an important research topic to help reduce the time and effort spent by triaging and development teams on sorting and fixing bugs. This explains the recent increase in attention to this topic as evidenced by the number of tools and algorithms that have been proposed in academia and industry. The objective is to automatically detect duplicate bug reports as soon as they arrive into the system. To do so, existing techniques rely heavily on the nature of bug report data they operate on. This includes both structural information such as OS, product version, time and date of the crash, and stack traces, as well as unstructured information such as bug report summaries and descriptions written in natural language by end users and developers.
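One simple way to combine the structural and unstructured signals mentioned above is a weighted score per report pair. The fields, weights, and similarity functions below are illustrative assumptions for a sketch, not a technique taken from the abstract.

```python
def stack_top_frame_overlap(trace_a, trace_b, k=3):
    """Fraction of shared frames among the top-k frames of two stack traces
    (top frames are usually closest to the point of failure)."""
    return len(set(trace_a[:k]) & set(trace_b[:k])) / k

def text_jaccard(a, b):
    """Toy token-overlap similarity for free-form summaries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def combined_score(r1, r2, weights=(0.3, 0.3, 0.4)):
    """Weighted mix of structural-field agreement, summary similarity,
    and stack-trace overlap. The weights are illustrative, not tuned."""
    w_field, w_text, w_stack = weights
    field_match = float(r1["product"] == r2["product"] and r1["os"] == r2["os"])
    return (w_field * field_match
            + w_text * text_jaccard(r1["summary"], r2["summary"])
            + w_stack * stack_top_frame_overlap(r1["stack"], r2["stack"]))

r1 = {"product": "editor", "os": "linux", "summary": "crash on save",
      "stack": ["f.save", "f.write", "io.flush"]}
r2 = {"product": "editor", "os": "linux", "summary": "editor crash when saving",
      "stack": ["f.save", "f.write", "net.send"]}
print(round(combined_score(r1, r2), 2))  # → 0.62
```

A pair above some threshold would then be flagged as a duplicate candidate; in practice, approaches in this line of work learn such weights (or a full classifier) from labeled duplicate pairs rather than fixing them by hand.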