92 research outputs found

    Automatic bug triaging techniques using machine learning and stack traces

    Get PDF
    When a software system crashes, users have the option to report the crash using automated bug tracking systems. These tools capture software crash and failure data (e.g., stack traces, memory dumps, etc.) from end-users. These data are sent in the form of bug (crash) reports to the software development teams to uncover the causes of the crash and provide adequate fixes. The reports are first assessed (usually in a semi-automatic way) by a group of software analysts, known as triagers. Triagers assign priority to the bugs and redirect them to the software development teams in order to provide fixes. The triaging process, however, is usually very challenging. The problem is that many of these reports are caused by similar faults. Studies have shown that one way to improve the bug triaging process is to detect automatically duplicate (or similar) reports. This way, triagers would not need to spend time on reports caused by faults that have already been handled. Another issue is related to the prioritization of bug reports. Triagers often rely on the information provided by the customers (the report submitters) to prioritize bug reports. However, this task can be quite tedious and requires tool support. Next, triagers route the bug report to the responsible development team based on the subsystem, which caused the crash. Since having knowledge of all the subsystems of an ever-evolving industrial system is impractical, having a tool to automatically identify defective subsystems can significantly reduce the manual bug triaging effort. The main goal of this research is to investigate techniques and tools to help triagers process bug reports. We start by studying the effect of the presence of stack traces in analyzing bug reports. Next, we present a framework to help triagers in each step of the bug triaging process. We propose a new and scalable method to automatically detect duplicate bug reports using stack traces and bug report categorical features. We then propose a novel approach for predicting bug severity using stack traces and categorical features, and finally, we discuss a new method for predicting faulty product and component fields of bug reports. We evaluate the effectiveness of our techniques using bug reports from two large open-source systems. Our results show that stack traces and machine learning methods can be used to automate the bug triaging process, and hence increase the productivity of bug triagers, while reducing costs and efforts associated with manual triaging of bug reports

    ADPTriage: Approximate Dynamic Programming for Bug Triage

    Full text link
    Bug triaging is a critical task in any software development project. It entails triagers going over a list of open bugs, deciding whether each is required to be addressed, and, if so, which developer should fix it. However, the manual bug assignment in issue tracking systems (ITS) offers only a limited solution and might easily fail when triagers must handle a large number of bug reports. During the automated assignment, there are multiple sources of uncertainties in the ITS, which should be addressed meticulously. In this study, we develop a Markov decision process (MDP) model for an online bug triage task. In addition to an optimization-based myopic technique, we provide an ADP-based bug triage solution, called ADPTriage, which has the ability to reflect the downstream uncertainty in the bug arrivals and developers' timetables. Specifically, without placing any limits on the underlying stochastic process, this technique enables real-time decision-making on bug assignments while taking into consideration developers' expertise, bug type, and bug fixing time. Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time. We also demonstrate the empirical convergence of the model and conduct sensitivity analysis with various model parameters. Accordingly, this work constitutes a significant step forward in addressing the uncertainty in bug triage solution

    Survey on Automated Bugs Triage System

    Get PDF
    Nowadays IT companies is spending more than 40 percent of their cost in fixing software bugs, traditonally these bugs are fixed by manual assignement to a particular developer , this approach causes too much dependency, the new and alternative approach is the bug triage system which fix the bug automatically , which automatically assign the reported bug to a develop which decreases the time and cost in in manual work, different classification techniques are used to conduct automatic bug triage. In this paper, we propose to apply machine learning techniques to assist in bug triage to predict which developer should be assigned on the bug based on its description by applying text categrorization . We will address the problem of data reduction for bug triage, i.e. How the quality of bug data would be improved

    Who Made This Copy? An Empirical Analysis of Code Clone Authorship

    Full text link
    Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created through copy-and-paste practices during development and maintenance activities. Since code clones may require consistent updates and coherent management, they present a challenging issue in software maintenance. Therefore, many studies have been conducted to find various types of clones with accuracy, scalability, or performance. However, the exploration of the nature of code clones has been limited. Even the fundamental question of whether code snippets in the same clone set were written by the same author or different authors has not been thoroughly investigated. In this paper, we investigate the characteristics of code clones with a focus on authorship. We analyzed the authorship of code clones at the line-level granularity for Java files in 153 Apache projects stored on GitHub and addressed three research questions. Based on these research questions, we found that there are a substantial number of clone lines across all projects (an average of 18.5\% for all projects). Furthermore, authors who contribute to many non-clone lines also contribute to many clone lines. Additionally, we found that one-third of clone sets are primarily contributed to by multiple leading authors. These results confirm our intuitive understanding of clone characteristics, although no previous publications have provided empirical validation data from multiple projects. As the results could assist in designing better clone management techniques, we will explore the implications of developing an effective clone management tool.Comment: An extended version of the 17th International Workshop on Software Clones IWSC 2023 in Bogota, Colombi

    Analysis and Interactive Visualization of Software Bug Reports

    Get PDF
    A software Bug report contains information about the bug in the form of problem description and comments using natural language texts. Managing reported bugs is a significant challenge for a project manager when the number of bugs for a software project is large. Prior to the assignment of a newly reported bug to an appropriate developer, the triager (e.g., manager) attempts to categorize it into existing categories and looks for duplicate bugs. The goal is to reuse existing knowledge to fix or resolve the new bug, and she often spends a lot of time in reading a number of bug reports. When fixing or resolving a bug, a developer also consults with a series of relevant bug reports from the repository in order to maximize the knowledge required for the fixation. It is also preferable that developers new to a project first familiarize themselves with the project along with the reported bugs before actually working on the project. Because of the sheer numbers and size of the bug reports, manually analyzing a collection of bug reports is time-consuming and ineffective. One of the ways to mitigate the problem is to analyze summaries of the bug reports instead of analyzing full bug reports, and there have been a number of summarization techniques proposed in the literature. Most of these techniques generate extractive summaries of bug reports. However, it is not clear how useful those generated extractive summaries are, in particular when the developers do not have prior knowledge of the bug reports. In order to better understand the usefulness of the bug report summaries, in this thesis, we first reimplement a state of the art unsupervised summarization technique and evaluate it with a user study with nine participants. Although in our study, 70% of the time participants marked our developed summaries as a reliable means of comprehending the software bugs, the study also reports a practical problem with extractive summaries. An extractive summary is often created by choosing a certain number of statements from the bug report. The statements are extracted out of their contexts, and thus often lose their consistency, which makes it hard for a manager or a developer to comprehend the reported bug from the extractive summary. Based on the findings from the user study and in order to further assist the managers as well as the developers, we thus propose an interactive visualization for the bug reports that visualizes not only the extractive summaries but also the topic evolution of the bug reports. Topic evolution refers to the evolution of technical topics discussed in the bug reports of a software system over a certain time period. Our visualization technique interactively visualizes such information which can help in different project management activities. Our proposed visualization also highlights the summary statements within their contexts in the original report for easier comprehension of the reported bug. In order to validate the applicability of our proposed visualization technique, we implement the technique as a standalone tool, and conduct both a case study with 3914 bug reports and a user study with six participants. The experiments in the case study show that our topic analysis can reveal useful keywords or other insightful information about the bug reports for aiding the managers or triagers in different management activities. The findings from the user study also show that our proposed visualization technique is highly promising for easier comprehension of the bug reports

    Essays on software vulnerability coordination

    Get PDF
    Software vulnerabilities are software bugs with security implications. Exposure to a security bug makes a software system behave in unexpected ways when the bug is exploited. As software vulnerabilities are thus a classical way to compromise a software system, these have long been coordinated in the global software industry in order to lessen the risks. This dissertation claims that the coordination occurs in a complex and open socio-technical system composed of decentralized software units and heterogeneous software agents, including not only software engineers but also other actors, from security specialists and software testers to attackers with malicious motives. Vulnerability disclosure is a classical example of the associated coordination; a security bug is made known to a software vendor by the discoverer of the bug, a third-party coordinator, or public media. The disclosure is then used to patch the bug. In addition to patching, the bug is typically archived to databases, cataloged and quantified for additional information, and communicated to users with a security advisory. Although commercial solutions have become increasingly important, the underlying coordination system is still governed by multiple stakeholders with vested interests. This governance has continued to result in different inefficiencies. Thus, this dissertation examines four themes: (i) disclosure of software vulnerabilities; (ii) coordination of these; (iii) evolution of these across time; and (iv) automation potential. The philosophical position is rooted in scientific realism and positivism, while regression analysis forms the kernel of the methodology. Based on these themes, the results indicate that (a) when vulnerability disclosure has worked, it has been relatively efficient; the obstacles have been social rather than technical in nature, originating from the diverging interests of the stakeholders who have different incentives. Furthermore, (b) the efficiency applies also to the coordination of different identifiers and classifications for the vulnerabilities disclosed. Longitudinally, (c) also the evolution of software vulnerabilities across time reflect distinct software and vulnerability life cycle models and the incentives underneath. Finally, (d) there is potential to improve the coordination efficiency through software automation

    Test sequence generation with Cayley graphs

    Get PDF
    The paper presents a theoretical foundation for test sequence generation based on an input specification. The set of possible test sequences is first partitioned according to a generic “triaging” function, which can be created from a state-machine specification in various ways. The notion of coverage metric is then expressed in terms of the categories produced by this function. Many existing test generation problems, such as t-way state or transition coverage, become particular cases of this generic framework. We then present algorithms for generating sets of test sequences providing guaranteed full coverage with respect to a metric, by building and processing a special type of graph called a Cayley graph. An implementation of these concepts is then experimentally evaluated against existing techniques, and shows it provides better performance in terms of running time and test suite size

    Version history, similar report, and structure: Putting them together for improved bug localization

    Get PDF
    During the evolution of a software system, a large number of bug reports are submitted. Locating the source code files that need to be fixed to resolve the bugs is a challenging problem. Thus, there is a need for a technique that can automatically figure out these buggy files. A number of bug localization solutions that take in a bug report and output a ranked list of files sorted based on their likelihood to be buggy have been proposed in the literature. However, the accuracy of these tools still need to be improved. In this paper, to address this need, we propose AmaLgam, a new method for locating relevant buggy files that puts together version history, similar reports, and structure. To do this, AmaLgam integrates a bug prediction technique used in Google which analyzes version history, with a bug localization technique named BugLocator which analyzes similar reports from bug report system, and the state-of-the-art bug localization technique BLUiR which considers structure. We perform a large-scale experiment on four open source projects, namely AspectJ, Eclipse, SWT and ZXing to localize more than 3,000 bugs. Compared with a history-aware bug localization solution of Sisman and Kak, our approach achieves a 46.1 % improvement in terms of mean average precision (MAP). Compared with BugLocator, our approach achieves a 24.4 % improvement in terms of MAP. Compared with BLUiR, our approach achieves a 16.4% improvement in terms of MAP

    Software Maintenance At Commit-Time

    Get PDF
    Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly, which explains an ever growing line of research in software maintenance areas including mining software repository, default prevention, clone detection, and bug reproduction. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. Existing tools, however, operate in an offline fashion, i.e., after the changes to the systems have been made. Studies have shown that software developers tend to be reluctant to use these tools as part of a continuous development process. This is because they require installation and training, hindering their integration with developers’ workflow, which in turn limits their adoption. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer’s workflow, a commit marks the end of a given task. We show how commits can be used to catch unwanted modifications to the system, and prevent the introduction of clones and bugs, before these modifications reach the central code repository. We also propose a bug reproduction technique that is based on model checking and crash traces. Furthermore, we propose a new way for classifying bugs based on the location of fixes that can serve as the basis for future research in this field of study. The techniques proposed in this thesis have been tested on over 400 open and closed (industrial) systems, resulting in high levels of precision and recall. They are also scalable and non-intrusive
    • …
    corecore