
    Bug Triaging with High Confidence Predictions

    Correctly assigning bugs to the right developer or team, i.e., bug triaging, is a costly activity. A concerted effort has been made at Ericsson to adopt automated bug triaging to reduce development costs, and we also perform a case study on Eclipse bug reports. In this work, we replicate research approaches that have been widely used in the literature, including FixerCache, and apply them to over 10k bug reports for 9 large products at Ericsson and 2 large Eclipse products containing 21 components. We find that a logistic regression classifier using simple textual and categorical attributes of the bug reports has the highest accuracy: 79% on Ericsson and 46% on Eclipse bug reports. Ericsson's bug reports often contain logs with crash dumps and alarms; adding this information to the triage models does not improve accuracy in Ericsson's context. Eclipse bug reports contain stack traces, which we also add to the triaging model; stack traces are present in only 8% of bug reports and do not improve triage accuracy. Although our models perform as well as the best ones reported in the literature, a criticism of bug triaging at Ericsson is that the accuracy is not sufficient for regular use. We therefore develop a novel approach that only triages bugs when the model has high confidence in the prediction. This improves accuracy to 90% at Ericsson and 70% at Eclipse, but predictions can be made for only 62% and 25% of the Ericsson and Eclipse bug reports, respectively.
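
    A minimal sketch of the confidence-thresholded triaging idea described above, assuming scikit-learn and hypothetical report summaries and team names: a prediction is only emitted when the top class probability clears a threshold, otherwise the report is left for manual triage.

```python
# Sketch of confidence-thresholded bug triaging (assumed data layout, not the
# authors' exact pipeline): logistic regression over simple textual features
# that only predicts when the top class probability clears a threshold.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: bug report summaries and the team that fixed them.
summaries = ["crash in baseband scheduler", "UI freezes on project import",
             "alarm flood after node restart", "editor loses unsaved changes"]
teams = ["radio", "platform-ui", "oam", "platform-ui"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(summaries, teams)

def triage(summary, threshold=0.8):
    """Return a team only when the classifier is confident; otherwise defer."""
    probs = model.predict_proba([summary])[0]
    best = int(np.argmax(probs))
    if probs[best] >= threshold:
        return model.classes_[best]
    return None  # low confidence: leave the report for manual triage

print(triage("node restart raises alarm storm") or "defer to manual triage")
```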

    Adopting Automated Bug Assignment in Practice: A Longitudinal Case Study at Ericsson

    The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR's first bug assignment without human intervention happened in April 2019. Our study evaluates the adoption of TRR within its industrial context at Ericsson. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. We conduct an industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, teams working on high-level modules are more positive, while those on low-level modules experienced some drawbacks. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis.

    Analyze the Performance of Software by Machine Learning Methods for Fault Prediction Techniques

    Software is used in daily life more and more, and software systems become harder to develop as they are integrated into everyday activities. Creating highly effective software is therefore a significant challenge, and quality remains the most important of all required system characteristics. Nearly one-third of the total cost of software development goes toward testing, so it is advantageous to find bugs early in the development process; a bug found late drives up the cost of development. Software fault prediction is intended to address this problem: better prediction models are needed to forecast faults before actual testing and thereby reduce the time and expense of software projects. This paper discusses the various machine learning techniques for classifying software bugs.
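
    As an illustration of fault prediction framed as supervised classification, the sketch below trains a classifier on hypothetical static code metrics; the feature names and toy data are assumptions, not taken from the paper.

```python
# Sketch of fault prediction as supervised classification over static code
# metrics (hypothetical feature names and toy data).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Each row: [lines_of_code, cyclomatic_complexity, churn]; label 1 = faulty module.
X = [[120, 8, 3], [45, 2, 0], [300, 15, 9], [60, 4, 1],
     [250, 12, 7], [30, 1, 0], [500, 20, 11], [80, 5, 2]]
y = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```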

    Duplicate Defect Detection

    Discovering and fixing faults is an unavoidable process in Software Engineering. Documenting and organizing fault reports is good practice and improves the effectiveness of the development and maintenance process. Bug Tracking Repositories, such as Bugzilla, are designed to provide fault reporting facilities for developers, testers, and users of the system. Allowing anyone to contribute by finding and reporting faults has an immediate impact on software quality. However, this benefit comes with one side effect: users often file reports that describe the same fault, which increases the triaging time spent by the maintainers. At the same time, important information required to fix the fault is likely to be distributed across different reports. The objective of this thesis is twofold. First, we want to understand the dynamics of bug report filing for a large, long-duration open source project, Firefox. Second, we present a new approach that can reduce the number of duplicate reports. The novel element in the proposed approach is the ability to concentrate the search for duplicates on specific portions of the bug repository. This improves the performance of the Information Retrieval techniques and the classification runtime of our algorithm. Our system can be deployed as a search tool to help reporters query the repository, or it can be adopted to help maintainers detect duplicate reports. In both cases the performance is satisfactory: tested as a search tool, our system detects up to 53% of duplicate reports, and the approach adapted for maintainers has a maximum recall rate of 59%.
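
    A minimal sketch of the scoping idea, under the assumption that the candidate pool can be narrowed by a categorical field such as the component before ranking with TF-IDF cosine similarity; the report fields and data are hypothetical, not Bugzilla's schema.

```python
# Sketch of scoping the duplicate search to a portion of the repository
# (here, the same component) before ranking with TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    {"id": 1, "component": "Layout",  "text": "page renders blank after scrolling"},
    {"id": 2, "component": "Layout",  "text": "blank page when scrolling long documents"},
    {"id": 3, "component": "Network", "text": "download stalls on slow connections"},
]

def candidate_duplicates(new_text, new_component, top_n=5):
    # Search only within the same component to shrink the candidate set.
    pool = [r for r in reports if r["component"] == new_component]
    if not pool:
        return []
    vec = TfidfVectorizer().fit([r["text"] for r in pool] + [new_text])
    sims = cosine_similarity(vec.transform([new_text]),
                             vec.transform([r["text"] for r in pool]))[0]
    ranked = sorted(zip(pool, sims), key=lambda p: p[1], reverse=True)
    return [(r["id"], round(s, 2)) for r, s in ranked[:top_n]]

print(candidate_duplicates("blank page while scrolling", "Layout"))
```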

    Recommending Bug Assignment Approaches for Individual Bug Reports: An Empirical Investigation

    Multiple approaches have been proposed to automatically recommend potential developers who can address bug reports. These approaches are typically designed to work for any bug report submitted to any software project. However, we conjecture that these approaches may not work equally well for all the reports in a project. We conducted an empirical study to validate this conjecture, using three bug assignment approaches applied on 2,249 bug reports from two open source systems. We found empirical evidence that validates our conjecture, which led us to explore the idea of identifying and applying the best-performing approach for each bug report to obtain more accurate developer recommendations. We conducted an additional study to assess the feasibility of this idea using machine learning. While we found a wide margin of accuracy improvement for this approach, it is far from achieving the maximum possible improvement and performs comparably to baseline approaches. We discuss potential reasons for these results and conjecture that the assignment approaches may not capture important information about the bug assignment process that developers perform in practice. The results warrant future research in understanding how developers assign bug reports and improving automated bug report assignment.
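
    A minimal sketch of per-report approach selection, assuming a meta-classifier trained to predict which assignment approach to trust for a given report; the features and approach labels are hypothetical placeholders, not the study's actual setup.

```python
# Sketch of a meta-classifier that picks which assignment approach to apply
# to each incoming report (hypothetical features and approach names).
from sklearn.tree import DecisionTreeClassifier

# Per-report features: [description length, has stack trace, #components touched]
X = [[40, 0, 1], [300, 1, 3], [25, 0, 1], [500, 1, 4], [60, 0, 2], [350, 1, 2]]
# Label: which approach historically recommended the correct developer.
y = ["text-based", "stacktrace-based", "text-based",
     "stacktrace-based", "text-based", "stacktrace-based"]

selector = DecisionTreeClassifier(random_state=0).fit(X, y)
print(selector.predict([[280, 1, 2]]))  # approach recommended for a new report
```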

    An Empirical Study of Runtime Files Attached to Crash Reports

    When a software system crashes, users report the crash using crash report tracking tools. A crash report (CR) is then routed to software developers for review to fix the problem. A CR contains a wealth of information that allows developers to diagnose the root causes of problems and provide fixes. This is particularly important at Ericsson, one of the world's largest telecom companies, in which this study was conducted. The handling of CRs at Ericsson goes through multiple lines of support until a solution is provided. To make this possible, Ericsson software engineers and network operators rely on runtime data that is collected during the crash. This data is organized into files that are attached to the CRs. However, not all CRs contain this data in the first place. Software engineers and network operators often have to request additional files after the CR is created and sent to the different Ericsson support lines, a problem that often delays the resolution process. In this thesis, we conduct an empirical study of the runtime files attached to Ericsson CRs. We focus on answering four research questions that revolve around the proportion of runtime files in a selected set of CRs, the relationship between the severity of CRs and the type of files they contain, the impact of different file types on the time to fix the CR, and the possibility of predicting whether a CR should have runtime data attached to it at submission time. Our ultimate goal is to understand how runtime data is used during the CR handling process at Ericsson and what recommendations we can make to improve this process.
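
    A minimal sketch of the last research question, treated as a text classification task that predicts at submission time whether runtime files should be attached; the descriptions and labels are hypothetical, not Ericsson data.

```python
# Sketch of predicting, at CR submission time, whether runtime files
# (logs, dumps) should be attached (hypothetical descriptions and labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

descriptions = ["node crashed with core dump", "typo in settings dialog",
                "restart loop after upgrade", "documentation link broken"]
needs_runtime_files = [1, 0, 1, 0]  # 1 = runtime data should be attached

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(descriptions, needs_runtime_files)
print(clf.predict(["process restarts repeatedly after software upgrade"]))
```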

    The Impact of Parallel and Batch Testing in Continuous Integration Environments

    Testing is a costly, time-consuming, and challenging part of modern software development. During continuous integration, each submitted change is tested automatically to ensure that it does not break the system's functionality. A common approach to reducing the number of test case executions is to batch changes together for testing. For example, given four changes to test, if we group them in a batch and they pass, we use one execution to test all four changes. However, if they fail, additional executions are required to find the culprit change that is responsible for the failure. In this study we first investigate the impact of batch testing at the level of builds. We evaluate five batch culprit finding approaches: Dorfman, double pool testing, BatchBisect, BatchStop4, and our novel BatchDivide4. All prior works on batching use a constant batch size. In this work, we propose a dynamic batch size technique based on the weighted historical failure rate of the project. We simulate each of the batching strategies across 12 large projects on Travis with varying failure rates. We find that dynamic batching coupled with BatchDivide4 outperforms the other approaches. Compared to TestAll, this approach decreases the number of executions by 47.49% on average across the Travis projects. It outperforms the current state-of-the-art constant batch size approach, i.e., Batch4, by 5.17 percentage points. Our historical weighting approach leads us to a metric, FailureSpread, that describes the number of consecutive build failures. We find that the correlation between batch savings and FailureSpread is r = −0.97 with p ≪ 0.0001. This metric easily allows developers to determine the potential of batching on their project. However, we then show that in the case of a batch failure, re-running all the test cases is inefficient. Also, for companies with notable resource constraints, e.g., Ericsson, running all the tests on a single machine is neither possible nor realistic. To address these issues we extend our work to an industrial application at Ericsson. We first evaluate the effect of parallel testing for a project at Ericsson. We find that the relationship between the number of available machines for parallelization and the FeedbackTime is nonlinear. For example, we can increase the number of machines by 25% and reduce the FeedbackTime by 53%. We then examine three batching strategies at the test level: ConstantBatching, TestDynamicBatching, and TestCaseBatching. We evaluate their performance by varying the number of parallel machines. For ConstantBatching, we experiment with batch sizes from 2 to 32. The majority of the saving is achieved using batch sizes smaller than 8. However, ConstantBatching increases the feedback time if there are more than 6 parallel machines available. To solve this problem, we propose TestDynamicBatching, which batches all of the queued changes whenever resources are available. Compared to TestAll, TestDynamicBatching reduces the AvgFeedback time and AvgCPU time by between 15.78% and 80.38%, and between 3.13% and 48.78%, respectively, depending on the number of machines. Batching all the changes in the queue can increase the test scope. To address this issue we propose TestCaseBatching, which performs batching at the test level instead of the change level. TestCaseBatching reduces the AvgFeedback time and AvgCPU time by between 19.84% and 84.20%, and between 5.65% and 50.92%, respectively, depending on the number of machines available for parallel testing. TestCaseBatching is highly effective and we hope other companies will adopt it.
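
    One plausible reading of the dynamic batch size idea, sketched below: recent build outcomes weigh more heavily, and a higher weighted failure rate yields a smaller batch. The decay factor and size bounds are assumptions, not the thesis' exact formula.

```python
# Sketch of choosing a batch size from a weighted historical failure rate:
# recent builds weigh more, and a higher failure rate yields a smaller batch.
def weighted_failure_rate(outcomes, decay=0.9):
    """outcomes: list of booleans, most recent last; True = build failed."""
    weights = [decay ** (len(outcomes) - 1 - i) for i in range(len(outcomes))]
    failures = sum(w for w, failed in zip(weights, outcomes) if failed)
    return failures / sum(weights)

def dynamic_batch_size(outcomes, min_size=2, max_size=16):
    rate = weighted_failure_rate(outcomes)
    # More failures -> smaller batches; fewer failures -> larger batches.
    size = max_size - round(rate * (max_size - min_size))
    return max(min_size, min(max_size, size))

recent = [False, False, True, False, False, False, True, False]
print(dynamic_batch_size(recent))
```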

    Software Batch Testing to Reduce Build Test Executions

    Testing is expensive, and batching tests has the potential to reduce test costs. The continuous integration strategy of testing each commit or change individually helps to quickly identify faults but leads to the maximum number of test executions. Large companies that have a large number of commits, e.g., Google and Facebook, or that have expensive test infrastructure, e.g., Ericsson, must batch changes together to reduce the number of total test runs. For example, if eight builds are batched together and there is no failure, then we have tested eight builds with one execution, saving seven executions. However, when a failure occurs it is not immediately clear which build is the cause of the failure. A bisection is run to isolate the failing build, i.e., the culprit build. In our eight-build example, a failure will require an additional 6 executions, resulting in a saving of one execution. The goal of this work is to improve the efficiency of batch testing. We evaluate six approaches. The first is the baseline approach that tests each build individually. The second is the existing bisection approach. The third uses a batch size of four, which we show mathematically reduces the number of executions without requiring bisection. The fourth combines the two prior techniques, introducing a stopping condition to the bisection. The final two approaches use models of build change risk to isolate risky changes and test them in smaller batches. We evaluate the approaches on nine open source projects that use Travis CI. Compared to the TestAll baseline, on average, the approaches reduce the number of build test executions across projects by 46%, 48%, 50%, 44%, and 49% for BatchBisect, Batch4, BatchStop4, RiskTopN, and RiskBatch, respectively. The greatest reduction is achieved by BatchStop4, at 50%. However, the simple Batch4 approach does not require bisection and still achieves a reduction of 48%. We recommend that all CI pipelines use a batch size of at least four. We release our scripts and data for replication. Regardless of the approach, on average, we save around half the build test executions compared to testing each change individually. We release the BatchBuilder tool that automatically batches submitted changes on GitHub for testing on Travis CI. Since the tool reports individual results for each pull request or pushed commit, the batching happens in the background and the development process is unchanged.
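
    A minimal sketch of the execution-count arithmetic in the eight-build example above, assuming a simple recursive bisection with a single culprit; this illustrates the counting, not the exact BatchBisect implementation.

```python
# Count test executions under batch bisection: one culprit in a batch of
# eight costs 1 + 6 = 7 executions versus 8 when testing builds individually.
def bisect_executions(builds, culprit):
    """Count executions needed to isolate a single culprit in `builds`."""
    executions = 1                      # run this batch once
    if culprit not in builds or len(builds) == 1:
        return executions
    mid = len(builds) // 2
    left, right = builds[:mid], builds[mid:]
    # Each half is tested; only the failing half is bisected further.
    return executions + bisect_executions(left, culprit) + bisect_executions(right, culprit)

builds = list(range(8))
print(bisect_executions(builds, culprit=3))     # 7 executions to isolate the culprit
print(bisect_executions(builds, culprit=None))  # 1 execution when the batch passes
```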

    On Test Selection, Prioritization, Bisection, and Guiding Bisection with Risk Models

    The cost of software testing has become a burden for software companies in the era of rapid release and continuous integration. In the first part of our work, we evaluate the results of adopting multiple test selection and prioritization approaches for improving test effectiveness in one of the test stages of our industrial partner Ericsson Inc. We confirm the existence of valuable information in the test execution history. In particular, the association between test failures provides the most value to the test selection and prioritization processes. More importantly, during this exercise, we encountered various challenges that are unseen or undiscussed in prior research. We document the challenges, our solutions, and the lessons learned as an experience report. In the second part of our work, we explore batch testing in test execution environments and how it can help to reduce test execution costs. One approach to reducing test execution costs is to group changes into batches and test them at once. In this work, we study the impact of batch testing in reducing the number of test executions needed to deliver changes and find culprit commits. We factor test flakiness into our simulations and study its impact on the optimal batch size. Moreover, we address another problem with batch testing: how to find the culprit commit when a batch fails. We introduce a novel technique where we guide bisection based on two risk models: a bug model and a test execution history model. We isolate the risky commits by testing them individually, while the less risky commits are tested in a single large batch. Our results show that batch testing can be improved by adopting our culprit prediction models. The results we present here have convinced Ericsson developers to implement our culprit risk predictions in the CulPred tool, which will make their continuous integration pipeline more efficient.
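
    A minimal sketch of the risk-guided idea: commits whose predicted risk exceeds a threshold are tested individually, while the rest go into one large batch. The risk scores and threshold are placeholders, not the thesis' trained bug or execution-history models.

```python
# Sketch of risk-guided batching: isolate risky commits, batch the rest.
def plan_test_runs(commits, risk, threshold=0.5):
    """commits: list of ids; risk: dict id -> predicted failure probability."""
    risky = [c for c in commits if risk[c] >= threshold]
    safe = [c for c in commits if risk[c] < threshold]
    runs = [[c] for c in risky]          # test risky commits individually
    if safe:
        runs.append(safe)                # one large batch for the low-risk rest
    return runs

commits = ["c1", "c2", "c3", "c4", "c5"]
risk = {"c1": 0.9, "c2": 0.1, "c3": 0.05, "c4": 0.7, "c5": 0.2}
print(plan_test_runs(commits, risk))  # [['c1'], ['c4'], ['c2', 'c3', 'c5']]
```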

    From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution

    Software developers in large projects work in complex information landscapes and staying on top of all relevant software artifacts is an acknowledged challenge. As software systems often evolve over many years, a large number of issue reports is typically managed during the lifetime of a system, representing the units of work needed for its improvement, e.g., defects to fix, requested features, or missing documentation. Efficient management of incoming issue reports requires the successful navigation of the information landscape of a project. In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development. Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution. We evaluate our proposals both by deploying an RSSE in two development teams and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting.
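
    A minimal sketch of the simulation-style evaluation, assuming a chronological replay in which the recommender is trained only on earlier reports before predicting each new one; the reports, teams, and model are hypothetical.

```python
# Sketch of evaluating an issue-assignment recommender by replaying the
# historical inflow: train on earlier reports, predict the next one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

history = [  # chronologically ordered (text, team) pairs, hypothetical
    ("connection drops under load", "transport"),
    ("config parser rejects valid file", "mgmt"),
    ("handover fails between cells", "radio"),
    ("connection reset during peak traffic", "transport"),
    ("handover timer misconfigured", "radio"),
]

hits = 0
warmup = 3  # need a few reports before the first prediction
for i in range(warmup, len(history)):
    texts, teams = zip(*history[:i])
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(texts, teams)
    hits += model.predict([history[i][0]])[0] == history[i][1]
print(f"replay accuracy: {hits / (len(history) - warmup):.2f}")
```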