Assessing the Quality of the Steps to Reproduce in Bug Reports
A major problem with user-written bug reports, indicated by developers and
documented by researchers, is the low quality of the reported steps to
reproduce the bugs. Low-quality steps to reproduce lead to excessive manual
effort spent on bug triage and resolution. This paper proposes Euler, an
approach that automatically identifies and assesses the quality of the steps to
reproduce in a bug report, providing feedback to the reporters, which they can
use to improve the bug report. The feedback provided by Euler was assessed by
external evaluators and the results indicate that Euler correctly identified
98% of the existing steps to reproduce and 58% of the missing ones, while 73%
of its quality annotations are correct.
Comment: In Proceedings of the 27th ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
'19), August 26-30, 2019, Tallinn, Estonia
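Euler's internals are not described in the abstract, but its first stage, spotting candidate reproduction steps in free-form report text, can be approximated with surface heuristics. The sketch below is illustrative only: the `IMPERATIVES` list and `looks_like_step` function are invented, not the paper's model.

```python
import re

# Heuristic (not Euler's actual model): a line looks like a reproduction
# step if it is numbered/bulleted or starts with an imperative verb.
IMPERATIVES = {"open", "click", "tap", "go", "enter", "select", "press", "scroll"}

def looks_like_step(line):
    line = line.strip().lower()
    if re.match(r"^(\d+[.)]|[-*])\s", line):
        return True  # numbered or bulleted line
    words = line.split()
    return bool(words) and words[0] in IMPERATIVES

report = [
    "The app crashes every time.",
    "1. Open the app",
    "2. Tap on settings",
    "Expected: settings screen appears",
]
print([l for l in report if looks_like_step(l)])
# → ['1. Open the app', '2. Tap on settings']
```

A real system would also need to flag *missing* steps, which is where Euler's learned quality model goes beyond pattern matching.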
Bug or Not? Bug Report Classification Using N-Gram IDF
Previous studies have found that a significant number of bug reports are
misclassified between bugs and non-bugs, and that manually classifying bug
reports is a time-consuming task. To address this problem, we propose a bug
reports classification model with N-gram IDF, a theoretical extension of
Inverse Document Frequency (IDF) for handling words and phrases of any length.
N-gram IDF enables us to extract key terms of any length from texts; these key
terms can then be used as features to classify bug reports. We build
classification models with logistic regression and random forest using features
from N-gram IDF and topic modeling, which is widely used in various software
engineering tasks. With a publicly available dataset, our results show that our
N-gram IDF-based models outperform the topic-based models
on all of the evaluated cases. Our models show promising results and have the
potential to be extended to other software engineering tasks.
Comment: 5 pages, ICSME 201
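The core idea of N-gram IDF, weighting phrases of arbitrary length by how few documents they appear in, can be sketched in a few lines. This is a simplification of the actual theory, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """All word n-grams of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def ngram_idf(docs, max_n=3):
    """IDF weight per n-gram: rarer phrases score higher, making them
    more discriminative features for a downstream classifier."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc.lower().split(), max_n)))
    total = len(docs)
    return {g: math.log(total / df[g]) for g in df}

reports = [
    "app crashes on startup after update",
    "app crashes when rotating the screen",
    "please add dark mode to the app settings",
    "add support for dark mode themes",
]
idf = ngram_idf(reports)
# The phrase "app crashes" (bug reports only) outscores the generic
# unigram "app", which also appears in a feature request.
print(idf["app crashes"] > idf["app"])  # → True
```

In the paper's setting, such phrase weights become feature values fed to logistic regression or random forest models.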
Translating Video Recordings of Mobile App Usages into Replayable Scenarios
Screen recordings of mobile applications are easy to obtain and capture a
wealth of information pertinent to software developers (e.g., bugs or feature
requests), making them a popular mechanism for crowdsourced app feedback. Thus,
these videos are becoming a common artifact that developers must manage. In
light of unique mobile development constraints, including swift release cycles
and rapidly evolving platforms, automated techniques for analyzing all types of
rich software artifacts provide benefit to mobile developers. Unfortunately,
automatically analyzing screen recordings presents serious challenges, due to
their graphical nature, compared to other types of (textual) artifacts. To
address these challenges, this paper introduces V2S, a lightweight, automated
approach for translating video recordings of Android app usages into replayable
scenarios. V2S is based primarily on computer vision techniques and adapts
recent solutions for object detection and image classification to detect and
classify user actions captured in a video, and convert these into a replayable
test scenario. We performed an extensive evaluation of V2S involving 175 videos
depicting 3,534 GUI-based actions collected from users exercising features and
reproducing bugs from over 80 popular Android apps. Our results illustrate that
V2S can accurately replay scenarios from screen recordings, and is capable of
reproducing 89% of our collected videos with minimal overhead. A case
study with three industrial partners illustrates the potential usefulness of
V2S from the viewpoint of developers.
Comment: In Proceedings of the 42nd International Conference on Software
Engineering (ICSE'20), 13 pages
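V2S's detection models are beyond a short sketch, but its final stage, turning detected on-screen actions into a replayable script, reduces to emitting `adb shell input` commands. The action schema below is hypothetical; the adb `input tap`/`input swipe` subcommands themselves are real.

```python
# Hypothetical post-detection step: map actions detected in video frames
# to adb input commands that replay them on a device or emulator.
def to_adb(actions):
    cmds = []
    for a in actions:
        if a["type"] == "tap":
            cmds.append(f"adb shell input tap {a['x']} {a['y']}")
        elif a["type"] == "swipe":
            cmds.append(
                f"adb shell input swipe {a['x1']} {a['y1']} {a['x2']} {a['y2']} {a['ms']}"
            )
    return cmds

detected = [
    {"type": "tap", "x": 540, "y": 1200},
    {"type": "swipe", "x1": 540, "y1": 1600, "x2": 540, "y2": 400, "ms": 300},
]
for cmd in to_adb(detected):
    print(cmd)
```

The hard part V2S solves is upstream of this: classifying each frame's touch indicator into tap, long-tap, or gesture, and localizing its coordinates.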
Locating bugs without looking back
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
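The paper's heuristic scoring is not reproduced here, but the IR formulation it builds on, bug report as query, files as documents ranked by relevance, can be illustrated with a plain TF-IDF cosine baseline. The file contents and names below are invented.

```python
import math
from collections import Counter

def rank_files(report, files):
    """Rank source files by TF-IDF cosine similarity to the bug report.
    `files` maps filename -> file text. A generic IR baseline, not the
    paper's heuristic scorer."""
    tokenized = {name: Counter(text.lower().split()) for name, text in files.items()}
    df = Counter()
    for tf in tokenized.values():
        df.update(tf.keys())
    n = len(files)
    idf = {w: math.log((n + 1) / (df[w] + 1)) + 1 for w in df}  # smoothed IDF

    def vec(tf):
        return {w: c * idf.get(w, 0.0) for w, c in tf.items()}

    def cosine(a, b):
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(report.lower().split()))
    return sorted(files, key=lambda name: cosine(q, vec(tokenized[name])), reverse=True)

files = {
    "LoginActivity.java": "login password authenticate user session token",
    "VideoPlayer.java": "video playback codec buffer frame render",
}
print(rank_files("crash when entering wrong password at login", files))
# → ['LoginActivity.java', 'VideoPlayer.java']
```

The paper's contribution is replacing this single similarity score with hand-identified heuristics that need no project history, while keeping the same query/document framing.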
Automated Testing and Bug Reproduction of Android Apps
The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). The corresponding increase in app complexity has made app testing and maintenance activities more challenging. During the app development phase, developers need to test the app in order to guarantee its quality before releasing it to the market. During the deployment phase, developers heavily rely on bug reports to reproduce failures reported by users. Because of the rapid release cycle of apps and limited human resources, it is difficult for developers to manually construct test cases for testing the apps or diagnose failures from a large number of bug reports. However, existing automated test case generation techniques are ineffective at exploring the most effective events, those that can quickly improve code coverage and fault detection capability. In addition, none of the existing techniques can reproduce failures directly from bug reports. This dissertation provides a framework that employs artificial intelligence (AI) techniques to improve the testing and debugging of mobile apps. Specifically, the testing approach employs a Q-network that learns a behavior model from a set of existing apps, and the learned model can be used to explore and generate tests for new apps. The framework is able to capture the fine-grained details of GUI events (e.g., visiting times of events, text on the widgets) and use them as features that are fed into a deep neural network, which acts as the agent to guide the app exploration. The debugging approach focuses on automatically reproducing crashes from bug reports for mobile apps. The approach uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash.
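The dissertation's testing approach uses a deep Q-network; the underlying idea, rewarding GUI events that reach new behavior, can be shown with a toy tabular Q-learning loop. The two-screen app model, action names, and reward of 1.0 for a newly reached screen are invented for illustration.

```python
import random

# Toy GUI model (invented): from "home", "open_settings" reaches a new
# screen; the 1.0 reward stands in for newly covered behavior.
ACTIONS = ["scroll", "open_settings"]

def step(state, action):
    if state == "home" and action == "open_settings":
        return "settings", 1.0
    return state, 0.0

def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning stand-in for the dissertation's deep Q-network."""
    random.seed(0)
    Q = {}
    for _ in range(episodes):
        state = "home"
        for _ in range(5):  # short episode of GUI events
            if random.random() < eps:
                action = random.choice(ACTIONS)        # explore
            else:
                action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return Q

Q = train()
# The event that reaches the new screen ends up with the higher value.
print(Q[("home", "open_settings")] > Q.get(("home", "scroll"), 0.0))  # → True
```

The deep variant replaces the lookup table with a neural network over rich event features (visit counts, widget text), which is what lets a model trained on existing apps transfer to new ones.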
Employing Deep Learning and Structured Information Retrieval to Answer Clarification Questions on Bug Reports
Software bug reports reported on bug-tracking systems often lack crucial
information for the developers to promptly resolve them, costing companies
billions of dollars. There has been significant research on effectively
eliciting information from bug reporters in bug tracking systems using
different templates that bug reporters need to use. However, the need for
asking follow-up questions persists. Recent studies propose techniques to
suggest these follow-up questions to help developers obtain the missing
details, but there has been little research on answering these follow-up
questions, which often remain unanswered. In this paper, we propose a novel
approach that uses CodeT5 in combination with Lucene, an information retrieval
technique that leverages the relevance of different bug reports, their
components, and follow-up questions to recommend answers. These top-performing
answers, along with their bug report, serve as additional context apart from
the deficient bug report to the deep learning model for generating an answer.
We evaluate our recommended answers with the manually annotated answers using
similarity metrics like Normalized Smooth BLEU Score, METEOR, Word Mover's
Distance, and Semantic Similarity. We achieve a BLEU Score of up to 34 and a
Semantic Similarity of up to 64, which shows that the generated answers are
understandable and good according to Google's standard and can outperform
multiple baselines.
Comment: Fixed formatting and typographical error
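Of the similarity metrics listed, a smoothed unigram BLEU is the simplest to sketch. The function below is a drastically simplified stand-in for the Normalized Smooth BLEU used in the evaluation: clipped unigram precision with +1 smoothing and a brevity penalty, with invented example sentences.

```python
import math
from collections import Counter

def smoothed_bleu1(candidate, reference):
    """Clipped unigram precision with +1 smoothing and a brevity
    penalty -- a much-simplified cousin of Smooth BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate word's count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = (clipped + 1) / (len(cand) + 1)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = smoothed_bleu1("set the log level to debug",
                       "set log level to debug")
print(round(score, 3))  # → 0.857
```

Real evaluations use higher-order n-grams and, here, complementary metrics (METEOR, Word Mover's Distance, semantic similarity) precisely because unigram overlap alone rewards superficially similar answers.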
App Review Driven Collaborative Bug Finding
Software development teams generally welcome any effort to expose bugs in
their code base. In this work, we build on the hypothesis that mobile apps from
the same category (e.g., two web browser apps) may be affected by similar bugs
in their evolution process. It is therefore possible to transfer the experience
of one historical app to quickly find bugs in its new counterparts. This has
been referred to as collaborative bug finding in the literature. Our novelty is
that we guide the bug finding process by considering that existing bugs have
been hinted within app reviews. Concretely, we design the BugRMSys approach to
recommend bug reports for a target app by matching historical bug reports from
apps in the same category with user app reviews of the target app. We
experimentally show that this approach enables us to quickly expose and report
dozens of bugs for targeted apps such as Brave (web browser app). BugRMSys's
implementation relies on DistilBERT to produce natural language text
embeddings. Our pipeline considers similarities between bug reports and app
reviews to identify relevant bugs. We then focus on the app review as well as
potential reproduction steps in the historical bug report (from a same-category
app) to reproduce the bugs.
Overall, after applying BugRMSys to six popular apps, we were able to
identify, reproduce, and report 20 new bugs: among these, 9 reports have
already been triaged, 6 were confirmed, and 4 have been fixed by the official
development teams.
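BugRMSys's matching step can be sketched independently of the encoder. Below, the DistilBERT embedding is stubbed out with a bag-of-words "embedding" purely so the pipeline runs end to end; the threshold, review texts, and historical reports are all invented.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a DistilBERT sentence embedding (bag-of-words here,
    only so the matching pipeline below is runnable)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(review, historical_reports, threshold=0.3):
    """Return historical bug reports (from same-category apps) whose
    embedding is close enough to the target app's review."""
    rv = embed(review)
    scored = [(cosine(rv, embed(r)), r) for r in historical_reports]
    return [r for s, r in sorted(scored, reverse=True) if s >= threshold]

history = [
    "video stutters when scrolling the feed",
    "dark theme resets after restarting the browser",
]
print(recommend("the dark theme keeps resetting every time i restart", history))
# → ['dark theme resets after restarting the browser']
```

In the real system the matched historical report then supplies candidate reproduction steps to try against the target app, which is where the confirmed bugs above came from.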