Auto-labelling of Bug Report using Natural Language Processing
The exercise of detecting similar bug reports in bug tracking systems is
known as duplicate bug report detection. Having prior knowledge of a bug
report's existence reduces the effort put into debugging problems and identifying
the root cause. Rule- and query-based solutions recommend a long list of
potential similar bug reports with no clear ranking. In addition, triage
engineers are less motivated to spend time going through an extensive list.
Consequently, this deters the use of duplicate bug report retrieval solutions.
In this paper, we propose a solution using a combination of NLP
techniques. Our approach considers unstructured and structured attributes of a
bug report, such as summary, description, severity, impacted products,
platforms, and categories. It uses a custom data transformer, a deep neural
network, and a non-generalizing machine learning method to retrieve existing
identical bug reports. We performed numerous experiments with significant
data sources containing thousands of bug reports and show that the
proposed solution achieves a high retrieval accuracy of 70% for [email protected]
Comment: 7 pages, 11 figures
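The retrieval step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps the custom data transformer and deep neural network for plain bag-of-words counts, and uses a brute-force cosine-similarity lookup as one example of a non-generalizing (instance-based) method. The report texts, the ids `B1` to `B3`, and the `recall_at_k` scoring function are all hypothetical; the exact metric name in the abstract is not legible in this listing.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term counts for one report (lowercased, whitespace split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, corpus, k):
    """Return the ids of the k stored reports most similar to the query."""
    q = tf_vector(query)
    ranked = sorted(corpus.items(), key=lambda item: cosine(q, tf_vector(item[1])), reverse=True)
    return [report_id for report_id, _ in ranked[:k]]

def recall_at_k(queries, corpus, k):
    """Fraction of queries whose known duplicate appears among the top k results."""
    hits = sum(1 for text, dup_id in queries if dup_id in top_k(text, corpus, k))
    return hits / len(queries)

# Hypothetical stored reports and (new report, known duplicate id) pairs.
corpus = {
    "B1": "app crashes on login with null pointer",
    "B2": "export to pdf produces blank page",
    "B3": "slow search results on large projects",
}
queries = [
    ("null pointer crash when user logs in on login screen", "B1"),
    ("pdf export gives a blank page", "B2"),
]

print(top_k(queries[0][0], corpus, 2))
print(recall_at_k(queries, corpus, 1))
```

Returning a short ranked list, rather than every loosely matching report, is what addresses the abstract's complaint about long unranked recommendation lists.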
Bug Fix Time Optimization Using Matrix Factorization and Iterative Gale-Shaply Algorithms
Bug triage is an essential task in the software maintenance phase. It assigns
developers (fixers) to bug reports to fix them. This process is performed
manually by a triager, who analyzes developers' profiles and submitted bug
reports to make suitable assignments. The bug triaging process is time-consuming,
so automating it is essential to improve the quality of software.
Previous work addressed the triaging problem as either an information retrieval or
a classification problem. This paper tackles it as a resource
allocation problem that aims at the best assignment of developers to bug
reports: one that reduces the total fixing time of newly submitted bug reports
while distributing bug reports evenly over developers. In this
paper, a combination of matrix factorization and the Gale-Shapley algorithm,
supported by differential evolution, is introduced for the first time to optimize the
total fix time and normalize developers' workload. Matrix factorization is used
to establish a recommendation system for Gale-Shapley to make assignment
decisions. Differential evolution provides the best set of weights to build
developers' score profiles. The proposed approach is assessed over three
repositories: Linux, Apache, and Eclipse. Experimental results show that the
proposed approach reduces the bug fixing time, in comparison to manual
triage, by 80.67%, 23.61%, and 60.22% over Linux, Eclipse, and Apache,
respectively. Moreover, the workload for the developers is uniform.
Comment: 14 pages, 7 figures, 8 tables, 10 equations
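The assignment step in the abstract above builds on Gale-Shapley stable matching. The sketch below shows only the classic one-to-one version, not the paper's iterative variant. It assumes equal numbers of bugs and developers with complete preference lists. In the paper the preferences come from matrix factorization scores; here they are hand-written, and the names `bug1`, `ann`, and so on are hypothetical.

```python
def gale_shapley(bug_prefs, dev_prefs):
    """Classic one-to-one stable matching: bugs propose, developers keep their best offer."""
    # rank[dev][bug] = position of bug in that developer's list (lower is preferred)
    rank = {dev: {bug: i for i, bug in enumerate(prefs)} for dev, prefs in dev_prefs.items()}
    next_choice = {bug: 0 for bug in bug_prefs}  # index of the next developer to try
    assigned = {}                                # developer -> bug currently held
    free = list(bug_prefs)                       # bugs without a developer yet

    while free:
        bug = free.pop()
        dev = bug_prefs[bug][next_choice[bug]]
        next_choice[bug] += 1
        current = assigned.get(dev)
        if current is None:
            assigned[dev] = bug                  # developer was free: accept
        elif rank[dev][bug] < rank[dev][current]:
            assigned[dev] = bug                  # developer trades up
            free.append(current)                 # displaced bug proposes again later
        else:
            free.append(bug)                     # rejected: try the next developer

    return {bug: dev for dev, bug in assigned.items()}

# Hypothetical preference lists, most preferred first.
bug_prefs = {
    "bug1": ["ann", "bob", "cy"],
    "bug2": ["ann", "cy", "bob"],
    "bug3": ["bob", "ann", "cy"],
}
dev_prefs = {
    "ann": ["bug2", "bug1", "bug3"],
    "bob": ["bug1", "bug3", "bug2"],
    "cy": ["bug1", "bug2", "bug3"],
}

matching = gale_shapley(bug_prefs, dev_prefs)
print(matching)
```

In a stable matching, no bug and developer both prefer each other over their assigned partners. Balancing workload across developers, as the abstract describes, would need additions beyond this basic form.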
Bug or Not? Bug Report Classification Using N-Gram IDF
Previous studies have found that a significant number of bug reports are
misclassified between bugs and non-bugs, and that manually classifying bug
reports is a time-consuming task. To address this problem, we propose a bug
report classification model with N-gram IDF, a theoretical extension of
Inverse Document Frequency (IDF) for handling words and phrases of any length.
N-gram IDF enables us to extract key terms of any length from texts; these key
terms can be used as features to classify bug reports. We build
classification models with logistic regression and random forest using features
from N-gram IDF and topic modeling, which is widely used in various software
engineering tasks. With a publicly available dataset, our results show that our
N-gram IDF-based models perform better than the topic-based models
in all of the evaluated cases. Our models show promising results and have the
potential to be extended to other software engineering tasks.
Comment: 5 pages, ICSME 201
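To make the idea of weighting phrases of different lengths concrete, here is a small sketch of ordinary IDF, log(N / df), computed over word n-grams. This is deliberately not the N-gram IDF weighting from the abstract above, which is a theoretical extension of IDF. The sketch only shows the baseline quantity that such an extension starts from. The sample reports and the `max_n` parameter are hypothetical.

```python
import math
from collections import Counter

def word_ngrams(text, n):
    """Set of word n-grams of length n in one document."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_idf_table(documents, max_n=3):
    """Plain IDF, log(N / df), for every word n-gram of length 1..max_n."""
    doc_freq = Counter()
    for doc in documents:
        seen = set()
        for n in range(1, max_n + 1):
            seen |= word_ngrams(doc, n)
        doc_freq.update(seen)  # each n-gram counts once per document
    total = len(documents)
    return {gram: math.log(total / df) for gram, df in doc_freq.items()}

# Hypothetical report summaries: two bugs, two feature requests.
reports = [
    "null pointer exception on startup",
    "null pointer exception when saving file",
    "please add dark mode",
    "add export to csv",
]

idf = ngram_idf_table(reports)
print(idf["null pointer exception"], idf["dark mode"])
```

A phrase such as "null pointer exception" that recurs across bug reports gets a lower weight than a one-off phrase. The resulting table could feed a classifier such as logistic regression or random forest, as the abstract mentions.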
Easy over Hard: A Case Study on Deep Learning
While deep learning is an exciting new technique, the benefits of this method
need to be assessed with respect to its computational cost. This is
particularly important for deep learning since these learners need hours (to
weeks) to train the model. Such long training times limit the ability of (a) a
researcher to test the stability of their conclusion via repeated runs with
different random seeds; and (b) other researchers to repeat, improve, or even
refute that original work.
For example, recently, deep learning was used to find which questions in the
Stack Overflow programmer discussion forum can be linked together. That deep
learning system took 14 hours to execute. We show here that applying a very
simple optimizer called DE to fine-tune SVM can achieve similar (and
sometimes better) results. The DE approach terminated in 10 minutes, i.e., 84
times faster than the deep learning method.
We offer these results as a cautionary tale to the software analytics
community and suggest that not every new innovation should be applied without
critical analysis. If researchers deploy some new and expensive process, that
work should be baselined against some simpler and faster alternatives.
Comment: 12 pages, 6 figures, accepted at FSE201
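The DE optimizer mentioned in the abstract above is differential evolution. The following is a minimal sketch of the common rand/1/bin scheme, not the authors' tuning setup. The `toy_loss` function is a hypothetical stand-in for the validation error of an SVM with two hyperparameters, here called `c` and `gamma`. The population size, `f`, `cr`, and generation count are illustrative defaults. The speedup in the abstract is simple arithmetic: 14 hours is 840 minutes, and 840 / 10 = 84.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize objective over box bounds with rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one dimension always mutates
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection: keep the better one
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def toy_loss(params):
    """Hypothetical stand-in for an SVM's validation error; minimum at (3.0, 0.5)."""
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 0.5) ** 2

best_params, best_loss = differential_evolution(toy_loss, [(0.0, 10.0), (0.0, 1.0)])
print(best_params, best_loss)
```

In a real tuning run, each call to the objective would train and evaluate a model, so the population size and generation count directly set the compute budget.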
Data-Driven Application Maintenance: Views from the Trenches
In this paper we present our experience during the design, development, and pilot
deployments of a data-driven, machine-learning-based application maintenance
solution. We implemented a proof of concept to address a spectrum of
interrelated problems encountered in application maintenance projects including
duplicate incident ticket identification, assignee recommendation, theme
mining, and mapping of incidents to business processes. In the context of IT
services, these problems are frequently encountered, yet there is a gap in
bringing automation and optimization. Despite long-standing research around
mining and analysis of software repositories, such research outputs are not
adopted well in practice due to the constraints these solutions impose on the
users. We discuss the need for designing pragmatic solutions with low barriers to
adoption and addressing the right level of complexity of problems with respect to
underlying business constraints and the nature of data.
Comment: Earlier version of paper appearing in proceedings of the 4th
International Workshop on Software Engineering Research and Industrial
Practice (SER&IP), IEEE Press, pp. 48-54, 201
Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.
Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18)