Data-Driven Application Maintenance: Views from the Trenches
In this paper we present our experience during the design, development, and pilot
deployments of a data-driven, machine-learning-based application maintenance
solution. We implemented a proof of concept to address a spectrum of
interrelated problems encountered in application maintenance projects, including
duplicate incident ticket identification, assignee recommendation, theme
mining, and mapping of incidents to business processes. In the context of IT
services, these problems are frequently encountered, yet there is a gap in
automating and optimizing how they are handled. Despite long-standing research on
mining and analysis of software repositories, such research outputs have not been
widely adopted in practice because of the constraints these solutions impose on
users. We discuss the need to design pragmatic solutions with low barriers to
adoption that address the right level of problem complexity with respect to the
underlying business constraints and the nature of the data.
Comment: Earlier version of paper appearing in proceedings of the 4th
International Workshop on Software Engineering Research and Industrial
Practice (SER&IP), IEEE Press, pp. 48-54, 201
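Duplicate incident ticket identification is one of the problems listed above. The abstract does not say how the proof of concept detects duplicates, so the sketch below assumes a simple bag-of-words cosine similarity with a hypothetical threshold of 0.6; the ticket IDs and texts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_duplicates(tickets: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return ticket pairs whose similarity meets the (hypothetical) threshold."""
    vectors = {tid: Counter(tokenize(text)) for tid, text in tickets.items()}
    ids = sorted(vectors)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs
```

For example, "Login page times out after password reset" and "Password reset then login page times out" share six of their seven tokens and score 6/7 ≈ 0.857, while an unrelated invoice-export ticket scores 0. A production system would more likely use TF-IDF weighting or embeddings, but the pairwise-similarity structure stays the same.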
MultiDimEr: a multi-dimensional bug analyzEr
Background: Bugs and bug management consume a significant amount of time and
effort in software development organizations. A reduction in bugs can
significantly improve the capacity for new feature development. Aims: We
categorize and visualize dimensions of bug reports to identify accruing
technical debt. This evidence can serve practitioners and decision makers not
only as an argumentative basis for steering improvement efforts, but also as a
starting point for root cause analysis, reducing overall bug inflow. Method: We
implemented a tool, MultiDimEr, that analyzes and visualizes bug reports. The
tool was implemented and evaluated at Ericsson. Results: We present our
preliminary findings from using MultiDimEr for bug analysis, where we
successfully identified the components generating most of the bugs and bug trends
within certain components. Conclusions: By analyzing the dimensions provided by
MultiDimEr, we show that classifying and visualizing bug reports in different
dimensions can stimulate discussions around bug hot spots and help validate
the accuracy of manually entered bug report attributes used in technical debt
measurements such as fault slip through.
Comment: TechDebt@ICSE 2022: 66-7
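The abstract does not describe MultiDimEr's data model or implementation. As a rough illustration of slicing bug reports along dimensions, the sketch below assumes each report carries hypothetical `component` and `quarter` fields and counts reports along one or two of those dimensions, which is enough to surface bug-prone components and per-component trends.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    # Hypothetical fields; the real bug-report schema is not given in the abstract.
    bug_id: str
    component: str
    quarter: str  # e.g. "2021-Q3"; sorts chronologically as a string


def bugs_per_component(reports: list[BugReport]) -> list[tuple[str, int]]:
    """Count reports per component, most bug-prone component first."""
    return Counter(r.component for r in reports).most_common()


def trend(reports: list[BugReport], component: str) -> list[tuple[str, int]]:
    """Bug counts per quarter for one component, in chronological order."""
    counts = Counter(r.quarter for r in reports if r.component == component)
    return sorted(counts.items())


def cross_tab(reports: list[BugReport], row_dim: str, col_dim: str) -> Counter:
    """Count reports for each (row, column) value pair of two dimensions."""
    return Counter((getattr(r, row_dim), getattr(r, col_dim)) for r in reports)
```

Adding a dimension (severity, detecting phase, and so on) only requires a new field; `cross_tab` then works for any pair of field names.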
How Usability Defects Defer from Non-Usability Defects? : A Case Study on Open Source Projects
Usability is a subjective software quality attribute, and usability defects are often considered less critical to fix. One reason is vague defect descriptions that fail to convince developers of the validity of usability issues. Producing a comprehensive usability defect description can be challenging, especially when it comes to reporting relevant and important information. Prior research on improving defect report comprehension has often focused on defects in general or studied other aspects of software quality improvement, such as defect report triage, metrics and prediction, and automatic defect detection and fixing. In this paper, we studied 2241 usability and non-usability defects from three open-source projects: Mozilla Thunderbird, Firefox for Android, and Eclipse Platform. We examined the presence of eight defect attributes (steps to reproduce, impact, software context, expected output, actual output, assume cause, solution proposal, and supplementary information) and used various statistical tests to answer the research questions. In general, we found that usability defects are resolved more slowly than non-usability defects, even when the non-usability defect reports contain less information. In terms of report content, usability defects often contain output details and software context, while non-usability defects are more often explained with supplementary information such as stack traces and error logs. Our findings extend the body of knowledge on software defect reporting, especially in understanding the characteristics of usability defects. These promising results may also help software development practitioners improve their practice.
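The abstract does not name its statistical tests. For comparing resolution times between two groups, the Mann-Whitney U test is one common non-parametric choice, so the sketch below assumes it; the day counts are invented. It computes U by direct pairwise comparison and the matching effect size, the probability that a randomly chosen usability defect takes longer than a randomly chosen non-usability defect.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic of sample x against y: pairs where x is larger, ties counted as half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def prob_x_greater(x: list[float], y: list[float]) -> float:
    """Common-language effect size: P(random x > random y), ties split evenly."""
    return mann_whitney_u(x, y) / (len(x) * len(y))
```

With hypothetical resolution times of `[10, 14, 30, 45]` days for usability defects and `[3, 5, 12, 20]` for non-usability defects, U is 13 out of 16 pairs (effect size 0.8125). The pairwise loop is O(n·m); for p-values on real data, a library routine such as SciPy's `scipy.stats.mannwhitneyu` is the usual route.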
A Bug Bounty Perspective on the Disclosure of Web Vulnerabilities
Bug bounties have become increasingly popular in recent years. This paper
discusses bug bounties by framing them theoretically against the so-called
platform economy. Empirically, the focus is on the disclosure of web
vulnerabilities through the Open Bug Bounty (OBB) platform between 2015 and
late 2017. According to the empirical results based on a dataset covering
nearly 160 thousand web vulnerabilities, (i) OBB has been successful as a
community-based platform for the dissemination of web vulnerabilities. The
platform has also attracted many productive hackers, (ii) but there exists a
large productivity gap, which likely relates to (iii) a knowledge gap and the
use of automated tools for web vulnerability discovery. While the platform (iv)
has been exceptionally fast to evaluate new vulnerability submissions, (v) the
patching times of the web vulnerabilities disseminated have been long. With
these empirical results and the accompanying theoretical discussion, the paper
contributes to the small but rapidly growing body of research on bug
bounties. In addition, the paper makes a practical contribution by discussing
the business models behind bug bounties from the viewpoints of platforms,
ecosystems, and vulnerability markets.
Comment: 17th Annual Workshop on the Economics of Information Security,
Innsbruck, https://weis2018.econinfosec.org
Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification
Text classification has become an important task for automatically assigning documents to their respective categories. For text classification, feature selection techniques are normally used to identify important features and to remove irrelevant and noisy features, minimizing the dimensionality of the feature space. These techniques are expected particularly to improve the efficiency, accuracy, and comprehensibility of classification models in text labeling problems. Most feature selection techniques use document and term frequencies to rank a term. Existing feature selection techniques (e.g., RDC, NRDC) consider the counts of frequently occurring terms and ignore the counts of rarely occurring terms in a class. This study instead proposes the Improved Relative Discriminative Criterion (IRDC) technique, which also considers the counts of rarely occurring terms, arguing that these counts are as meaningful and important as those of frequently occurring terms in a class. The proposed IRDC is compared to the most recent feature selection techniques, RDC and NRDC. The results show a significant improvement by IRDC for feature selection in terms of precision (27%), recall (30%), macro-average (35%), and micro-average (30%). Additionally, this study proposes a hybrid algorithm, Ringed Seal Search-Support Vector Machine (RSS-SVM), to improve the generalization and learning capability of the SVM. RSS-SVM optimizes the kernel and penalty parameters with the RSS algorithm. The proposed RSS-SVM is compared to the most recent techniques, GA-SVM and CS-SVM. The results show a significant improvement by RSS-SVM for classification in terms of accuracy (18.8%), recall (15.68%), precision (15.62%), and specificity (13.69%). In conclusion, IRDC performs better than existing techniques because of its capability to consider rare and informative terms.
Additionally, RSS-SVM performs better than existing techniques because it strikes a better balance between exploration and exploitation.
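The abstract reports macro- and micro-averaged scores without defining them. In the standard definitions, macro-averaging computes a score per class and then takes the mean, so rare classes weigh as much as frequent ones, while micro-averaging pools the counts of all classes before computing a single score. The sketch below assumes F1 is the averaged metric and uses invented per-class counts; it does not reproduce IRDC or RSS-SVM.

```python
def _ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: 2TP / (2TP + FP + FN)."""
    return _ratio(2 * tp, 2 * tp + fp + fn)


def macro_micro_f1(counts: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """counts maps each class label to its (TP, FP, FN) counts."""
    # Macro: mean of per-class F1 scores.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool counts across classes, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return {"macro_f1": macro, "micro_f1": f1(tp, fp, fn)}
```

With a frequent class at (8, 2, 2) and a rare one at (1, 1, 3), the per-class F1 scores are 0.8 and about 0.333, giving macro-F1 ≈ 0.567, while the pooled counts (9, 3, 5) give micro-F1 = 18/26 ≈ 0.692. The gap shows how poorly handled rare classes pull the macro average down.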
Machine Learning for Software Engineering: A Tertiary Study
Machine learning (ML) techniques increase the effectiveness of software
engineering (SE) lifecycle activities. We systematically collected,
quality-assessed, summarized, and categorized 83 reviews in ML for SE published
between 2009 and 2022, covering 6,117 primary studies. The SE areas most tackled
with ML are software quality and testing, while human-centered areas appear
more challenging for ML. We propose a number of ML for SE research challenges
and actions including: conducting further empirical validation and industrial
studies on ML; reconsidering deficient SE methods; documenting and automating
data collection and pipeline processes; reexamining how industrial
practitioners distribute their proprietary data; and implementing incremental
ML approaches.
Comment: 37 pages, 6 figures, 7 tables, journal article
Intelligent Software Bugs Localization, Triage and Prioritization
One of the most time-consuming software maintenance tasks is localizing software bugs, especially in large systems. Developers have to follow a tedious process to reproduce the abnormal behavior and then inspect a large number of files in order to resolve the bugs. Furthermore, software developers are usually overwhelmed with several reports of critical bugs that must be addressed urgently and simultaneously. Managing these bugs is a complex problem due to limited resources and deadline pressure. Another critical task in this process is to assign appropriate priorities to the bugs and eventually assign them to the right developers for resolution. Several approaches have been proposed for bug localization, but most of them recommend classes as outputs, which may still require high inspection effort. In addition, there is a significant mismatch between the natural language used in bug reports and the programming language, which limits the efficiency of existing approaches, since most of them rely mainly on lexical similarity. Most existing studies treat bug reports in isolation when assigning them to developers, and they lack an understanding of the dynamics of changing bug priorities. Thus, developers may spend considerable cognitive effort moving between completely unrelated bug reports. To address these challenges, we proposed the following research contributions:
1. We proposed an automated approach to find and rank the potential classes and methods in order to localize software defects. Our approach finds a good balance between minimizing the number of recommended classes and maximizing the relevance of the proposed solution using a hybrid multi-objective optimization algorithm that combines local and global search. Our hybrid multi-objective approach locates the true buggy methods within the top 10 recommendations for over 78% of the bug reports, significantly reducing developer effort compared to class-level bug localization techniques.
2. We proposed an automated bug triage approach based on the dependencies between several open bug reports. We defined the dependency between two bug reports as the number of common files to be inspected to localize the bugs. Then, we adopted multi-objective search to rank the bug reports for programmers. The results show a significant time reduction of over 30% in localizing the bugs simultaneously, compared to the traditional bug prioritization technique based only on priorities.
3. We performed an empirical study to observe and understand changes in bug priority in order to build a 3-W model on Why and When bug priorities change, and Who performs the change. We conducted interviews and a survey with practitioners and performed a quantitative analysis of a large database of bug reports. As a result, we observed frequent changes in bug priorities and their impact on delaying critical bug fixes, especially before shipping a new release.
Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan-Dearborn: http://deepblue.lib.umich.edu/bitstream/2027.42/170906/1/Rafi Almhana Final Dissertation.pdf
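The dissertation defines the dependency between two bug reports as the number of common files to be inspected. The sketch below computes that measure and then orders open bugs with a simple greedy heuristic that keeps consecutive bugs as file-overlapping as possible. The greedy ordering is a stand-in assumption, not the multi-objective search the abstract describes, and the bug IDs and file names are hypothetical.

```python
def dependency(files_a: set[str], files_b: set[str]) -> int:
    """Number of files that must be inspected for both bug reports."""
    return len(files_a & files_b)


def greedy_order(bug_files: dict[str, set[str]]) -> list[str]:
    """Order bugs so each one shares as many files as possible with the previous one.

    Greedy stand-in (an assumption) for multi-objective ranking. Ties are broken
    alphabetically by bug ID because max() keeps the first maximum it sees.
    """
    def total_dependency(bug: str) -> int:
        return sum(dependency(bug_files[bug], bug_files[other])
                   for other in bug_files if other != bug)

    remaining = sorted(bug_files)
    # Start with the bug that overlaps most with all the others.
    current = max(remaining, key=total_dependency)
    order = [current]
    remaining.remove(current)
    while remaining:
        previous = order[-1]
        current = max(remaining, key=lambda bug: dependency(bug_files[previous], bug_files[bug]))
        order.append(current)
        remaining.remove(current)
    return order
```

For `{"BUG-1": {"a.py", "b.py"}, "BUG-2": {"c.py"}, "BUG-3": {"a.py", "b.py", "c.py"}}`, BUG-3 overlaps most with the others, BUG-1 shares two files with it, and BUG-2 comes last. A multi-objective search would instead trade this overlap off against other goals, such as bug priority.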
Automated identification and qualitative characterization of safety concerns reported in UAV software platforms
Unmanned Aerial Vehicles (UAVs) are nowadays used in a variety of applications. Given the cyber-physical nature of UAVs, software defects in these systems can cause issues with safety-critical implications. An important aspect of the UAV software lifecycle is minimizing the possibility of harming humans or damaging property through a continuous process of hazard identification and safety risk management. Specifically, safety-related concerns typically emerge during the operation of UAV systems and are reported by end users and developers in the form of issue reports and pull requests. However, popular UAV systems receive tens or hundreds of reports of varying types and quality every day. To help developers identify and triage safety-critical UAV issues in a timely manner, we (i) experiment with automated approaches (previously used for issue classification) for detecting safety-related matters in the titles and descriptions of issues and pull requests reported on UAV platforms, and (ii) propose a categorization of the main hazards and accidents discussed in such issues. Our results (i) show that shallow machine learning-based approaches can identify safety-related sentences with precision, recall, and F-measure values of about 80%; and (ii) provide a categorization and description of the relationships between safety issue hazards and accidents.
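The abstract does not name the shallow learners it evaluated. As one illustrative shallow approach, the sketch below assumes a multinomial Naive Bayes classifier over bag-of-words features that labels a sentence as `safety` or `other`. The training sentences and labels are invented, and nothing here reproduces the study's data or its roughly 80% results.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        # Log prior: fraction of training sentences in each class.
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokens(sentence))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, sentence: str) -> str:
        def log_score(c: str) -> float:
            # Unnormalized log posterior; words never seen in training are ignored.
            denom = self.totals[c] + len(self.vocab)
            return self.log_priors[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denom)
                for w in tokens(sentence) if w in self.vocab
            )
        return max(self.classes, key=log_score)
```

Trained on a handful of labeled sentences (crashes, failsafes, and propeller strikes as `safety`; README and refactoring chores as `other`), the classifier tags "motor failure caused crash" as `safety`. In practice, precision, recall, and F-measure would be measured on held-out sentences rather than on the training set.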