Understanding the Impact of Diversity in Software Bugs on Bug Prediction Models
Nowadays, software systems are essential for businesses, users and society. At the same time such systems are growing both in complexity and size. In this context, developing high-quality software is a challenging and expensive activity for the software industry. Since software organizations are always limited by their budget, personnel and time, it is not a trivial task to allocate testing and code-review resources to areas that require the most attention. To overcome the above problem, researchers have developed software bug prediction models that can help practitioners to predict the most bug-prone software entities. Although software bug prediction is a very popular research area, its industrial adoption remains limited.
In this thesis, we investigate three possible issues with the current state-of-the-art in software bug prediction that affect the practical usability of prediction models. First, we argue that current bug prediction models implicitly assume that all bugs are the same without taking into consideration their impact. We study the impact of bugs in terms of the experience of the developers required to fix them. Second, only a few studies investigate the impact of specific types of bugs. Therefore, we characterize a severe type of bug called Blocking bugs, and provide approaches to predict them early on. Third, false-negative files are buggy files that bug prediction models incorrectly classify as non-buggy files. We argue that a large number of false-negative files makes bug prediction models less attractive for developers. In our thesis, we quantify the extent of false-negative files, and manually inspect them in order to better understand their nature.
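The false-negative files discussed above can be identified directly from a model's predictions; a minimal sketch (the function name and toy labels are illustrative, not the thesis's data):

```python
def false_negatives(y_true, y_pred):
    """Return indices of false-negative files: actually buggy (1)
    but predicted non-buggy (0) by the bug prediction model."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred))
            if t == 1 and p == 0]

# Toy labels over five files: files 1 and 3 are bugs the model missed.
print(false_negatives([1, 1, 0, 1, 0], [1, 0, 0, 0, 0]))  # -> [1, 3]
```

Files returned by such a check are the ones the thesis proposes to quantify and manually inspect.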
Explaining Explanation: An Empirical Study on Explanation in Code Reviews
Code review is an important process for quality assurance in software
development. For an effective code review, the reviewers must explain their
feedback to enable the authors of the code change to act on it. However, the
explanation needs may differ among developers, who may require different types
of explanations. It is therefore crucial to understand what kind of
explanations reviewers usually use in code reviews. To the best of our
knowledge, no study published to date has analyzed the types of explanations
used in code review. In this study, we present the first analysis of
explanations in useful code reviews. We extracted a set of code reviews based
on their usefulness and labeled them based on whether they contained an
explanation, a solution, or both a proposed solution and an explanation
thereof.
Based on our analysis, we found that a significant portion of the code review
comments (46%) only include solutions without providing an explanation. We
further investigated the remaining 54% of code review comments containing an
explanation and conducted an open card sorting to categorize the reviewers'
explanations. We distilled seven distinct categories of explanations based on
the expression forms developers used. Then, we utilized large language models,
specifically ChatGPT, to assist developers in getting a code review explanation
that suits their preferences. Specifically, we created prompts to transform a
code review explanation into a specific type of explanation. Our evaluation
results show that ChatGPT generated the specified type of explanation correctly
in 88/90 cases and produced a correct explanation in 89/90 cases.
Overall, our study provides insights into the types of explanations that
developers use in code review and showcases how ChatGPT can be leveraged during
the code review process to generate a specific type of explanation.
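The prompt-based transformation described above amounts to assembling an instruction around the original review comment; a minimal sketch, where the function name, the prompt wording, and the target-type names are illustrative assumptions, not the study's actual prompts or its seven categories:

```python
def build_transform_prompt(review_comment: str, target_type: str) -> str:
    """Assemble a prompt asking an LLM such as ChatGPT to rewrite a
    code review explanation into a preferred expression form.
    Note: wording and target-type names here are illustrative, not
    the study's actual prompts or categories."""
    return (
        "You are assisting with code review.\n"
        f"Rewrite the following review explanation as a '{target_type}' "
        "explanation, preserving its technical content.\n\n"
        f"Review comment: {review_comment}"
    )

prompt = build_transform_prompt(
    "This loop allocates a buffer on every iteration; hoist it out.",
    "question",  # e.g., phrase the feedback as a guiding question
)
print(prompt)
```

The returned string would then be sent to the model; the evaluation in the study checks whether the generated rewrite matches the requested explanation type.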
Towards Automatic Identification of Violation Symptoms of Architecture Erosion
Architecture erosion has a detrimental effect on maintenance and evolution,
as the implementation drifts away from the intended architecture. To prevent
this, development teams need to understand early enough the symptoms of
erosion, and particularly violations of the intended architecture. One way to
achieve this, is through the automatic identification of architecture
violations from textual artifacts, and particularly code reviews. In this
paper, we developed 15 machine learning-based and 4 deep learning-based
classifiers with three pre-trained word embeddings to identify violation
symptoms of architecture erosion from developer discussions in code reviews.
Specifically, we looked at code review comments from four large open-source
projects from the OpenStack (Nova and Neutron) and Qt (Qt Base and Qt Creator)
communities. We then conducted a survey to acquire feedback from the involved
participants who discussed architecture violations in code reviews, to validate
the usefulness of our trained classifiers. The results show that the SVM
classifier based on word2vec pre-trained word embedding performs the best with
an F1-score of 0.779. In most cases, classifiers with the fastText pre-trained
word embedding model can achieve relatively good performance. Furthermore,
classifiers with 200-dimensional pre-trained word embeddings outperform those
that use 100- and 300-dimensional models. In addition, an ensemble classifier
based on the majority voting strategy can further improve performance,
outperforming the individual classifiers. Finally, an online survey of the
involved developers reveals that the violation symptoms identified by our
approaches have practical value and can provide early warnings for impending
architecture erosion.
Comment: 20 pages, 4 images, 7 tables, revision submitted to TSE (2023).
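The majority-voting ensemble mentioned above combines the labels produced by the individual classifiers; a minimal sketch in plain Python (the toy predictions and classifier assignments in the comments are illustrative, not the paper's results):

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine per-classifier label predictions (one list per classifier)
    into an ensemble prediction by majority vote."""
    per_sample = zip(*predictions_per_classifier)
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

# Three classifiers label four code review comments:
# 1 = violation symptom, 0 = not a violation symptom (toy predictions).
preds = [
    [1, 0, 1, 0],  # e.g., SVM with word2vec embeddings
    [1, 1, 0, 0],  # e.g., a classifier with fastText embeddings
    [1, 0, 0, 1],  # e.g., a deep learning classifier
]
print(majority_vote(preds))  # -> [1, 0, 0, 0]
```

With an odd number of classifiers every sample has a strict majority, which is why such ensembles typically use an odd number of voters.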
Contrastive Learning for API Aspect Analysis
We present a novel approach - CLAA - for API aspect detection in API reviews
that utilizes transformer models trained with a supervised contrastive loss
objective function. We evaluate CLAA using performance and impact analysis. For
performance analysis, we utilized a benchmark dataset on developer discussions
collected from Stack Overflow and compare the results to those obtained using
state-of-the-art transformer models. Our experiments show that contrastive
learning can significantly improve the performance of transformer models in
detecting aspects such as Performance, Security, Usability, and Documentation.
For impact analysis, we performed an empirical study and a developer study. On
200 randomly selected and manually labeled online reviews, CLAA achieved 92%
accuracy while the SOTA baseline achieved 81.5%. According to our developer study
involving 10 participants, the use of 'Stack Overflow + CLAA' resulted in
increased accuracy and confidence during API selection. Replication package:
https://github.com/shahariar-shibli/Contrastive-Learning-for-API-Aspect-Analysis
Comment: Accepted in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023).
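The supervised contrastive objective that CLAA trains with pulls same-label embeddings together and pushes different-label ones apart; a NumPy sketch of that loss (a small illustrative implementation, not the paper's training code, which uses transformer models):

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss on L2-normalized embeddings:
    for each anchor, maximize the softmax probability of its
    same-label (positive) pairs over all other samples."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)          # exclude the anchor itself
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & not_self
    # log-softmax over all non-self pairs for each anchor
    logits = np.where(not_self, sim, -np.inf)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negated average log-probability of positives per anchor
    per_anchor = (-np.where(pos, log_prob, 0.0).sum(axis=1)
                  / np.maximum(pos.sum(axis=1), 1))
    return per_anchor.mean()
```

Lower loss indicates that embeddings of reviews discussing the same aspect (e.g., Performance, Security) cluster together, which is the property the paper exploits for aspect detection.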
Holistic recommender systems for software engineering
The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in the form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a purely textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model, and analyze information contained in development artifacts in a reusable meta-information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can thus be reinterpreted from a holistic point of view, preserving its multi-dimensionality and opening the path towards the concept of holistic recommender systems for software engineering.
Nudge: Accelerating Overdue Pull Requests Towards Completion
Pull requests are a key part of the collaborative software development and
code review process today. However, pull requests can also slow down the
software development process when the reviewer(s) or the author do not actively
engage with the pull request. In this work, we design an end-to-end service,
Nudge, for accelerating overdue pull requests towards completion by reminding
the author or the reviewer(s) to engage with their overdue pull requests.
First, we use models based on effort estimation and machine learning to predict
the completion time for a given pull request. Second, we use activity detection
to reduce false positives. Lastly, we use dependency determination to
understand the blocker of the pull request and nudge the appropriate
actor (author or reviewer(s)). We also do a correlation analysis to understand
the statistical relationship between the pull request completion times and
various pull request and developer related attributes. Nudge has been deployed
on 147 repositories at Microsoft since 2019. We do a large scale evaluation
based on the implicit and explicit feedback we received from sending the Nudge
notifications on 8,500 pull requests. We observe significant reduction in
completion time, by over 60%, for pull requests that were nudged, thus
increasing the efficiency of the code review process and accelerating pull
request progression.
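The three stages of Nudge described above (effort estimation, activity detection, dependency determination) can be sketched as a simple decision function; the function name, thresholds, and return values below are illustrative assumptions, not Microsoft's production logic:

```python
from datetime import timedelta

def choose_nudge_target(predicted_completion, elapsed,
                        recent_activity, blocked_on_author):
    """Decide whether and whom to nudge for a pull request.
    Names and logic are an illustrative sketch of the three stages,
    not the deployed Nudge service."""
    # 1. Effort estimation: only overdue PRs are candidates.
    if elapsed <= predicted_completion:
        return None
    # 2. Activity detection: recent engagement means no nudge,
    #    reducing false positives on PRs that are already moving.
    if recent_activity:
        return None
    # 3. Dependency determination: nudge whoever blocks the PR.
    return "author" if blocked_on_author else "reviewers"

print(choose_nudge_target(timedelta(days=2), timedelta(days=5),
                          recent_activity=False, blocked_on_author=True))
# -> author
```

Keeping the activity check between prediction and notification is what lets such a service stay quiet on pull requests that are overdue but actively being worked on.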