
    Bug or Not? Bug Report Classification Using N-Gram IDF

    Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks. Comment: 5 pages, ICSME 201
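As a rough illustration of scoring phrases of several lengths by document frequency, the sketch below computes ordinary IDF over word n-grams. It is not the N-gram IDF weighting proposed in the paper, which uses a different formulation; the function names, the whitespace tokenizer, and the toy bug reports are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) in a text."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def ngram_document_idf(docs, max_n=2):
    """Plain IDF, log(|D| / df), computed for every word n-gram."""
    df = Counter()
    for doc in docs:
        # Each document counts once per n-gram, so update with a set.
        df.update(word_ngrams(doc, max_n))
    total = len(docs)
    return {gram: math.log(total / count) for gram, count in df.items()}


# Hypothetical bug report summaries, invented for illustration.
reports = [
    "app crashes on startup",
    "app crashes when saving file",
    "please add dark mode",
]
idf = ngram_document_idf(reports)
```

Phrases that appear in fewer reports, such as "dark mode" here, receive a higher score than phrases shared across reports, such as "app crashes". The resulting scores could serve as a starting point for selecting features for a classifier.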

    Analysis and Detection of Information Types of Open Source Software Issue Discussions

    Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders. Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE2019
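The conversational features mentioned above (sentence length and position) could be extracted along the following lines. This is a minimal sketch: the exact feature definitions used in the paper are not given in the abstract, so the field names, the token-count length, and the normalized position are assumptions.

```python
def conversational_features(thread):
    """Compute simple per-sentence features for an issue thread.

    `thread` is a list of comments, each a list of sentence strings
    (a hypothetical input layout chosen for this example).
    """
    total = sum(len(comment) for comment in thread)
    features = []
    index = 0  # running position of the sentence within the whole thread
    for comment_no, comment in enumerate(thread):
        for sentence in comment:
            features.append({
                "sentence": sentence,
                "length_tokens": len(sentence.split()),
                "comment_index": comment_no,
                # 0.0 for the first sentence, 1.0 for the last one.
                "relative_position": index / (total - 1) if total > 1 else 0.0,
            })
            index += 1
    return features


# Invented two-comment thread, for illustration only.
thread = [
    ["The build fails on Windows.", "Here is the stack trace."],
    ["I can reproduce this.", "A fix is in progress."],
]
rows = conversational_features(thread)
```

Each resulting row can be turned into a numeric feature vector for a classifier such as Random Forest.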

    SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project

    Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and concludes with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.
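The core heuristic behind SZZ can be sketched as follows: take the lines that a bug-fixing commit removed or modified, and look up which earlier commits last touched those lines. The code below is a toy in-memory illustration of that idea, not the SZZ Unleashed implementation; the function name, the data layout, and the commit ids are hypothetical, and a real tool would obtain the blame information from git.

```python
def szz_candidates(changed_lines, blame_before_fix):
    """Return commits that last touched the lines changed by a fix.

    changed_lines:    {path: [line numbers removed or modified by the fix]}
    blame_before_fix: {path: {line number: commit id}} for the revision
                      just before the fix (what `git blame` would report).
    """
    candidates = set()
    for path, line_numbers in changed_lines.items():
        file_blame = blame_before_fix.get(path, {})
        for line_no in line_numbers:
            origin = file_blame.get(line_no)
            if origin is not None:
                candidates.add(origin)  # potential bug-introducing commit
    return candidates


# Invented example data: a fix touching three lines in two files.
changed = {"src/Parser.java": [10, 11], "src/Util.java": [3]}
blame = {
    "src/Parser.java": {10: "c1", 11: "c2", 12: "c3"},
    "src/Util.java": {3: "c1"},
}
suspects = szz_candidates(changed, blame)
```

Here commits `c1` and `c2` are flagged as candidates, while `c3` is not because the fix did not touch its line. Practical implementations typically add filtering steps on top of this basic lookup.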

    Investigating model explanation of bug report assignment recommenders

    Software projects receive a lot of bug reports, and each bug report needs to be triaged. An objective of the bug report triaging process is to find an appropriate developer who can fix the reported bug. As this process can be time-consuming and requires a lot of effort, researchers have implemented recommender systems using a variety of algorithms to automate this process. Although using these recommender systems has a number of benefits, there are still many obstacles to overcome. A key obstacle is that commonly used algorithms are black-box, making it difficult for practitioners to comprehend how the models make decisions. Lack of explainability results in a lack of trust and transparency in the recommendations. This work investigates approaches that lead to visually explainable bug report assignment recommender systems. First, we developed and compared six different recommender systems using three distinct machine learning algorithms, Random Forest (RF), MLP Classifier, and Bidirectional Neural Networks (BNN), and two different feature extraction techniques, TF-IDF and Word2Vec. Second, we examine the use of WordNet to improve recommender accuracy. Third, we explore the explanation of a bug report assignment recommender using the feature-based local model LIME. Finally, we assess the use of a positive-negative horizontal bar chart, feature table, and word cloud to explain the recommender systems visually. Our analysis indicates that the optimum approach for developing a bug report assignment recommender system uses TF-IDF with RF and visually explains the recommendation with a word cloud and LIME as a local model.