
    Adaptive Cross-Project Bug Localization with Graph Learning

    Bug localization is the process of identifying the source code files associated with a bug report. This is important because it allows developers to focus their efforts on fixing bugs rather than finding their root cause in the first place. A number of techniques have been developed for bug localization, and recent research has shown that supervised approaches using historical data are more effective than other methods. In practice, however, supervised approaches require labeled datasets of high quality and quantity, and preparing training data for new projects and retraining bug localization models can be highly expensive. Additionally, as pointed out by Zimmermann et al., most projects do not have rich historical bug data. This necessitates cross-project bug localization: using data from one project to extract transferable features for localizing bugs in a new project. In this thesis, we aim to provide a bug localization model that locates buggy source code files in a new project without retraining, by leveraging the transfer learning capability of deep learning models, which can be trained once on a label-rich dataset and transferred to a new dataset. We propose AdaBL and AdaBL+GL, which can be trained once and transferred to a new project. The main idea behind AdaBL is to learn the syntactic and semantic relationships between bug reports and source code separately; the syntactic patterns are transferable features that hold across projects. We pair AdaBL with a graph neural network that represents source code as a graph to improve the semantic learning capability, as sketched below. We also performed a detailed survey compiling the bug localization research published since 2016, examining the experimental settings used and the availability of replication packages for deep learning-based bug localization research.
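    To make the two-channel idea concrete, the following is a minimal PyTorch sketch, not the thesis's actual implementation: a text encoder averages bug report token embeddings, a one-layer message-passing encoder summarizes a source-code graph (e.g., an AST), and cosine similarity ranks files. All names (BugReportEncoder, CodeGraphEncoder) and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the AdaBL+GL idea: encode the bug report as text,
# encode the source file as a graph, rank files by similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BugReportEncoder(nn.Module):
    """Averages token embeddings of a bug report into one vector."""
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, token_ids, offsets):
        return self.embed(token_ids, offsets)

class CodeGraphEncoder(nn.Module):
    """One round of mean-neighbor message passing over a code graph
    (e.g., an AST or call graph), followed by mean pooling."""
    def __init__(self, in_dim: int, dim: int = 128):
        super().__init__()
        self.lin = nn.Linear(in_dim, dim)

    def forward(self, node_feats, adj):
        # Normalize adjacency so each node averages its neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.lin((adj / deg) @ node_feats))
        return h.mean(dim=0)  # graph-level embedding

# Usage: score one (bug report, source file) pair by cosine similarity.
report_enc = BugReportEncoder(vocab_size=5000)
graph_enc = CodeGraphEncoder(in_dim=64)
tokens, offsets = torch.tensor([10, 42, 7]), torch.tensor([0])
nodes = torch.randn(6, 64)   # 6 graph nodes with 64-dim features
adj = torch.eye(6)           # self-loops only, for the demo
score = F.cosine_similarity(report_enc(tokens, offsets),
                            graph_enc(nodes, adj).unsqueeze(0))
```

    In the cross-project setting described above, both encoders would be trained once on a label-rich project and then applied unchanged to rank files in a new project.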

    Employing Deep Learning and Structured Information Retrieval to Answer Clarification Questions on Bug Reports

    Software bug reports filed on bug-tracking systems often lack crucial information that developers need to resolve them promptly, costing companies billions of dollars. There has been significant research on effectively eliciting information from bug reporters through the templates they must fill in, but the need for follow-up questions persists. Recent studies propose techniques to suggest these follow-up questions to help developers obtain the missing details, yet there has been little research on answering them, and they often go unanswered. In this paper, we propose a novel approach that combines CodeT5 with Lucene, an information retrieval engine, leveraging the relevance of different bug reports, their components, and their follow-up questions to recommend answers. These top-ranked answers, along with their bug reports, serve as additional context beyond the deficient bug report for the deep learning model generating an answer; a sketch of this retrieve-then-generate pipeline follows. We evaluate our recommended answers against manually annotated answers using similarity metrics such as Normalized Smooth BLEU Score, METEOR, Word Mover's Distance, and Semantic Similarity. We achieve a BLEU Score of up to 34 and a Semantic Similarity of up to 64, which shows that the generated answers are understandable and good according to Google's standard, and that our approach can outperform multiple baselines.
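    As a rough illustration of the pipeline, the sketch below retrieves similar resolved question-answer pairs (a toy term-overlap scorer stands in for Lucene), prepends the retrieved answers as context, and lets a public CodeT5 checkpoint generate an answer. The prompt format and checkpoint name are assumptions, not the paper's exact setup.

```python
# Hypothetical retrieve-then-generate sketch: a toy scorer stands in for
# Lucene, and CodeT5 generates the answer from the assembled context.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def retrieve(query: str, corpus: list, k: int = 2):
    """Toy stand-in for Lucene: rank (question, answer) pairs by term overlap."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda qa: -len(q & set(qa[0].lower().split())))
    return scored[:k]

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

corpus = [("which android version are you on", "Android 13, Pixel 6"),
          ("does it crash on startup", "Yes, immediately after launch")]
bug_report = "App crashes when opening settings"
question = "What version of Android is affected?"

# Retrieved answers serve as extra context alongside the deficient report.
context = " ".join(answer for _, answer in retrieve(question, corpus))
prompt = f"bug: {bug_report} question: {question} context: {context}"
ids = tok(prompt, return_tensors="pt").input_ids
answer = tok.decode(model.generate(ids, max_length=32)[0],
                    skip_special_tokens=True)
```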

    Learning representations for effective and explainable software bug detection and fixing

    Software has an integral role in modern life; hence software bugs, which undermine software quality and reliability, have substantial societal and economic implications. The advent of machine learning and deep learning in software engineering has led to major advances in bug detection and fixing approaches, yet these still fall short of the desired precision and recall. This shortfall arises from the absence of a 'bridge', known as learning code representations, that can transform information from source code into a representation suitable for effective processing by machine and deep learning. This dissertation builds such a bridge. Specifically, it presents solutions for effectively learning code representations using four distinct methods: context-based, testing results-based, tree-based, and graph-based (a tree-based example is sketched below), thus improving bug detection and fixing approaches as well as giving developers insight into the underlying reasoning. The experimental results demonstrate that learned code representations can significantly enhance explainable bug detection and fixing, showcasing the practicality and meaningfulness of the approaches formulated in this dissertation toward improving software quality and reliability.
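    As one concrete instance of the tree-based method, the sketch below parses Python source into an AST with the standard library and linearizes it into node-type tokens that a downstream model could embed. The traversal and token format are illustrative assumptions, not the dissertation's actual encoding.

```python
# Minimal tree-based representation sketch: linearize an AST into a
# sequence of node-type tokens for a downstream learning model.
import ast

def tree_tokens(source: str) -> list:
    """Walk the AST (breadth-first via ast.walk), emitting one token per node."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

buggy = "def div(a, b):\n    return a / b  # no zero check"
print(tree_tokens(buggy))
# e.g. ['Module', 'FunctionDef', 'arguments', 'Return', 'arg', 'arg', 'BinOp', ...]
```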

    Automatically learning patterns for self-admitted technical debt removal

    Technical Debt (TD) expresses the need for improvements in a software system, e.g., to its source code or architecture. In certain circumstances, developers “self-admit” technical debt (SATD) in their source code comments. Previous studies investigate when SATD is admitted and what changes developers perform to remove it. Building on these studies, we present a first step towards the automated recommendation of SATD removal strategies. By leveraging a curated dataset of SATD removal patterns, we build a multi-level classifier capable of recommending six SATD removal strategies: changing API calls, conditionals, method signatures, exception handling, or return statements, or indicating that a more complex change is needed. Our approach, SARDELE (SAtd Removal using DEep LEarning), combines a convolutional neural network trained on embeddings extracted from the SATD comments with a recurrent neural network trained on embeddings extracted from the SATD-affected source code; a sketch of this two-branch architecture follows. Our evaluation reveals that SARDELE is able to predict the type of change to be applied with an average precision of ~55%, recall of ~57%, and AUC of 0.73, reaching up to 73% precision, 63% recall, and 0.74 AUC for certain categories such as changes to method calls. Overall, the results suggest that SATD removal follows recurrent patterns and indicate the feasibility of supporting developers in this task with automated recommenders.
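    The following is a minimal PyTorch sketch of such a two-branch model: a CNN over SATD comment embeddings and a GRU (standing in for the recurrent branch) over embeddings of the affected code, fused into a six-way classifier. Layer sizes and the fusion scheme are assumptions, not SARDELE's exact configuration.

```python
# Hypothetical two-branch SATD removal-strategy classifier:
# CNN over comment tokens + GRU over code tokens -> 6-way prediction.
import torch
import torch.nn as nn

class SatdRemovalClassifier(nn.Module):
    def __init__(self, vocab: int = 10000, dim: int = 100, n_classes: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, 64, kernel_size=3, padding=1)  # comment branch
        self.rnn = nn.GRU(dim, 64, batch_first=True)              # code branch
        self.out = nn.Linear(64 + 64, n_classes)

    def forward(self, comment_ids, code_ids):
        # Max-pool the convolved comment embeddings over the sequence.
        c = self.conv(self.embed(comment_ids).transpose(1, 2)).max(dim=2).values
        # Use the GRU's final hidden state as the code summary.
        _, h = self.rnn(self.embed(code_ids))
        return self.out(torch.cat([c, h.squeeze(0)], dim=1))  # class logits

model = SatdRemovalClassifier()
logits = model(torch.randint(0, 10000, (1, 20)),   # tokenized SATD comment
               torch.randint(0, 10000, (1, 50)))   # tokenized source context
strategy = logits.argmax(dim=1)  # index of one of the six removal strategies
```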