tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models
The tgp package for R is a tool for fully Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes with jumps to the limiting linear model. Special cases also implemented include Bayesian linear models, linear CART, stationary separable and isotropic Gaussian processes. In addition to inference and posterior prediction, the package supports the (sequential) design of experiments under these models paired with several objective criteria. 1-d and 2-d plotting, with higher dimension projection and slice capabilities, and tree drawing functions (requiring maptree and combinat packages), are also provided for visualization of tgp objects.
Method-Level Bug Severity Prediction using Source Code Metrics and LLMs
In the past couple of decades, significant research effort has been devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this study, we investigate source code metrics, source code representations from large language models (LLMs), and their combination for predicting the bug severity labels of two prominent datasets. We leverage several source code metrics at method-level granularity to train eight different machine-learning models. Our results suggest that the Decision Tree and Random Forest models outperform the others on several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the effectiveness of source code representations in predicting bug severity. Fine-tuning CodeBERT improves bug severity prediction significantly, in the range of 29%-140% across several evaluation metrics, compared to the best classic prediction model trained on source code metrics. Finally, we integrate source code metrics into CodeBERT as an additional input, using our two proposed architectures, both of which further enhance the effectiveness of the CodeBERT model.
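To make the first stage of the study above concrete, here is a minimal sketch of training a Random Forest severity classifier on method-level source code metrics. The feature set (lines of code, cyclomatic complexity, nesting depth, parameter count, fan-out) and the synthetic data are illustrative assumptions, not the paper's datasets or pipeline.

```python
# Hypothetical sketch: method-level bug severity prediction from code metrics.
# Feature choices and the synthetic data are illustrative, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
# Each row stands for one method: LOC, cyclomatic complexity, nesting depth,
# parameter count, fan-out (an assumed metric set).
X = rng.normal(size=(1000, 5))
# Severity labels, e.g. 0 = minor, 1 = major, 2 = critical (assumed encoding).
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```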
Mining Action Rules for Defect Reduction Planning
Defect reduction planning plays a vital role in enhancing software quality
and minimizing software maintenance costs. By training a black box machine
learning model and "explaining" its predictions, explainable AI for software
engineering aims to identify the code characteristics that impact maintenance
risks. However, post-hoc explanations do not always faithfully reflect what the
original model computes. In this paper, we introduce CounterACT, a
Counterfactual ACTion rule mining approach that can generate defect reduction
plans without black-box models. By leveraging action rules, CounterACT provides
a course of action that can be considered as a counterfactual explanation for
the class (e.g., buggy or not buggy) assigned to a piece of code. We compare
the effectiveness of CounterACT with the original action rule mining algorithm
and six established defect reduction approaches on 9 software projects. Our
evaluation is based on (a) overlap scores between proposed code changes and
actual developer modifications; (b) improvement scores in future releases; and
(c) the precision, recall, and F1-score of the plans. Our results show that,
compared to competing approaches, CounterACT's explainable plans achieve higher
overlap scores at the release level (median 95%) and commit level (median
85.97%), and they offer a better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging large language models (LLMs) to generate code edits from our generated plans. Our results show that the suggested LLM code edits supported by our plans are actionable and more likely to pass relevant test cases than vanilla LLM code recommendations.
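As an illustration of the action-rule idea above, the sketch below encodes a rule as "move a metric out of a risky range into a safe one" and turns the rules matching a method into a human-readable plan. The metric names, ranges, and rules are hypothetical, not ones mined by CounterACT.

```python
# Hypothetical sketch of action rules for defect-reduction planning: each rule
# says "if this metric lies in a range associated with buggy code, move it into
# a range associated with clean code". Rule contents here are made up.
from dataclasses import dataclass

@dataclass
class ActionRule:
    metric: str
    from_range: tuple  # (low, high) associated with the buggy class
    to_range: tuple    # (low, high) associated with the clean class

def plan_for(method_metrics: dict, rules: list[ActionRule]) -> list[str]:
    """Return change suggestions for every metric currently in a risky range."""
    plan = []
    for r in rules:
        value = method_metrics.get(r.metric)
        if value is not None and r.from_range[0] <= value <= r.from_range[1]:
            plan.append(f"move {r.metric} from {value} into "
                        f"[{r.to_range[0]}, {r.to_range[1]}]")
    return plan

rules = [ActionRule("cyclomatic_complexity", (15, 100), (1, 10)),
         ActionRule("fan_out", (20, 200), (0, 8))]
print(plan_for({"cyclomatic_complexity": 22, "fan_out": 5}, rules))
```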
A Comparative Study of Contemporary Learning Paradigms in Bug Report Priority Detection
The increasing complexity of software development demands efficient automated bug report priority classification, and recent advancements in deep learning hold promise. This paper presents a comparative study of contemporary learning paradigms, including BERT, vector databases, large language models (LLMs), and a simple novel learning paradigm: contrastive learning for BERT. Utilizing datasets from bug reports, movie reviews, and app reviews, we evaluate and compare the performance of each approach. We find that transformer encoder-only models outperform transformer decoder-only models in classification tasks, as measured by precision, recall, and F1 score, despite an order-of-magnitude gap in the number of parameters. The novel use of contrastive learning for BERT demonstrates promising results in capturing subtle nuances in text data. This work highlights the potential of advanced NLP techniques for automated bug report priority classification and underscores the importance of considering multiple factors when developing models for this task. The paper’s main contributions are a comprehensive evaluation of various learning paradigms, such as vector databases and LLMs, the introduction of contrastive learning for BERT, an exploration of applicability to other text classification tasks, and a contrastive learning procedure that exploits ordinal information between classes.
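The contrastive-learning procedure mentioned above can be pictured as a pairwise objective whose margin grows with the ordinal gap between priority classes. The loss below is one plausible reading of "exploits ordinal information between classes", with random tensors standing in for BERT [CLS] embeddings; it is not the paper's exact formulation.

```python
# Hypothetical ordinal contrastive loss: same-priority pairs are pulled together,
# different-priority pairs are pushed apart by a margin that scales with how far
# apart their priority classes are. An illustrative reading, not the paper's loss.
import torch
import torch.nn.functional as F

def ordinal_contrastive_loss(emb: torch.Tensor, labels: torch.Tensor,
                             base_margin: float = 0.5) -> torch.Tensor:
    """emb: (N, d) sentence embeddings; labels: (N,) integer priority classes."""
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                       # pairwise distances
    gap = (labels[:, None] - labels[None, :]).abs().float()
    same = (gap == 0).float()
    pull = same * dist.pow(2)
    push = (1 - same) * F.relu(base_margin * gap - dist).pow(2)
    off_diag = 1 - torch.eye(len(emb), device=emb.device)
    return ((pull + push) * off_diag).sum() / off_diag.sum()

# Toy usage: random vectors stand in for BERT [CLS] embeddings of bug reports.
print(ordinal_contrastive_loss(torch.randn(8, 768), torch.randint(0, 5, (8,))).item())
```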
Explainable Automated Debugging via Large Language Model-driven Scientific Debugging
Automated debugging techniques have the potential to reduce developer effort
in debugging, and have matured enough to be adopted by industry. However, one
critical issue with existing techniques is that, while developers want
rationales for the provided automatic debugging results, existing techniques
are ill-suited to provide them, as their deduction process differs
significantly from that of human developers. Inspired by the way developers
interact with code when debugging, we propose Automated Scientific Debugging (AutoSD), a technique that, given buggy code and a bug-revealing test, prompts large language models to automatically generate hypotheses, uses debuggers to actively interact with the buggy code, and thus automatically reaches conclusions
prior to patch generation. By aligning the reasoning of automated debugging
more closely with that of human developers, we aim to produce intelligible
explanations of how a specific patch has been generated, with the hope that the
explanation will lead to more efficient and accurate developer decisions. Our
empirical analysis on three program repair benchmarks shows that AutoSD
performs competitively with other program repair baselines, and that it can
indicate when it is confident in its results. Furthermore, we perform a human
study with 20 participants, including six professional developers, to evaluate
the utility of explanations from AutoSD. Participants with access to
explanations could judge patch correctness in roughly the same time as those
without, but their accuracy improved for five out of six real-world bugs studied. Furthermore, 70% of participants answered that they wanted explanations when using repair tools, while 55% answered that they were satisfied with the Scientific Debugging presentation.
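A minimal sketch of the hypothesize-experiment-conclude loop described above is given below. The placeholder `query_llm` stands in for the language model and a plain `eval` stands in for a debugger probe; the step format and the canned reply are assumptions for illustration, not AutoSD's actual interface.

```python
# Hypothetical scientific-debugging loop: the model proposes a hypothesis plus an
# experiment (here, a Python expression probing program state), observes the
# result, and eventually emits a patch together with its explanation trace.
def query_llm(history: list[str]) -> dict:
    """Placeholder for the LLM; returns a canned concluding step in this sketch."""
    return {"kind": "conclusion",
            "hypothesis": "the index is off by one",
            "patch": "change `len(xs) - 2` to `len(xs) - 1`"}

def scientific_debug(buggy_source: str, failing_test: str, max_steps: int = 5):
    history = [f"BUGGY CODE:\n{buggy_source}", f"FAILING TEST:\n{failing_test}"]
    for _ in range(max_steps):
        step = query_llm(history)
        if step["kind"] == "conclusion":
            return step["patch"], history          # patch plus explanation trace
        # Otherwise the step carries an experiment: evaluate it (a stand-in for
        # issuing a real debugger command) and feed the observation back.
        try:
            observation = repr(eval(step["experiment"], {}))
        except Exception as exc:
            observation = f"raised {exc!r}"
        history.append(f"HYPOTHESIS: {step['hypothesis']}\n"
                       f"EXPERIMENT: {step['experiment']}\n"
                       f"OBSERVED: {observation}")
    return None, history

patch, trace = scientific_debug("def last(xs): return xs[len(xs) - 2]",
                                "assert last([1, 2, 3]) == 3")
print(patch)
```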
LHC Dark Matter Signals from Vector Resonances and Top Partners
Extensions of the Standard Model which address the hierarchy problem and dark
matter (DM) often contain top partners and additional resonances at the TeV
scale. We explore the phenomenology of a simplified effective model with a vector resonance, a fermionic vector-like coloured partner of the top quark, and a scalar DM candidate, and provide publicly available implementations in CalcHEP and MadGraph. We study the corresponding production process at the LHC and find that it plays an important role in addition to production via strong interactions. It turns out that the presence of the vector resonance can provide a dominant contribution to the signature without conflicting with existing bounds from searches in di-jet and di-lepton final states. We find that through this process the LHC is already probing DM masses up to about 900 GeV and top partner masses up to about 1.5 TeV, thus exceeding the current bounds from QCD production alone by almost a factor of two for both particles.
Comment: 32 pages, 15 figures, 3 tables.
A Deep Dive into Large Language Models for Automated Bug Localization and Repair
Large language models (LLMs) have shown impressive effectiveness in various
software engineering tasks, including automated program repair (APR). In this
study, we take a deep dive into automated bug fixing utilizing LLMs. In
contrast to many deep learning-based APR methods that assume known bug
locations, rely on line-level localization tools, or address bug prediction and
fixing in one step, our approach uniquely employs LLMs to predict bug location
at the token level and subsequently utilizes them for bug fixing. This
methodological separation of bug localization and fixing using different LLMs
enables effective integration of diverse contextual information and improved
incorporation of inductive biases. We introduce Toggle: Token-Granulated Bug
Localization and Repair, a comprehensive program repair framework that
integrates a bug localization model, an adjustment unit, and a bug-fixing
model. Toggle takes a buggy function as input and generates a complete
corrected function. We investigate various styles of prompting to the bug
fixing model to identify the most effective prompts that better utilize the
inductive bias and significantly outperform others. Toggle achieves new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark, and exhibits better or comparable performance on several other widely used APR datasets, including Defects4J.
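The separation of localization and fixing described above can be sketched as a small pipeline: a localization model predicts a token span, an adjustment step clamps it to valid boundaries, and a fixing model rewrites the function with the span marked. Both models below are placeholders, and the `<bug>`/`</bug>` markers are an assumed prompt convention, not Toggle's.

```python
# Hypothetical localize-then-fix pipeline with stubbed models.
def localizer(tokens: list[str]) -> tuple[int, int]:
    """Placeholder: return (start, end) token indices of the predicted buggy span."""
    return 3, 5

def fixer(marked_function: str) -> str:
    """Placeholder: return a corrected function given the span-marked input."""
    return marked_function.replace("<bug> ", "").replace(" </bug>", "")

def repair(function_source: str) -> str:
    tokens = function_source.split()
    start, end = localizer(tokens)
    start, end = max(0, start), min(len(tokens), end)   # adjust raw span bounds
    marked = " ".join(tokens[:start] + ["<bug>"] + tokens[start:end]
                      + ["</bug>"] + tokens[end:])
    return fixer(marked)   # the fixer sees the whole function with the span marked

print(repair("def last(xs): return xs[0]"))
```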
A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization
Fault Localization (FL), in which a developer seeks to identify which part of
the code is malfunctioning and needs to be fixed, is a recurring challenge in
debugging. To reduce developer burden, many automated FL techniques have been
proposed. However, prior work has noted that existing techniques fail to
provide rationales for the suggested locations, hindering developer adoption of
these techniques. With this in mind, we propose AutoFL, a Large Language Model
(LLM)-based FL technique that generates an explanation of the bug along with a
suggested fault location. AutoFL prompts an LLM to use function calls to
navigate a repository, so that it can effectively localize faults over a large
software repository and overcome the limit of the LLM context length. Extensive
experiments on 798 real-world bugs in Java and Python reveal AutoFL improves
method-level acc@1 by up to 233.3% over baselines. Furthermore, developers were
interviewed on their impression of AutoFL-generated explanations, showing that
developers generally liked the natural language explanations of AutoFL, and
that they preferred reading a few high-quality explanations instead of many.
Comment: Accepted to the ACM International Conference on the Foundations of Software Engineering (FSE 2024).
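The function-call-based navigation described above can be pictured as an agent loop: the model may request repository tools (e.g. fetch a method's source) before committing to a suspicious method and an explanation. The tool names, toy repository, and stubbed model replies below are assumptions for illustration, not AutoFL's actual tool set.

```python
# Hypothetical LLM fault-localization agent: tools let the model inspect the
# repository piecewise, so the codebase never has to fit into the context window.
import json

REPO = {"Calculator.divide": "def divide(a, b):\n    return a / b"}

def list_methods() -> list[str]:
    return sorted(REPO)

def get_method_source(name: str) -> str:
    return REPO.get(name, "<unknown method>")

TOOLS = {"list_methods": lambda args: list_methods(),
         "get_method_source": lambda args: get_method_source(args["name"])}

def fake_llm(messages: list[dict]) -> dict:
    """Placeholder standing in for a tool-calling LLM."""
    if len(messages) == 1:   # first turn: ask to inspect a method
        return {"tool": "get_method_source", "args": {"name": "Calculator.divide"}}
    return {"answer": "Calculator.divide",
            "explanation": "divide() has no guard for b == 0, so the failing "
                           "test's ZeroDivisionError originates here."}

def localize(failing_test: str, max_calls: int = 5) -> dict:
    messages = [{"role": "user", "content": failing_test}]
    for _ in range(max_calls):
        reply = fake_llm(messages)
        if "answer" in reply:
            return reply                      # suspected method plus explanation
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return {"answer": None, "explanation": "no conclusion within the call budget"}

print(localize("test_divide_by_zero fails with ZeroDivisionError"))
```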
A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection
Large Language Models (LLMs) have demonstrated great potential for code
generation and other software engineering tasks. Vulnerability detection is of
crucial importance to maintaining the security, integrity, and trustworthiness
of software systems. Precise vulnerability detection requires reasoning about
the code, making it a good case study for exploring the limits of LLMs'
reasoning capabilities. Although recent work has applied LLMs to vulnerability
detection using generic prompting techniques, their full capabilities for this
task and the types of errors they make when explaining identified
vulnerabilities remain unclear.
In this paper, we surveyed eleven LLMs that are state-of-the-art in code
generation and commonly used as coding assistants, and evaluated their
capabilities for vulnerability detection. We systematically searched for the
best-performing prompts, incorporating techniques such as in-context learning
and chain-of-thought, and proposed three of our own prompting methods. Our
results show that while our prompting methods improved the models' performance,
LLMs generally struggled with vulnerability detection. They reported 0.5-0.63
Balanced Accuracy and failed to distinguish between buggy and fixed versions of
programs in 76% of cases on average. By comprehensively analyzing and
categorizing 287 instances of model reasoning, we found that 57% of LLM
responses contained errors, and the models frequently predicted incorrect
locations of buggy code and misidentified bug types. LLMs only correctly
localized 6 out of 27 bugs in DbgBench, and these 6 bugs were predicted
correctly by 70-100% of human participants. These findings suggest that despite
their potential for other tasks, LLMs may fail to properly comprehend critical
code structures and security-related concepts. Our data and code are available
at https://figshare.com/s/78fe02e56e09ec49300b
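The buggy-versus-fixed comparison reported above can be sketched as a paired check: a model is credited only if it flags the vulnerable version and clears the patched version of the same program. The prompt wording and the trivial `ask_model` stub below are illustrative, not the study's prompts or models.

```python
# Hypothetical paired evaluation for vulnerability detection.
PROMPT = ("You are a security reviewer. Think step by step about how untrusted "
          "input flows through the code, then answer 'vulnerable' or 'safe'.\n\n"
          "CODE:\n{code}\n\nANSWER:")

def ask_model(code: str) -> str:
    """Placeholder: a real model would receive PROMPT.format(code=code)."""
    return "vulnerable" if "strcpy(" in code else "safe"

def pair_accuracy(pairs: list[tuple[str, str]]) -> float:
    """pairs: (vulnerable_version, fixed_version); credit only consistent pairs."""
    ok = sum(ask_model(vuln) == "vulnerable" and ask_model(fixed) == "safe"
             for vuln, fixed in pairs)
    return ok / len(pairs)

pairs = [("strcpy(buf, user_input);",
          "strncpy(buf, user_input, sizeof buf - 1);")]
print(pair_accuracy(pairs))
```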
BugBlitz-AI: An Intelligent QA Assistant
The evolution of software testing from manual to automated methods has
significantly influenced quality assurance (QA) practices. However, challenges
persist in post-execution phases, particularly in result analysis and
reporting. Traditional post-execution validation phases require manual
intervention for result analysis and report generation, leading to
inefficiencies and potential development cycle delays. This paper introduces
BugBlitz-AI, an AI-powered validation toolkit designed to enhance end-to-end
test automation by automating result analysis and bug reporting processes.
BugBlitz-AI leverages recent advancements in artificial intelligence to reduce
the time-intensive tasks of manual result analysis and report generation,
allowing QA teams to focus more on crucial aspects of product quality. By
adopting BugBlitz-AI, organizations can advance automated testing practices and
integrate AI into QA processes, ensuring higher product quality and faster
time-to-market. The paper outlines BugBlitz-AI's architecture, discusses
related work, details its quality enhancement strategies, and presents results
demonstrating its effectiveness in real-world scenarios.
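As a rough illustration of automated result analysis, the sketch below groups failing tests by a normalized error signature and drafts one bug report per group. The normalization heuristic and report format are assumptions, not BugBlitz-AI's implementation.

```python
# Hypothetical failure triage: one draft bug report per distinct error signature,
# rather than one per failing test. Heuristics here are illustrative only.
import re
from collections import defaultdict

def signature(log: str) -> str:
    """Keep the first error/exception line and mask numbers and addresses."""
    first = next((line for line in log.splitlines()
                  if "Error" in line or "Exception" in line), log)
    return re.sub(r"0x[0-9a-fA-F]+|\d+", "<N>", first).strip()

def draft_reports(failures: dict[str, str]) -> list[str]:
    groups = defaultdict(list)
    for test, log in failures.items():
        groups[signature(log)].append(test)
    return [f"[BUG] {sig}\nAffected tests: {', '.join(sorted(tests))}"
            for sig, tests in groups.items()]

failures = {"test_login": "AssertionError: expected 200, got 500",
            "test_logout": "AssertionError: expected 200, got 500"}
print("\n\n".join(draft_reports(failures)))
```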