3,205 research outputs found
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems
Properly benchmarking Automated Program Repair (APR) systems should
contribute to the development and adoption of the research outputs by
practitioners. To that end, the research community must ensure that it reaches
significant milestones by reliably comparing state-of-the-art tools for a
better understanding of their strengths and weaknesses. In this work, we
identify and investigate a practical bias caused by the fault localization (FL)
step in a repair pipeline. We propose to highlight the different fault
localization configurations used in the literature, and their impact on APR
systems when applied to the Defects4J benchmark. Then, we explore the
performance variations that can be achieved by `tweaking' the FL step.
Eventually, we expect to create a new momentum for (1) full disclosure of APR
experimental procedures with respect to FL, (2) realistic expectations of
repairing bugs in Defects4J, as well as (3) reliable performance comparison
among the state-of-the-art APR systems, and against the baseline performance
results of our thoroughly assessed kPAR repair tool. Our main findings include:
(a) only a subset of Defects4J bugs can be currently localized by commonly-used
FL techniques; (b) current practice of comparing state-of-the-art APR systems
(i.e., counting the number of fixed bugs) is potentially misleading due to the
bias of FL configurations; and (c) APR authors do not properly qualify their
performance achievement with respect to the different tuning parameters
implemented in APR systems.Comment: Accepted by ICST 201
Neuro Symbolic Reasoning for Planning: Counterexample Guided Inductive Synthesis using Large Language Models and Satisfiability Solving
Generative large language models (LLMs) with instruct training such as GPT-4
can follow human-provided instruction prompts and generate human-like responses
to these prompts. Apart from natural language responses, they have also been
found to be effective at generating formal artifacts such as code, plans, and
logical specifications from natural language prompts. Despite their remarkably
improved accuracy, these models are still known to produce factually incorrect
or contextually inappropriate results despite their syntactic coherence - a
phenomenon often referred to as hallucination. This limitation makes it
difficult to use these models to synthesize formal artifacts that are used in
safety-critical applications. Unlike tasks such as text summarization and
question-answering, bugs in code, plan, and other formal artifacts produced by
LLMs can be catastrophic. We posit that we can use the satisfiability modulo
theory (SMT) solvers as deductive reasoning engines to analyze the generated
solutions from the LLMs, produce counterexamples when the solutions are
incorrect, and provide that feedback to the LLMs exploiting the dialog
capability of instruct-trained LLMs. This interaction between inductive LLMs
and deductive SMT solvers can iteratively steer the LLM to generate the correct
response. In our experiments, we use planning over the domain of blocks as our
synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo,
Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our
method allows the user to communicate the planning problem in natural language;
even the formulation of queries to SMT solvers is automatically generated from
natural language. Thus, the proposed technique can enable non-expert users to
describe their problems in natural language, and the combination of LLMs and
SMT solvers can produce provably correct solutions.Comment: 25 pages, 7 figure
ReFixar: Multi-version Reasoning for Automated Repair of Regression Errors
Software programs evolve naturally as part of the ever-changing customer needs and fast-paced market. Software evolution, however, often introduces regression bugs, which un-duly break previously working functionalities of the software. To repair regression bugs, one needs to know when and where a bug emerged from, e.g., the bug-inducing code changes, to narrow down the search space. Unfortunately, existing state-of-the-art automated program repair (APR) techniques have not yet fully exploited this information, rendering them less efficient and effective to navigate through a potentially large search space containing many plausible but incorrect solutions. In this work, we revisit APR on repairing regression errors in Java programs. We empirically show that existing state-of-the-art APR techniques do not perform well on regression bugs due to their algorithm design and lack of knowledge on bug inducing changes. We subsequently present ReFixar, a novel repair technique that leverages software evolution history to generate high quality patches for Java regression bugs. The key novelty that empowers ReFixar to more efficiently and effectively traverse the search space is two-fold: (1) A systematic way for multi-version reasoning to capture how a software evolves through its history, and (2) A novel search algorithm over a set of generic repair templates, derived from the principle of incorrectness logic and informed by both past bug fixes and their bug-inducing code changes; this enables ReFixar to achieve a balance of both genericity and specificity, i.e., generic common fix patterns of bugs and their specific contexts. We compare ReFixar against the state-of-the-art APR techniques on a data set of 51 real regression bugs from 28 large real-world programs. Experiments show that ReFixar significantly outperforms the best baseline by a large margin, i.e., ReFixar can fix correctly 24 bugs while the best baseline can only correctly fix 9 bugs
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree
structure of the edit scripts that captures the AST-level context of the code
changes. FixMiner uses different tree representations of Rich Edit Scripts for
each round of clustering to identify similar changes. These are abstract syntax
trees, edit actions trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns to an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.Comment: 31 pages, 11 figure
- …