Test-Equivalence Analysis for Automatic Patch Generation
Automated program repair is a problem of finding a transformation (called a patch) of a given incorrect program that eliminates the observable failures. It has important applications such as providing debugging aids, automatically grading student assignments, and patching security vulnerabilities. A common challenge faced by existing repair techniques is scalability to large patch spaces, since there are many candidate patches that these techniques explicitly or implicitly consider.
The correctness criterion for program repair is often given as a suite of tests. Current repair techniques do not scale due to the large number of test executions performed by the underlying search algorithms. In this work, we address this problem by introducing a methodology of patch generation based on a test-equivalence relation (if two programs are "test-equivalent" for a given test, they produce indistinguishable results on this test). We propose two test-equivalence relations based on runtime values and dependencies, respectively, and present an algorithm that performs on-the-fly partitioning of patches into test-equivalence classes.
Our experiments on real-world programs reveal that the proposed methodology drastically reduces the number of test executions and therefore provides an order-of-magnitude efficiency improvement over existing repair techniques, without sacrificing patch quality.
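The value-based variant of this idea can be sketched in a few lines. In this hypothetical sketch (the function names and the string-expression patch representation are illustrative, not the paper's implementation), candidate patches are alternative expressions for one program location; patches that produce the same value in the captured environment are grouped into one equivalence class, and the test is run once per class rather than once per patch.

```python
from collections import defaultdict

def partition_by_value(patches, env):
    """Group candidate patch expressions by the value they produce in `env`.

    Patches landing in the same group are indistinguishable on this
    execution, so one representative test run covers the whole group."""
    classes = defaultdict(list)
    for patch in patches:
        value = eval(patch, {}, dict(env))  # evaluate the candidate expression
        classes[value].append(patch)
    return classes

def run_test_per_class(classes, test):
    """Run the test once per equivalence class instead of once per patch."""
    results = {}
    for value, members in classes.items():
        outcome = test(value)  # a single execution decides all members
        for patch in members:
            results[patch] = outcome
    return results

# Toy example: candidate replacements for an expression where x=3, y=4.
env = {"x": 3, "y": 4}
patches = ["x + 1", "y", "x + y - 3", "x * 2"]
classes = partition_by_value(patches, env)   # two classes: value 4 and value 6
results = run_test_per_class(classes, lambda v: v == 4)
```

Here four candidate patches collapse into two equivalence classes, so only two test executions are needed instead of four; on real patch spaces with thousands of candidates, this collapsing is where the reported speedup comes from.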
Re-factoring based program repair applied to programming assignments
Automated program repair has been used to provide feedback on incorrect student programming assignments, since program repair captures the code modification needed to make a given buggy program pass a given test suite. Existing student feedback generation techniques are limited: they either require manual effort in the form of an error model, require a large number of correct student submissions to learn from, or suffer from a lack of scalability and accuracy. In this work, we propose a fully automated approach for generating student program repairs in real time. This is achieved by first refactoring all available correct solutions into semantically equivalent solutions. Given an incorrect program, we match it with the closest refactored program based on its control-flow structure. Subsequently, we infer the input-output specifications of the incorrect program's basic blocks from the executions of the correct program's aligned basic blocks. Finally, these specifications are used to modify the blocks of the incorrect program via search-based synthesis. Our dataset consists of almost 1,800 real-life incorrect Python program submissions from 361 students in an introductory programming course at a large public university. Our experimental results suggest that our method is more effective and efficient than recently proposed feedback generation approaches. About 30% of the patches produced by our tool Refactory are smaller than those produced by the state-of-the-art tool Clara, and can be produced from fewer correct solutions (often a single correct solution) and in a shorter time. We believe our method is applicable not only to programming assignments but can also be seen as a general-purpose program repair method that achieves good results with just a single correct reference solution.
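The control-flow matching step can be illustrated with a toy sketch. This is an assumption-laden simplification: programs are reduced to a "skeleton" string of statement kinds and matched by positional agreement, whereas the actual tool works on Python ASTs with a proper structural distance; the function names are hypothetical.

```python
def skeleton(stmts):
    """Abstract a statement list into its control-flow shape:
    C = conditional, L = loop, S = straight-line statement."""
    shape = []
    for s in stmts:
        if s.startswith("if"):
            shape.append("C")
        elif s.startswith(("for", "while")):
            shape.append("L")
        else:
            shape.append("S")
    return "".join(shape)

def closest_match(incorrect, refactored_pool):
    """Pick the refactored correct solution whose skeleton best agrees with
    the incorrect program's (positional matches minus a length penalty;
    a real implementation would use a tree edit distance)."""
    target = skeleton(incorrect)
    def score(candidate):
        s = skeleton(candidate)
        return sum(a == b for a, b in zip(s, target)) - abs(len(s) - len(target))
    return max(refactored_pool, key=score)

# Toy example: the buggy program has shape S-L-S.
incorrect = ["total = 0", "for x in xs:", "print(total)"]
pool = [
    ["if n > 0:", "print(n)"],              # skeleton "CS"
    ["s = 0", "while i < n:", "print(s)"],  # skeleton "SLS" -- best match
]
best = closest_match(incorrect, pool)
```

Once the closest refactored solution is found, its aligned blocks supply the input-output specifications that drive the search-based synthesis described above.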
SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics
Learning-based program repair has achieved good results in a recent series of papers. Yet we observe that the related work fails to repair some bugs because it lacks knowledge about 1) the application domain of the program being repaired, and 2) the fault type being repaired. In this paper, we address both problems by changing the learning paradigm from supervised training to self-supervised training in an approach called SelfAPR. First, SelfAPR generates training samples on disk by perturbing a previous version of the program being repaired, forcing the neural model to capture project-specific knowledge. This differs from previous work based on mined past commits. Second, SelfAPR executes all training samples and extracts and encodes test execution diagnostics into the input representation, steering the neural model toward the kind of fault to fix. This differs from existing studies that consider only static source code as input. We implement SelfAPR and evaluate it in a systematic manner. We generate 1,039,873 training samples by perturbing 17 open-source projects. We evaluate SelfAPR on 818 bugs from Defects4J; SelfAPR correctly repairs 110 of them, outperforming all supervised-learning repair approaches.
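The self-supervised sample generation can be sketched as follows. This is a minimal illustration under loud assumptions: the perturbation rules, the `[BUG]`/`[DIAG]` input format, and all function names are invented for the sketch and are not SelfAPR's actual rules or encoding; the real system perturbs Java code and executes each perturbed sample to obtain its diagnostic.

```python
import random

# Illustrative rewrite rules for injecting synthetic faults (not SelfAPR's).
PERTURBATIONS = [
    ("<=", "<"),       # weaken a comparison
    ("+", "-"),        # flip an arithmetic operator
    ("== 0", "== 1"),  # corrupt a constant
]

def perturb(line, rng):
    """Apply one applicable rewrite rule to create a synthetic bug."""
    applicable = [(a, b) for a, b in PERTURBATIONS if a in line]
    if not applicable:
        return None
    a, b = rng.choice(applicable)
    return line.replace(a, b, 1)

def make_sample(line, diagnostic):
    """Build one training pair: (buggy line + execution diagnostic) -> fix.

    The diagnostic would come from actually running the tests against the
    perturbed program; here it is passed in directly."""
    rng = random.Random(0)  # fixed seed for a reproducible sketch
    buggy = perturb(line, rng)
    if buggy is None:
        return None
    return {"input": f"[BUG] {buggy} [DIAG] {diagnostic}", "target": line}

sample = make_sample("if i <= n:", "AssertionError: expected 10, got 9")
```

Because the original line is the ground-truth repair for the fault the rule injected, training pairs come for free from the project's own code, which is what lets the model absorb project-specific knowledge without mined commits.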