6 research outputs found

    Filtering overfitted automatically-generated patches by using automated test generation

    Get PDF
    "Generate-and-Validate'' (G&V) approaches to automatic program repair first generate candidate patches and then validate the patches against a test suite. Current G&V tools accept the first patch that passes all the test cases and are at risk of making a mistake: they can choose an ultimately buggy patch while leaving a correct patch unfound. These mistakes are often due to developer test suites being insufficient to correctly validate a patch. In our approach, we aim to improve existing test suites with automatic test generation. To circumvent the oracle problem, we compare the behavior of the buggy program with the behavior of the newly patched program, and if the patched program fails more, then the patch is considered to be overfitted and it is filtered. We evaluate our approach on 441 patches (both overfitted and correct) from three automatic repair systems and show that 67% (279/417) of overfitted patches are filtered. In addition, by further exploring the search space of patches for one of the defects, our approach filters overfitted patches until the correct patch is found and accepted. We also conduct a post-mortem relevance analysis of automatically-generated test cases to evaluate (1) how many of the test cases should aid the developer in manual debugging, (2) how many of the test cases should filter out more overfitted patches if better oracles are used, and (3) how many of the test cases are relevant to filtering overfitted patches (i.e., how many of them execute lines of code patched by SPR---one of the automatic repair systems in our study). The analysis shows that up to 40% of the automatically-generated test cases per bug can help developers conduct manual debugging and can filter out more overfitted patches if automatic repair systems are empowered with better oracles

    Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System

    Get PDF
    International audienceAmong the many different kinds of program repair techniques, one widely studied family of techniques is called test suite based repair. However, test suites are in essence input-output specifications and are thus typically inadequate for completely specifying the expected behavior of the program under repair. Consequently, the patches generated by test suite based repair techniques can just overfit to the used test suite, and fail to generalize to other tests. We deeply analyze the overfitting problem in program repair and give a classification of this problem. This classification will help the community to better understand and design techniques to defeat the overfitting problem. We further propose and evaluate an approach called UnsatGuided, which aims to alleviate the overfitting problem for synthesis-based repair techniques with automatic test case generation. The approach uses additional automatically generated tests to strengthen the repair constraint used by synthesis-based repair techniques. We analyze the effectiveness of UnsatGuided: 1) analytically with respect to alleviating two different kinds of overfitting issues; 2) empirically based on an experiment over the 224 bugs of the Defects4J repository. The main result is that automatic test generation is effective in alleviating one kind of overfitting issue–regression introduction, but due to oracle problem, has minimal positive impact on alleviating the other kind of overfitting issue–incomplete fixing

    Improving the Correctness of Automated Program Repair

    Get PDF
    Developers spend much of their time fixing bugs in software programs. Automated program repair (APR) techniques aim to alleviate the burden of bug fixing from developers by generating patches at the source-code level. Recently, Generate-and-Validate (G&V) APR techniques show great potential to repair general bugs in real-world applications. Recent evaluations show that G&V techniques repair 8–17.7% of the collected bugs from mature Java or C open-source projects. Despite the promising results, G&V techniques may generate many incorrect patches and are not able to repair every single bug. This thesis makes contributions to improve the correctness of APR by improving the quality assurance of the automatically-generated patches and generating more correct patches by leveraging human knowledge. First, this thesis investigates whether improving the test-suite-based validation can precisely identify incorrect patches that are generated by G&V, and whether it can help G&V generate more correct patches. The result of this investigation, Opad, which combines new fuzz-generated test cases and additional oracles (i.e., memory oracles), is proposed to identify incorrect patches and help G&V repair more bugs correctly. The evaluation of Opad shows that the improved test-suite-based validation identifies 75.2% incorrect patches from G&V techniques. With the integration of Opad, SPR, one of the most promising G&V techniques, repairs one additional bug. Second, this thesis proposes novel APR techniques to repair more bugs correctly, by leveraging human knowledge. Thus, APR techniques can repair new types of bugs that are not currently targeted by G&V APR techniques. Human knowledge in bug-fixing activities is noted in the forms such as commits of bug fixes, developers’ expertise, and documentation pages. Two techniques (APARE and Priv) are proposed to target two types of defects respectively: project-specific recurring bugs and vulnerability warnings by static analysis. APARE automatically learns fix patterns from historical bug fixes (i.e., originally crafted by developers), utilizes spectrum-based fault-localization technique to identify highly-likely faulty methods, and applies the learned fix patterns to generate patches for developers to review. The key innovation of APARE is to utilize a percentage semantic-aware matching algorithm between fix patterns and faulty locations. For the 20 recurring bugs, APARE generates 34 method fixes, 24 of which (70.6%) are correct; 83.3% (20 out of 24) are identical to the fixes generated by developers. In addition, APARE complements current repair systems by generating 20 high-quality method fixes that RSRepair and PAR cannot generate. Priv is a multi-stage remediation system specifically designed for static-analysis security-testing (SAST) techniques. The prototype is built and evaluated on a commercial SAST product. The first stage of Priv is to prioritize workloads of fixing vulnerability warnings based on shared fix locations. The likely fix locations are suggested based on a set of rules. The rules are concluded and developed through the collaboration with two security experts. The second stage of Priv provides additional essential information for improving the efficiency of diagnosis and fixing. Priv offers two types of additional information: identifying true database/attribute-related warnings, and providing customized fix suggestions per warning. The evaluation shows that Priv suggests identical fix locations to the ones suggested by developers for 50–100% of the evaluated vulnerability findings. Priv identifies up to 2170 actionable vulnerability findings for the evaluated six projects. The manual examination confirms that Priv can generate patches of high-quality for many of the evaluated vulnerability warnings
    corecore