10,040 research outputs found

    Identifying Patch Correctness in Test-Based Program Repair

    Full text link
    Test-based automatic program repair has attracted a lot of attention in recent years. However, the test suites in practice are often too weak to guarantee correctness and existing approaches often generate a large number of incorrect patches. To reduce the number of incorrect patches generated, we propose a novel approach that heuristically determines the correctness of the generated patches. The core idea is to exploit the behavior similarity of test case executions. The passing tests on original and patched programs are likely to behave similarly while the failing tests on original and patched programs are likely to behave differently. Also, if two tests exhibit similar runtime behavior, the two tests are likely to have the same test results. Based on these observations, we generate new test inputs to enhance the test suites and use their behavior similarity to determine patch correctness. Our approach is evaluated on a dataset consisting of 139 patches generated from existing program repair systems including jGenProg, Nopol, jKali, ACS and HDRepair. Our approach successfully prevented 56.3\% of the incorrect patches to be generated, without blocking any correct patches.Comment: ICSE 201

    DSpot: Test Amplification for Automatic Assessment of Computational Diversity

    Full text link
    Context: Computational diversity, i.e., the presence of a set of programs that all perform compatible services but that exhibit behavioral differences under certain conditions, is essential for fault tolerance and security. Objective: We aim at proposing an approach for automatically assessing the presence of computational diversity. In this work, computationally diverse variants are defined as (i) sharing the same API, (ii) behaving the same according to an input-output based specification (a test-suite) and (iii) exhibiting observable differences when they run outside the specified input space. Method: Our technique relies on test amplification. We propose source code transformations on test cases to explore the input domain and systematically sense the observation domain. We quantify computational diversity as the dissimilarity between observations on inputs that are outside the specified domain. Results: We run our experiments on 472 variants of 7 classes from open-source, large and thoroughly tested Java classes. Our test amplification multiplies by ten the number of input points in the test suite and is effective at detecting software diversity. Conclusion: The key insights of this study are: the systematic exploration of the observable output space of a class provides new insights about its degree of encapsulation; the behavioral diversity that we observe originates from areas of the code that are characterized by their flexibility (caching, checking, formatting, etc.).Comment: 12 page

    Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

    Full text link
    Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic repair on Defects4J. The result of our experiment shows that 47 bugs of the Defects4J dataset can be automatically repaired by state-of- the-art repair. This sets a baseline for future research on automatic repair for Java. We have manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly fixed with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial and incorrect patches still pass the test suite. With respect to practical applicability, it takes in average 14.8 minutes to find a patch. The experiment was done on a scientific grid, totaling 17.6 days of computation time. All their systems and experimental results are publicly available on Github in order to facilitate future research on automatic repair

    Generating Class-Level Integration Tests Using Call Site Information

    Get PDF
    Search-based approaches have been used in the literature to automate the process of creating unit test cases. However, related work has shown that generated unit-tests with high code coverage could be ineffective, i.e., they may not detect all faults or kill all injected mutants. In this paper, we propose CLING, an integration-level test case generation approach that exploits how a pair of classes, the caller and the callee, interact with each other through method calls. In particular, CLING generates integration-level test cases that maximize the Coupled Branches Criterion (CBC). Coupled branches are pairs of branches containing a branch of the caller and a branch of the callee such that an integration test that exercises the former also exercises the latter. CBC is a novel integration-level coverage criterion, measuring the degree to which a test suite exercises the interactions between a caller and its callee classes. We implemented CLING and evaluated the approach on 140 pairs of classes from five different open-source Java projects. Our results show that (1) CLING generates test suites with high CBC coverage, thanks to the definition of the test suite generation as a many-objectives problem where each couple of branches is an independent objective; (2) such generated suites trigger different class interactions and can kill on average 7.7% (with a maximum of 50%) of mutants that are not detected by tests generated at the unit level; (3) CLING can detect integration faults coming from wrong assumptions about the usage of the callee class (32 for our subject systems) that remain undetected when using automatically generated unit-level test suites

    An adaptation reference-point-based multiobjective evolutionary algorithm

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.It is well known that maintaining a good balance between convergence and diversity is crucial to the performance of multiobjective optimization algorithms (MOEAs). However, the Pareto front (PF) of multiobjective optimization problems (MOPs) affects the performance of MOEAs, especially reference point-based ones. This paper proposes a reference-point-based adaptive method to study the PF of MOPs according to the candidate solutions of the population. In addition, the proportion and angle function presented selects elites during environmental selection. Compared with five state-of-the-art MOEAs, the proposed algorithm shows highly competitive effectiveness on MOPs with six complex characteristics

    Finding The Lazy Programmer's Bugs

    Get PDF
    Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps biased by what they believe to be the current boundary conditions of the function being tested. Or at least, they were supposed to. A major step forward was the development of property testing. Property testing requires the user to write a few functional properties that are used to generate tests, and requires an external library or tool to create test data for the tests. As such many thousands of tests can be created for a single property. For the purely functional programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck and Lazy SmallCheck [RNL08]. Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may silently insert runtime exceptions for incomplete pattern matches. We attempt to automate the testing process using these implicit tests. Our contributions are in four main areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed to generate test data without requiring additional programmer work or annotations. (2) To combine the constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type of test data at its most general, in order to prevent committing too early to monomorphic types that cause needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make our test data generation algorithm more expressive. In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high coverage test suites and detect common programming errors in the process
    • …
    corecore