    Mitigating the effect of coincidental correctness in spectrum based fault localization

    2013 Summer. Includes bibliographical references. Coincidentally correct test cases are those that execute faulty program statements but do not result in failures. The presence of such test cases in a test suite reduces the effectiveness of spectrum-based fault localization approaches, such as Ochiai and Tarantula, which localize faulty statements by calculating a suspiciousness score for every program statement from test coverage information. The goal of this dissertation is to improve the understanding of how the presence of coincidentally correct test cases impacts the effectiveness of spectrum-based fault localization approaches and to develop a family of approaches that improve fault localization effectiveness by mitigating the effect of coincidentally correct test cases. Each approach (1) classifies coincidentally correct test cases using test coverage information, and (2) recalculates a suspiciousness score for every program statement using the classification information. We developed classification approaches using test coverage metrics at different levels of granularity, such as statement, branch, and function. We developed a new approach for ranking program statements using suspiciousness scores calculated based on the heuristic that statements covered by more failing and coincidentally correct test cases are more suspicious. We extended the family of fault localization approaches to support multiple faults. We developed an approach that incorporates tester feedback to mitigate the effect of coincidental correctness. The approach analyzes tester feedback to determine a lower bound on the number of coincidentally correct test cases present in a test suite. The lower bound is also used to determine when classification of coincidentally correct test cases can improve fault localization effectiveness. We evaluated the fault localization effectiveness of our approaches and studied how the effectiveness changes for varying characteristics of test suites, such as size, test suite type (e.g., random, coverage-adequate), and the percentage of passing test cases that are coincidentally correct. Our key findings are summarized as follows. Mitigating the effect of coincidentally correct test cases improved fault localization effectiveness. The extent of the improvement increased with the percentage of passing test cases that were coincidentally correct, although no improvement was observed when most passing test cases in a test suite were coincidentally correct. When random test suites were used to localize faults, a coarse-grained coverage spectrum, such as function coverage, resulted in better classification than fine-grained coverage spectra, such as statement and branch coverage. Utilizing tester feedback improved the precision of classification. Mitigating the effect of coincidental correctness in the presence of two faults improved the effectiveness for both faults simultaneously for most faulty programs. Faulty statements that were harder to reach and that affected fewer program statements resulted in fewer coincidentally correct test cases and were more effectively localized.
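
    For context, a minimal Python sketch of the spectrum-based scoring the abstract refers to. The Ochiai and Tarantula formulas are the standard ones; the mitigation step (excluding or relabelling passing tests that a classifier has flagged as coincidentally correct before recomputing scores) only illustrates the general idea and is not the dissertation's actual classifier.

    import math

    def suspiciousness(coverage, outcomes, formula="ochiai"):
        """Per-statement suspiciousness from a coverage matrix.

        coverage[t][s] is True if test t executes statement s;
        outcomes[t] is "pass" or "fail".
        """
        n_stmts = len(coverage[0])
        total_fail = sum(1 for o in outcomes if o == "fail")
        total_pass = len(outcomes) - total_fail
        scores = []
        for s in range(n_stmts):
            ef = sum(1 for t, o in enumerate(outcomes) if o == "fail" and coverage[t][s])
            ep = sum(1 for t, o in enumerate(outcomes) if o == "pass" and coverage[t][s])
            if formula == "ochiai":
                denom = math.sqrt(total_fail * (ef + ep))
                scores.append(ef / denom if denom else 0.0)
            else:  # tarantula
                f = ef / total_fail if total_fail else 0.0
                p = ep / total_pass if total_pass else 0.0
                scores.append(f / (f + p) if (f + p) else 0.0)
        return scores

    def mitigate(coverage, outcomes, cc_indices, relabel_as_failing=False):
        """Illustrative mitigation: drop (or relabel as failing) the passing
        tests whose indices have been flagged as coincidentally correct,
        so suspiciousness can be recomputed on the adjusted matrix."""
        new_cov, new_out = [], []
        for t, (row, outcome) in enumerate(zip(coverage, outcomes)):
            if t in cc_indices and outcome == "pass":
                if relabel_as_failing:
                    new_cov.append(row)
                    new_out.append("fail")
                # otherwise the coincidentally correct test is simply excluded
            else:
                new_cov.append(row)
                new_out.append(outcome)
        return new_cov, new_out

    Recomputing suspiciousness on the output of mitigate reflects the abstract's point: once flagged tests no longer count as covering-and-passing, the faulty statements they execute rise in the ranking.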

    Diversifying focused testing for unit testing

    Software changes constantly because developers add new features or make modifications. This directly affects the effectiveness of the test suite associated with that software, especially when the modifications touch an area that no test case covers. This paper tackles the problem of generating a high-quality test suite that repeatedly covers a given point in a program, with the ultimate goal of exposing faults possibly affecting that program point. Both search-based software testing and constraint solving offer ready, but low-quality, solutions: ideally a maximally diverse covering test set is required, whereas search and constraint solving tend to generate test sets with biased distributions. Our approach, Diversified Focused Testing (DFT), uses a search strategy inspired by GödelTest. We artificially inject parameters into the code's branching conditions and use a bi-objective search algorithm to find diverse inputs by perturbing the injected parameters, while keeping the path conditions satisfiable. Our results demonstrate that DFT covers a desired point in the code at least 90% of the time. Moreover, adding diversity improves the bug detection and mutation-killing abilities of the test suites. DFT achieves better results than focused testing, symbolic execution, and random testing, with improvements of 3% to 70% in mutation score and up to 100% in fault detection across 105 software subjects.
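
    The injection of parameters into branching conditions can be pictured with a small Python sketch. The `delta` parameter, the boundary "solver", and the concrete condition are all hypothetical; the actual DFT uses a GödelTest-inspired bi-objective search over the injected parameters rather than the uniform sampling shown here.

    import random

    # Hypothetical subject: the original branching condition is `x > 10`, and a
    # focused generator must produce inputs that reach the `target` point.
    def subject(x, delta=0):
        if x > 10 + delta:          # DFT-style injected parameter `delta`
            return "target"         # program point to be covered repeatedly
        return "other"

    def boundary_solver(delta):
        """Stand-in for a constraint solver that always returns the smallest
        input satisfying `x > 10 + delta`, i.e. the kind of biased
        distribution the abstract mentions."""
        return 11 + delta

    def dft_style_generation(n_tests, max_delta=500):
        """Perturb the injected parameter so the solver's biased answers
        spread over the feasible region, while every test still reaches the
        target on the unmodified condition (delta = 0)."""
        tests = []
        for _ in range(n_tests):
            delta = random.randint(0, max_delta)
            x = boundary_solver(delta)
            assert subject(x, delta=0) == "target"   # path still satisfied
            tests.append(x)
        return tests

    print(sorted(set(dft_style_generation(10))))

    Without the injected parameter, the biased solver would return the same boundary value for every test; perturbing delta is what diversifies the covering inputs.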

    Hashing fuzzing: introducing input diversity to improve crash detection

    The utility of a test set of program inputs is strongly influenced by its diversity and its size. Syntax coverage has become a standard proxy for diversity. Although more sophisticated measures exist, such as the proximity of a sample to a uniform distribution, methods that use them tend to be type dependent. We use r-wise hash functions to create a novel, semantics-preserving testability transformation for C programs that we call HashFuzz. Use of HashFuzz improves the diversity of test sets produced by instrumentation-based fuzzers. We evaluate the effect of the HashFuzz transformation on eight programs from the Google Fuzzer Test Suite using four state-of-the-art fuzzers that have been widely used in previous research. We demonstrate pronounced improvements in the performance of the test sets for the transformed programs across all the fuzzers we used: strong improvements in diversity in every case, maintained or slightly improved branch coverage (up to 4.8% improvement in the best case), and significant improvement in unique crash detection, with increases of 28% to 97% compared to test sets for the untransformed programs.
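
    The following Python sketch shows one way to read the core idea, as we understand it from the abstract: artificial, semantics-preserving branches keyed on hash buckets of the input give a coverage-guided fuzzer feedback whenever an input lands in an unseen bucket, rewarding a wider spread of retained inputs. The bucket count, the hash choice, and the feedback bookkeeping are assumptions for illustration; the actual HashFuzz transformation rewrites C programs.

    import hashlib

    N_BUCKETS = 16            # assumed bucket count, for illustration only
    seen_buckets = set()      # coverage-style feedback kept by the fuzzer

    def bucket(data):
        """Map an input to one of N_BUCKETS buckets via a hash of its bytes."""
        return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") % N_BUCKETS

    def program_under_test(data):
        return len(data)      # stand-in for the real program's behaviour

    def instrumented_run(data):
        """Run the program and report whether this input produced new feedback.
        The bucket acts like an artificial branch: reaching an unseen bucket
        gives the fuzzer fresh coverage, but the program's output is unchanged."""
        b = bucket(data)
        is_new = b not in seen_buckets
        seen_buckets.add(b)
        return program_under_test(data), is_new

    # A fuzzer keeps inputs for which is_new is True, so the retained test set
    # ends up spread across hash buckets rather than clustered in a few.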

    Oracle Assessment, Improvement and Placement

    The oracle problem remains one of the key challenges in software testing, for which little automated support has been developed so far. This thesis analyses the prevalence of failed error propagation in programs with real faults to address the oracle placement problem and introduces an approach for iterative assessment and improvement of oracles. To analyse failed error propagation in programs with real faults, we conducted an empirical study on Defects4J, a benchmark of Java programs, using all 6 available projects, 384 real bugs, and 528 methods fixed to correct those bugs. The results indicate that the prevalence of failed error propagation is negligible. Moreover, the results on real faults differ from the results on mutants, indicating that if failed error propagation is taken into account, mutants are not a good surrogate for real faults. When measuring failed error propagation, for each method we use the strongest possible oracle as a postcondition, which checks all externally observable program variables. The low prevalence of failed error propagation is caused by the presence of such a strong oracle, which is usually not available in practice. Therefore, there is a need for a technique to assess and improve existing, weaker oracles. We propose a technique for assessing and improving test oracles that necessarily places the human tester in the loop and is based on reducing the incidence of both false positives and false negatives. We provide a proof that this approach increases the mutual information between the actual and perfect oracles. Applying the approach to five real-world subjects shows that the fault detection rate of the oracles after improvement increases, on average, by 48.6%. A further evaluation with 39 participants assessed the ability of humans to detect false positives and false negatives manually, without any tool support. The correct classification rate achieved by humans in this case is poor (29%), indicating how helpful our automated approach can be for developers. A comparison of humans' ability to improve oracles with and without the tool, in a study with 29 other participants, also empirically validates the effectiveness of the approach.
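
    A minimal Python sketch of one iteration of the human-in-the-loop assess-and-improve cycle described above. The data structures, the `strengthen` helper, and the `ask_tester` placeholder are hypothetical, standing in for the thesis's actual technique and for the manual judgement made by the tester.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Oracle:
        assertions: List[Callable[[Dict], bool]] = field(default_factory=list)

        def verdict(self, state):
            """Pass only if every assertion holds on the observed program state."""
            return all(a(state) for a in self.assertions)

    def ask_tester(state, verdict):
        """Placeholder for the human tester's judgement of whether the observed
        behaviour is actually correct; in the studies this is done manually."""
        raise NotImplementedError

    def improve(oracle, executions, strengthen):
        """One assess-and-improve iteration: drop assertions behind false
        positives, add assertions (built by `strengthen`) for false negatives."""
        for state in executions:
            verdict = oracle.verdict(state)
            truly_correct = ask_tester(state, verdict)
            if not verdict and truly_correct:
                # False positive: the oracle rejects correct behaviour, so keep
                # only the assertions that did not fire on this state.
                oracle.assertions = [a for a in oracle.assertions if a(state)]
            elif verdict and not truly_correct:
                # False negative: the oracle accepts faulty behaviour, so add an
                # assertion distinguishing this state from correct ones.
                oracle.assertions.append(strengthen(state))
        return oracle

    Each pass of improve reduces either false positives or false negatives on the executions it sees, which is the mechanism behind the reported increase in mutual information between the actual and perfect oracles.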
