    Mitigating the effect of coincidental correctness in spectrum based fault localization

    2013 Summer. Includes bibliographical references. Coincidentally correct test cases are those that execute faulty program statements but do not result in failures. The presence of such test cases in a test suite reduces the effectiveness of spectrum-based fault localization approaches, such as Ochiai and Tarantula, which localize faulty statements by calculating a suspiciousness score for every program statement from test coverage information. The goal of this dissertation is to improve the understanding of how the presence of coincidentally correct test cases impacts the effectiveness of spectrum-based fault localization approaches and to develop a family of approaches that improve fault localization effectiveness by mitigating the effect of coincidentally correct test cases. Each approach (1) classifies coincidentally correct test cases using test coverage information, and (2) recalculates a suspiciousness score for every program statement using the classification information. We developed classification approaches using test coverage metrics at different levels of granularity, such as statement, branch, and function. We developed a new approach for ranking program statements using suspiciousness scores calculated based on the heuristic that statements covered by more failing and coincidentally correct test cases are more suspicious. We extended the family of fault localization approaches to support multiple faults. We developed an approach that incorporates tester feedback to mitigate the effect of coincidental correctness. The approach analyzes tester feedback to determine a lower bound on the number of coincidentally correct test cases present in a test suite. The lower bound is also used to determine when classification of coincidentally correct test cases can improve fault localization effectiveness. We evaluated the fault localization effectiveness of our approaches and studied how the effectiveness changes for varying characteristics of test suites, such as size, test suite type (e.g., random, coverage-adequate), and the percentage of passing test cases that are coincidentally correct. Our key findings are summarized as follows. Mitigating the effect of coincidentally correct test cases improved fault localization effectiveness. The extent of the improvement increased with the percentage of passing test cases that were coincidentally correct, although no improvement was observed when most passing test cases in a test suite were coincidentally correct. When random test suites were used to localize faults, a coarse-grained coverage spectrum, such as function coverage, resulted in better classification than fine-grained coverage spectra, such as statement and branch coverage. Utilizing tester feedback improved the precision of classification. Mitigating the effect of coincidental correctness in the presence of two faults improved the effectiveness for both faults simultaneously for most faulty programs. Faulty statements that were harder to reach and that affected fewer program statements resulted in fewer coincidentally correct test cases and were more effectively localized.
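
    For context, a minimal Python sketch of the spectrum-based scoring the abstract refers to. The Ochiai and Tarantula formulas are the standard ones; the mitigation step (excluding or relabelling passing tests that a classifier has flagged as coincidentally correct before recomputing scores) only illustrates the general idea and is not the dissertation's actual classifier.

    import math

    def suspiciousness(coverage, outcomes, formula="ochiai"):
        """Per-statement suspiciousness from a coverage matrix.

        coverage[t][s] is True if test t executes statement s;
        outcomes[t] is "pass" or "fail".
        """
        n_stmts = len(coverage[0])
        total_fail = sum(1 for o in outcomes if o == "fail")
        total_pass = len(outcomes) - total_fail
        scores = []
        for s in range(n_stmts):
            ef = sum(1 for t, o in enumerate(outcomes) if o == "fail" and coverage[t][s])
            ep = sum(1 for t, o in enumerate(outcomes) if o == "pass" and coverage[t][s])
            if formula == "ochiai":
                denom = math.sqrt(total_fail * (ef + ep))
                scores.append(ef / denom if denom else 0.0)
            else:  # tarantula
                f = ef / total_fail if total_fail else 0.0
                p = ep / total_pass if total_pass else 0.0
                scores.append(f / (f + p) if (f + p) else 0.0)
        return scores

    def mitigate(coverage, outcomes, cc_indices, relabel_as_failing=False):
        """Illustrative mitigation: drop (or relabel as failing) the passing
        tests whose indices have been flagged as coincidentally correct,
        so suspiciousness can be recomputed on the adjusted matrix."""
        new_cov, new_out = [], []
        for t, (row, outcome) in enumerate(zip(coverage, outcomes)):
            if t in cc_indices and outcome == "pass":
                if relabel_as_failing:
                    new_cov.append(row)
                    new_out.append("fail")
                # otherwise the coincidentally correct test is simply excluded
            else:
                new_cov.append(row)
                new_out.append(outcome)
        return new_cov, new_out

    Recomputing suspiciousness on the output of mitigate reflects the abstract's point: once flagged tests no longer count as covering-and-passing, the faulty statements they execute rise in the ranking.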

    Diversifying focused testing for unit testing

    Software changes constantly because developers add new features or make modifications. This directly affects the effectiveness of the test suite associated with that software, especially when the modifications touch an area that no test case covers. This paper tackles the problem of generating a high-quality test suite that repeatedly covers a given point in a program, with the ultimate goal of exposing faults possibly affecting that program point. Both search-based software testing and constraint solving offer ready, but low-quality, solutions: ideally a maximally diverse covering test set is required, whereas search and constraint solving tend to generate test sets with biased distributions. Our approach, Diversified Focused Testing (DFT), uses a search strategy inspired by GödelTest. We artificially inject parameters into the code's branching conditions and use a bi-objective search algorithm to find diverse inputs by perturbing the injected parameters, while keeping the path conditions satisfiable. Our results demonstrate that DFT covers a desired point in the code at least 90% of the time. Moreover, adding diversity improves the bug detection and mutation-killing abilities of the test suites. DFT achieves better results than focused testing, symbolic execution, and random testing, with improvements of 3% to 70% in mutation score and up to 100% in fault detection across 105 software subjects.
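
    The injection of parameters into branching conditions can be pictured with a small Python sketch. The `delta` parameter, the boundary "solver", and the concrete condition are all hypothetical; the actual DFT uses a GödelTest-inspired bi-objective search over the injected parameters rather than the uniform sampling shown here.

    import random

    # Hypothetical subject: the original branching condition is `x > 10`, and a
    # focused generator must produce inputs that reach the `target` point.
    def subject(x, delta=0):
        if x > 10 + delta:          # DFT-style injected parameter `delta`
            return "target"         # program point to be covered repeatedly
        return "other"

    def boundary_solver(delta):
        """Stand-in for a constraint solver that always returns the smallest
        input satisfying `x > 10 + delta`, i.e. the kind of biased
        distribution the abstract mentions."""
        return 11 + delta

    def dft_style_generation(n_tests, max_delta=500):
        """Perturb the injected parameter so the solver's biased answers
        spread over the feasible region, while every test still reaches the
        target on the unmodified condition (delta = 0)."""
        tests = []
        for _ in range(n_tests):
            delta = random.randint(0, max_delta)
            x = boundary_solver(delta)
            assert subject(x, delta=0) == "target"   # path still satisfied
            tests.append(x)
        return tests

    print(sorted(set(dft_style_generation(10))))

    Without the injected parameter, the biased solver would return the same boundary value for every test; perturbing delta is what diversifies the covering inputs.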

    Hashing fuzzing: introducing input diversity to improve crash detection

    The utility of a test set of program inputs is strongly influenced by its diversity and its size. Syntax coverage has become a standard proxy for diversity. Although more sophisticated measures exist, such as the proximity of a sample to a uniform distribution, methods that use them tend to be type dependent. We use r-wise hash functions to create a novel, semantics-preserving testability transformation for C programs that we call HashFuzz. Use of HashFuzz improves the diversity of test sets produced by instrumentation-based fuzzers. We evaluate the effect of the HashFuzz transformation on eight programs from the Google Fuzzer Test Suite using four state-of-the-art fuzzers that have been widely used in previous research. We demonstrate pronounced improvements in the performance of the test sets for the transformed programs across all the fuzzers we used: strong improvements in diversity in every case, maintained or slightly improved branch coverage (up to 4.8% improvement in the best case), and significant improvement in unique crash detection, with increases of 28% to 97% compared to test sets for the untransformed programs.
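
    The following Python sketch shows one way to read the core idea, as we understand it from the abstract: artificial, semantics-preserving branches keyed on hash buckets of the input give a coverage-guided fuzzer feedback whenever an input lands in an unseen bucket, rewarding a wider spread of retained inputs. The bucket count, the hash choice, and the feedback bookkeeping are assumptions for illustration; the actual HashFuzz transformation rewrites C programs.

    import hashlib

    N_BUCKETS = 16            # assumed bucket count, for illustration only
    seen_buckets = set()      # coverage-style feedback kept by the fuzzer

    def bucket(data):
        """Map an input to one of N_BUCKETS buckets via a hash of its bytes."""
        return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") % N_BUCKETS

    def program_under_test(data):
        return len(data)      # stand-in for the real program's behaviour

    def instrumented_run(data):
        """Run the program and report whether this input produced new feedback.
        The bucket acts like an artificial branch: reaching an unseen bucket
        gives the fuzzer fresh coverage, but the program's output is unchanged."""
        b = bucket(data)
        is_new = b not in seen_buckets
        seen_buckets.add(b)
        return program_under_test(data), is_new

    # A fuzzer keeps inputs for which is_new is True, so the retained test set
    # ends up spread across hash buckets rather than clustered in a few.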

    Oracle Assessment, Improvement and Placement

    The oracle problem remains one of the key challenges in software testing, for which little automated support has been developed so far. This thesis analyses the prevalence of failed error propagation in programs with real faults to address the oracle placement problem and introduces an approach for iterative assessment and improvement of oracles. To analyse failed error propagation in programs with real faults, we conducted an empirical study on Defects4J, a benchmark of Java programs, using all 6 available projects, 384 real bugs, and 528 methods fixed to correct those bugs. The results indicate that the prevalence of failed error propagation is negligible. Moreover, the results on real faults differ from the results on mutants, indicating that if failed error propagation is taken into account, mutants are not a good surrogate for real faults. When measuring failed error propagation, for each method we use the strongest possible oracle as a postcondition, which checks all externally observable program variables. The low prevalence of failed error propagation is caused by the presence of such a strong oracle, which is usually not available in practice. Therefore, there is a need for a technique to assess and improve existing, weaker oracles. We propose a technique for assessing and improving test oracles that necessarily places the human tester in the loop and is based on reducing the incidence of both false positives and false negatives. We provide a proof that this approach increases the mutual information between the actual and perfect oracles. Applying the approach to five real-world subjects shows that the fault detection rate of the oracles after improvement increases, on average, by 48.6%. A further evaluation with 39 participants assessed the ability of humans to detect false positives and false negatives manually, without any tool support. The correct classification rate achieved by humans in this case is poor (29%), indicating how helpful our automated approach can be for developers. A comparison of humans' ability to improve oracles with and without the tool, in a study with 29 other participants, also empirically validates the effectiveness of the approach.
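
    A minimal Python sketch of one iteration of the human-in-the-loop assess-and-improve cycle described above. The data structures, the `strengthen` helper, and the `ask_tester` placeholder are hypothetical, standing in for the thesis's actual technique and for the manual judgement made by the tester.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Oracle:
        assertions: List[Callable[[Dict], bool]] = field(default_factory=list)

        def verdict(self, state):
            """Pass only if every assertion holds on the observed program state."""
            return all(a(state) for a in self.assertions)

    def ask_tester(state, verdict):
        """Placeholder for the human tester's judgement of whether the observed
        behaviour is actually correct; in the studies this is done manually."""
        raise NotImplementedError

    def improve(oracle, executions, strengthen):
        """One assess-and-improve iteration: drop assertions behind false
        positives, add assertions (built by `strengthen`) for false negatives."""
        for state in executions:
            verdict = oracle.verdict(state)
            truly_correct = ask_tester(state, verdict)
            if not verdict and truly_correct:
                # False positive: the oracle rejects correct behaviour, so keep
                # only the assertions that did not fire on this state.
                oracle.assertions = [a for a in oracle.assertions if a(state)]
            elif verdict and not truly_correct:
                # False negative: the oracle accepts faulty behaviour, so add an
                # assertion distinguishing this state from correct ones.
                oracle.assertions.append(strengthen(state))
        return oracle

    Each pass of improve reduces either false positives or false negatives on the executions it sees, which is the mechanism behind the reported increase in mutual information between the actual and perfect oracles.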
