Search CORE

71 research outputs found

Normalised squeeziness and failed error propagation

Author: Clark D.
Hierons R.
Patel K.
Publication venue: Elsevier
Publication date: 10/05/2019
Field of study

Failed Error Propagation (FEP) can reduce test effectiveness and recent work proposed an information theoretic measure, Squeeziness, as the theoretical basis for avoiding FEP. This paper demonstrates that Squeeziness is not suitable for comparing programs with different input domains. We then extend Squeeziness to Normalised Squeeziness and demonstrate that this is more generally useful

UCL Discovery

White Rose Research Online

FigShare

An information theoretic notion of software testability

Author: Clark D
Hierons RM
Patel K
Publication venue: 'Elsevier BV'
Publication date: 01/03/2022
Field of study

CONTEXT: In software testing, Failed Error Propagation (FEP) is the situation in which a faulty program state occurs during the execution of the system under test (SUT) but this does not lead to incorrect output. It is known that FEP can adversely affect software testing and this has resulted in associated information theoretic measures. OBJECTIVE: To devise measures that can be used to assess the testability of the SUT. By testability, we mean how likely it is that a faulty program state, that occurs during testing, will lead to incorrect output. Previous work has considered a single program point rather than an entire program. METHOD: New, more fine-grained, measures were devised. Experiments were used to evaluate these and the previously defined measures (Squeeziness and Normalised Squeeziness). The experiments assessed how well these measures correlated with an estimate of the probability of FEP occurring during testing. Mutants were used to estimate this probability. RESULTS: A strong rank correlation was found between several of the measures and the probability of FEP. Importantly, this included the Normalised Squeeziness of the whole SUT, which is simpler to compute, or estimate, than most of the other measures considered. Additional experiments found that the measures were relatively insensitive to the choice of mutants and also test suite. CONCLUSION: There is scope to use information theoretic measures to estimate how prone an SUT is to FEP. As a result, there is potential to use such measures to prioritise testing or estimate how much testing an SUT might require

UCL Discovery

Diversifying focused testing for unit testing

Author: Clark D.
Clark D.
Jahangirova G.
Jahangirova G.
Menéndez H.
Menéndez H.
Sarro F.
Sarro F.
Tonella P.
Tonella P.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021
Field of study

Software changes constantly because developers add new features or modifications. This directly affects the effectiveness of the testsuite associated with that software, especially when these new modifications are in a specific area that no test case covers. This paper tackles the problem of generating a high quality test suite to cover repeatedly a given point in a program, with the ultimate goal of exposing faults possibly affecting the given program point. Both search based software testing and constraint solving offer ready, but low quality, solutions to this: ideally a maximally diverse covering test set is required whereas search and constraint solving tend to generate test sets with biased distributions. Our approach, Diversified Focused Testing (DFT), uses a search strategy inspired by GödelTest. We artificially inject parameters into the code branching conditions and use a bi-objective search algorithm to find diverse inputs by perturbing the injected parameters, while keeping the path conditions still satisfiable. Our results demonstrate that our technique, DFT, is able to cover a desired point in the code at least 90% of the time. Moreover, adding diversity improves the bug detection and the mutation killing abilities of the test suites. We show that DFT achieves better results than focused testing, symbolic execution and random testing by achieving from 3% to 70% improvement in mutation score and up to 100% improvement in fault detection across 105 software subjects

Middlesex University Research Repository

Oracle Assessment, Improvement and Placement

Author: Jahangirova Gunel
Publication venue: UCL (University College London)
Publication date: 28/04/2019
Field of study

The oracle problem remains one of the key challenges in software testing, for which little automated support has been developed so far. This thesis analyses the prevalence of failed error propagation in programs with real faults to address the oracle placement problem and introduces an approach for iterative assessment and improvement of the oracles. To analyse failed error propagation in programs with real faults, we have conducted an empirical study, considering Defects4J, a benchmark of Java programs, of which we used all 6 projects available, 384 real bugs and 528 methods fixed to correct such bugs. The results indicate that the prevalence of failed error propagation is negligible. Moreover, the results on real faults differ from the results on mutants, indicating that if failed error propagation is taken into account, mutants are not a good surrogate of real faults. When measuring failed error propagation, for each method we use the strongest possible oracle as postcondition, which checks all externally observable program variables. The low prevalence of failed error propagation is caused by the presence of such a strong oracle, which usually is not available in practice. Therefore, there is a need for a technique to assess and improve existing weaker oracles. We propose a technique for assessing and improving test oracles, which necessarily places the human tester in the loop and is based on reducing the incidence of both false positives and false negatives. A proof showing that this approach results in an increase in the mutual information between the actual and perfect oracles is provided. The application of the approach to five real-world subjects shows that the fault detection rate of the oracles after improvement increases, on average, by 48.6%. The further evaluation with 39 participants assessed the ability of humans to detect false positives and false negatives manually, without any tool support. The correct classification rate achieved by humans in this case is poor (29%) indicating how helpful our automated approach can be for developers. The comparison of humans’ ability to improve oracles with and without the tool in a study with 29 other participants also empirically validates the effectiveness of the approach

UCL Discovery