136 research outputs found

    The "Baggaging" of Theory-Based Evaluation

    Get PDF
    We introduce SeMu, a Dynamic Symbolic Execution technique that generates test inputs capable of killing stubborn mutants (killable mutants that remain undetected after a reasonable amount of testing). SeMu aims at mutant propagation (triggering erroneous states to the program output) by incrementally searching for divergent program behaviours between the original and the mutant versions. We model the mutant killing problem as a symbolic execution search within a specific area in the programs' symbolic tree. In this framework, the search area is defined and controlled by parameters that allow scalable and cost-effective mutant killing. We integrate SeMu in KLEE and experimented with Coreutils (a benchmark frequently used in symbolic execution studies). Our results show that our modelling plays an important role in mutant killing. Perhaps more importantly, our results also show that, within a two-hour time limit, SeMu kills 37% of the stubborn mutants, where KLEE kills none and where the mutant infection strategy (strategy suggested by previous research) kills 17%

    A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence

    Get PDF
    Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, while developers of mutation testing tools should prioritise those operators that we found generate disproportionately many stubborn (and few equivalent) mutants

    Mutation-inspired symbolic execution for software testing

    Get PDF
    Software testing is a complex and costly stage during the software development lifecycle. Nowadays, there is a wide variety of solutions to reduce testing costs and improve test quality. Focussing on test case generation, Dynamic Symbolic Execution (DSE) is used to generate tests with good structural coverage. Regarding test suite evaluation, Mutation Testing (MT) assesses the detection capability of the test cases by introducing minor localised changes that resemble real faults. DSE is however known to produce tests that do not have good mutation detection capabilities: in this paper, the authors set out to solve this by combining DSE and MT into a new family of approaches that the authors call Mutation-Inspired Symbolic Execution (MISE). First, this known result on a set of open source programs is confirmed: DSE by itself is not good at killing mutants, detecting only 59.9% out of all mutants. The authors show that a direct combination of DSE and MT (naive MISE) can produce better results, detecting up to 16% more mutants depending on the programme, though at a high computational cost. To reduce these costs, the authors set out a roadmap for more efficient versions of MISE, gaining its advantages while avoiding a large part of its additional costs

    Assessment and Improvement of the Practical Use of Mutation for Automated Software Testing

    Get PDF
    Software testing is the main quality assurance technique used in software engineering. In fact, companies that develop software and open-source communities alike actively integrate testing into their software development life cycle. In order to guide and give objectives for the software testing process, researchers have designed test adequacy criteria (TAC) which, define the properties of a software that must be covered in order to constitute a thorough test suite. Many TACs have been designed in the literature, among which, the widely used statement and branch TAC, as well as the fault-based TAC named mutation. It has been shown in the literature that mutation is effective at revealing fault in software, nevertheless, mutation adoption in practice is still lagging due to its cost. Ideally, TACs that are most likely to lead to higher fault revelation are desired for testing and, the fault-revelation of test suites is expected to increase as their coverage of TACs test objectives increase. However, the question of which TAC best guides software testing towards fault revelation remains controversial and open, and, the relationship between TACs test objectives’ coverage and fault-revelation remains unknown. In order to increase knowledge and provide answers about these issues, we conducted, in this dissertation, an empirical study that evaluates the relationship between test objectives’ coverage and fault-revelation for four TACs (statement, branch coverage and, weak and strong mutation). The study showed that fault-revelation increase with coverage only beyond some coverage threshold and, strong mutation TAC has highest fault revelation. Despite the benefit of higher fault-revelation that strong mutation TAC provide for software testing, software practitioners are still reluctant to integrate strong mutation into their software testing activities. This happens mainly because of the high cost of mutation analysis, which is related to the large number of mutants and the limitation in the automation of test generation for strong mutation. Several approaches have been proposed, in the literature, to tackle the analysis’ cost issue of strong mutation. Mutant selection (reduction) approaches aim to reduce the number of mutants used for testing by selecting a small subset of mutation operator to apply during mutants generation, thus, reducing the number of analyzed mutants. Nevertheless, those approaches are not more effective, w.r.t. fault-revelation, than random mutant sampling (which leads to a high loss in fault revelation). Moreover, there is not much work in the literature that regards cost-effective automated test generation for strong mutation. This dissertation proposes two techniques, FaRM and SEMu, to reduce the cost of mutation testing. FaRM statically selects and prioritizes mutants that lead to faults (fault-revealing mutants), in order to reduce the number of mutants (fault-revealing mutants represent a very small proportion of the generated mutants). SEMu automatically generates tests that strongly kill mutants and thus, increase the mutation score and improve the test suites. First, this dissertation makes an empirical study that evaluates the fault-revelation (ability to lead to tests that have high fault-revelation) of four TACs, namely statement, branch, weak mutation and strong mutation. The outcome of the study show evidence that for all four studied TACs, the fault-revelation increases with TAC test objectives’ coverage only beyond a certain threshold of coverage. This suggests the need to attain higher coverage during testing. Moreover, the study shows that strong mutation is the only studied TAC that leads to tests that have, significantly, the highest fault-revelation. Second, in line with mutant reduction, we study the different mutant quality indicators (used to qualify "useful" mutants) proposed in the literature, including fault-revealing mutants. Our study shows that there is a large disagreement between the indicators suggesting that the fault-revealing mutant set is unique and differs from other mutant sets. Thus, given that testing aims to reveal faults, one should directly target fault-revealing mutants for mutant reduction. We also do so in this dissertation. Third, this dissertation proposes FaRM, a mutant reduction technique based on supervised machine learning. In order to automatically discriminate, before test execution, between useful (valuable) and useless mutants, FaRM build a mutants classification machine learning model. The features for the classification model are static program features of mutants categorized as mutant types and mutant context (abstract syntax tree, control flow graph and data/control dependency information). FaRM’s classification model successfully predicted fault-revealing mutants and killable mutants. Then, in order to reduce the number of analyzed mutants, FaRM selects and prioritizes fault-revealing mutants based of the aforementioned mutants classification model. An empirical evaluation shows that FaRM outperforms (w.r.t. the accuracy of fault-revealing mutant selection) random mutants sampling and existing mutation operators-based mutant selection techniques. Fourth, this dissertation proposes SEMu, an automated test input generation technique aiming to increase strong mutation coverage score of test suites. SEMu is based on symbolic execution and leverages multiple cost reduction heuristics for the symbolic execution. An empirical evaluation shows that, for limited time budget, the SEMu generates tests that successfully increase strong mutation coverage score and, kill more mutants than test generated by state-of-the-art techniques. Finally, this dissertation proposes Muteria a framework that enables the integration of FaRM and SEMu into the automated software testing process. Overall, this dissertation provides insights on how to effectively use TACs to test software, shows that strong mutation is the most effective TAC for software testing. It also provides techniques that effectively facilitate the practical use of strong mutation and, an extensive tooling to support the proposed techniques while enabling their extensions for the practical adoption of strong mutation in software testing

    Fuzzing for CPS Mutation Testing

    Full text link
    Mutation testing can help reduce the risks of releasing faulty software. For such reason, it is a desired practice for the development of embedded software running in safety-critical cyber-physical systems (CPS). Unfortunately, state-of-the-art test data generation techniques for mutation testing of C and C++ software, two typical languages for CPS software, rely on symbolic execution, whose limitations often prevent its application (e.g., it cannot test black-box components). We propose a mutation testing approach that leverages fuzz testing, which has proved effective with C and C++ software. Fuzz testing automatically generates diverse test inputs that exercise program branches in a varied number of ways and, therefore, exercise statements in different program states, thus maximizing the likelihood of killing mutants, our objective. We performed an empirical assessment of our approach with software components used in satellite systems currently in orbit. Our empirical evaluation shows that mutation testing based on fuzz testing kills a significantly higher proportion of live mutants than symbolic execution (i.e., up to an additional 47 percentage points). Further, when symbolic execution cannot be applied, fuzz testing provides significant benefits (i.e., up to 41% mutants killed). Our study is the first one comparing fuzz testing and symbolic execution for mutation testing; our results provide guidance towards the development of fuzz testing tools dedicated to mutation testing.Comment: This article is the camera-ready version for ASE 202

    Técnicas de prueba avanzadas para la generación de casos de prueba

    Get PDF
    Software testing is a crucial phase in software development, particularly in contexts such as critical systems, where even minor errors can have severe consequences. The advent of Industry 4.0 brings new challenges, with software present in almost all industrial systems. Overcoming technical limitations, as well as limited development times and budgets, is a major challenge that software testing faces nowadays. Such limitations can result in insufficient attention being paid to it. The Bay of Cadiz’s industrial sector is known for its world-leading technological projects, with facilities and staff fully committed to innovation. The close relationship between these companies and the University of Cadiz allows for a constant exchange between industry and academia. This PhD thesis aims to identify the most important elements of software testing in Industry 4.0, based on close industrial experience and the latest state-of-the-art work. This allows us to break down the software testing process in a context where large teams work on large-scale, changing projects with numerous dependencies. It also allows us to estimate the percentage benefit that a solution could provide to test engineers throughout the process. Our results indicate a need for non-commercial, flexible, and adaptable solutions for the automation of software testing, capable of meeting the constantly changing needs of industry projects. This work provides a comprehensive study on the industry’s needs and motivates the development of two new solutions using state-of-the-art technologies, which are rarely present in industrial work. These results include a tool, ASkeleTon, which implements a procedure for generating test harnesses based on the Abstract Syntax Tree (AST) and a study examining the ability of the Dynamic Symbolic Execution (DSE) testing technique to generate test data capable of detecting potential faults in software. This study leads to the creation of a novel family of testing techniques, called mutationinspired symbolic execution (MISE), which combines DSE with mutation testing (MT) to produce test data capable of detecting more potential faults than DSE alone. The findings of this work can serve as a reference for future research on software testing in Industry 4.0. The solutions developed in this PhD thesis are able to automate essential tasks in software testing, resulting in significant potential benefits. These benefits are not only for the industry, but the creation of the new family of testing techniques also represents a promising line of research for the scientific community, benefiting all software projects regardless of their field of application.La prueba del software es una de las etapas más importantes durante el desarrollo de software, especialmente en determinados tipos de contextos como el de los sistemas críticos, donde el más mínimo fallo puede conllevar la más grave de las consecuencias. Nuevos paradigmas tecnológicos como la Industria 4.0 conllevan desafíos que nunca antes se habían planteado, donde el software está presente en prácticamente todos los sistemas industriales. Uno de los desafíos más importantes a los que se enfrenta la prueba del software consiste en superar las limitaciones técnicas además de los tiempos de desarrollo y presupuestos limitados, que provocan que en ocasiones no se le preste la atención que merece. El tejido industrial de la Bahía de Cádiz es conocido por sacar adelante proyectos tecnológicos punteros a nivel mundial, con unas instalaciones y un personal totalmente implicado con la innovación. Las buenas relaciones de este conjunto de empresas con la Universidad de Cádiz, sumadas a la cercanía geográfica, permiten que haya una conversación constante entre la industria y la academia. Este trabajo de tesis persigue identificar los elementos más importantes del desarrollo de la prueba del software en la Industria 4.0 en base a una experiencia industrial cercana, además de a los últimos trabajos del estado del arte. Esto permite identificar cada etapa en la que se desglosa la prueba del software en un contexto donde trabajan equipos muy grandes con proyectos de gran envergadura, cambiantes y con multitud de dependencias. Esto permite, además, estimar el porcentaje de beneficio que podría suponer una solución que ayude a los ingenieros de prueba durante todo el proceso. Gracias a los resultados de esta experiencia descubrimos que existe la necesidad de soluciones para la automatización de la prueba del software que sean no comerciales, flexibles y adaptables a las constantes necesidades cambiantes entre los proyectos de la industria. Este trabajo aporta un estudio completo sobre las necesidades de la industria en relación a la prueba del software. Los resultados motivan el desarrollo de dos nuevas soluciones que utilizan tecnologías del estado del arte, ampliamente usadas en trabajos académicos, pero raramente presentes en trabajos industriales. En este sentido, se presentan dos resultados principales que incluyen una herramienta que implementa un procedimiento para la generación de arneses de prueba basada en el Árbol de Sintaxis Abstracta (AST) a la que llamamos ASkeleTon y un estudio donde se comprueba la capacidad de la técnica de pruebas Ejecución Simbólica Dinámica (DSE, por sus siglas en inglés) para generar datos de prueba capaces de detectar fallos potenciales en el software. Este estudio deriva en la creación de una novedosa familia de técnicas de prueba a la que llamamos mutation-inspired symbolic execution (MISE) que combina DSE con la prueba de mutaciones (MT, por sus siglas en inglés) para conseguir un conjunto de datos de prueba capaz de detectar más fallos potenciales que DSE por sí sola. Las soluciones desarrolladas en este trabajo de tesis son capaces de automatizar parte de la prueba del software, resultando en unos beneficios potenciales importantes. No solo se aportan beneficios a la industria, sino que la creación de la nueva familia de técnicas de prueba supone una línea de investigación prometedora para la comunidad científica, siendo beneficiados todos los proyectos software independientemente de su ámbito de aplicación

    Mutant reduction based on dominance relation for weak mutation testing

    Get PDF
    Context: As a fault-based testing technique, mutation testing is effective at evaluating the quality of existing test suites. However, a large number of mutants result in the high computational cost in mutation testing. As a result, mutant reduction is of great importance to improve the efficiency of mutation testing. Objective: We aim to reduce mutants for weak mutation testing based on the dominance relation between mutant branches. Method: In our method, a new program is formed by inserting mutant branches into the original program. By analyzing the dominance relation between mutant branches in the new program, the non-dominated one is obtained, and the mutant corresponding to the non-dominated mutant branch is the mutant after reduction. Results: The proposed method is applied to test ten benchmark programs and six classes from open-source projects. The experimental results show that our method reduces over 80% mutants on average, which greatly improves the efficiency of mutation testing. Conclusion: We conclude that dominance relation between mutant branches is very important and useful in reducing mutants for mutation testing

    Mutation‐inspired symbolic execution for software testing

    Get PDF
    Software testing is a complex and costly stage during the software development lifecycle. Nowadays, there is a wide variety of solutions to reduce testing costs and improve test quality. Focussing on test case generation, Dynamic Symbolic Execution (DSE) is used to generate tests with good structural coverage. Regarding test suite evaluation, Mutation Testing (MT) assesses the detection capability of the test cases by introducing minor localised changes that resemble real faults. DSE is however known to produce tests that do not have good mutation detection capabilities: in this paper, the authors set out to solve this by combining DSE and MT into a new family of approaches that the authors call Mutation‐Inspired Symbolic Execution (MISE). First, this known result on a set of open source programs is confirmed: DSE by itself is not good at killing mutants, detecting only 59.9% out of all mutants. The authors show that a direct combination of DSE and MT (naive MISE) can produce better results, detecting up to 16% more mutants depending on the programme, though at a high computational cost. To reduce these costs, the authors set out a roadmap for more efficient versions of MISE, gaining its advantages while avoiding a large part of its additional costs

    Coverage-based quality metric of mutation operators for test suite improvement

    Get PDF
    The choice of mutation operators is a fundamental aspect in mutation testing to guide the tester to an effective test suite. Designing a set of mutation operators is subject to a trade-off between effectiveness and computational cost: a larger mutation population might uncover more faults, but will take longer to analyse. With the aim of resolving this trade-off, several authors have defined an assortment of metrics to determine the most valuable operators. In this work, we extend an existing quality metric by incorporating an additional source of data and coverage information and therefore investigate the extent to which mutants that are often covered but rarely killed can improve the evaluation of mutation operators for the refinement of the test suite. As a case study, we analyse C++ class-level operators based on the new coverage-based quality metric to assess whether the original metric is enhanced. The results when selecting the best-valued operators show that this metric has great potential to help the tester in finding effective mutation operators. In comparison with the metric from which it is derived, the use of coverage data allows to reduce the number of mutants but often loses fewer test cases and, in addition, retains those that seem hard to design

    Search-Based Mutant Selection for Efficient Test Suite Improvement: Evaluation and Results

    Get PDF
    Context: Search-based techniques have been applied to almost all areas in software engineering, especially to software testing, seeking to solve hard optimization problems. However, the problem of selecting mutants to improve the test suite at a lower cost has not been explored to the same extent as other problems, such as mutant selection for test suite evaluation or test data generation. Objective: In this paper, we apply search-based mutant selection to enhance the quality of test suites efficiently. Namely, we use the technique known as Evolutionary Mutation Testing (EMT), which allows reducing the number of mutants while preserving the power to refine the test suite. Despite reported benefits of its application, the existing empirical results were derived from a limited number of case studies, a particular set of mutation operators and a vague measure, which currently makes it difficult to determine the real performance of this technique. Method: This paper addresses the shortcomings of previous studies, providing a new methodology to evaluate EMT on the basis of the actual improvement of the test suite achieved by using the evolutionary strategy. We make use of that methodology in new experiments with a carefully selected set of real-world C++ case studies. Results: EMT shows a good performance for most case studies and levels of demand of test suite improvement (around 45% less mutants than random selection in the best case). The results reveal that even a reduced subset of mutants selected with EMT can serve to increase confidence in the test suite, especially in programs with a large set of mutants. Conclusions: These results support the use of search-based techniques to solve the problem of mutant selection for a more efficient test suite refinement. Additionally, we identify some aspects that could foreseeably help enhance EMT
    corecore