24 research outputs found

    Fuzzing for CPS Mutation Testing

    Full text link
    Mutation testing can help reduce the risks of releasing faulty software. For this reason, it is a desirable practice for the development of embedded software running in safety-critical cyber-physical systems (CPS). Unfortunately, state-of-the-art test data generation techniques for mutation testing of C and C++ software, two typical languages for CPS software, rely on symbolic execution, whose limitations often prevent its application (e.g., it cannot test black-box components). We propose a mutation testing approach that leverages fuzz testing, which has proved effective with C and C++ software. Fuzz testing automatically generates diverse test inputs that exercise program branches in varied ways and therefore reach statements in different program states, maximizing the likelihood of killing mutants, which is our objective. We performed an empirical assessment of our approach with software components used in satellite systems currently in orbit. Our evaluation shows that mutation testing based on fuzz testing kills a significantly higher proportion of live mutants than symbolic execution (i.e., up to an additional 47 percentage points). Further, when symbolic execution cannot be applied, fuzz testing provides significant benefits (i.e., up to 41% of mutants killed). Our study is the first to compare fuzz testing and symbolic execution for mutation testing; our results provide guidance towards the development of fuzz testing tools dedicated to mutation testing. Comment: This article is the camera-ready version for ASE 202
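
    To make the idea of killing a mutant with fuzz-generated inputs concrete, here is a minimal, self-contained sketch (our illustration in Python, not the paper's C/C++ tooling): a mutant is a small syntactic change to the program, and a fuzzed input kills it when the mutant's observable behaviour differs from the original's.

        import random

        def original(x: int) -> int:
            # Original predicate under test.
            return 1 if x > 10 else 0

        def mutant(x: int) -> int:
            # Mutant: the relational operator '>' has been changed to '>='.
            return 1 if x >= 10 else 0

        def fuzz_for_killing_input(trials: int = 10_000):
            """Generate random inputs; return one that kills the mutant, or None."""
            for _ in range(trials):
                x = random.randint(-1000, 1000)    # diverse, fuzz-generated test input
                if original(x) != mutant(x):       # observable difference: the mutant is killed
                    return x
            return None

        if __name__ == "__main__":
            print("killing input:", fuzz_for_killing_input())   # typically prints 10, the boundary value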

    KD-ART: Should we intensify or diversify tests to kill mutants?

    Get PDF
    CONTEXT: Adaptive Random Testing (ART) spreads test cases evenly over the input domain. Yet once a fault is found, decisions must be made to diversify or intensify subsequent inputs. Diversification employs a wide range of tests to increase the chances of finding new faults. Intensification selects test inputs similar to those previously shown to be successful. OBJECTIVE: Explore the trade-off between diversification and intensification to kill mutants. METHOD: We augment ART to estimate the Kernel Density (KD-ART) of input values found to kill mutants. KD-ART was first proposed at the 10th International Workshop on Mutation Analysis. We now extend this work to handle real-world non-numeric applications. Specifically, we incorporate a technique to support programs with input parameters that have composite data types (such as arrays and structs). RESULTS: Intensification is the most effective strategy for numerical programs (it achieves an 8.5% higher mutation score than ART). By contrast, diversification seems more effective for programs with composite inputs. KD-ART kills mutants 15.4 times faster than ART. CONCLUSION: Intensify tests for numerical types, but diversify them for composite types.
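
    As a rough illustration of the intensify/diversify choice (a sketch with assumed numeric inputs and bandwidth, not the KD-ART implementation): intensification samples new inputs from a Gaussian kernel density estimate centred on inputs that already killed mutants, whereas diversification keeps sampling uniformly over the whole input domain.

        import random

        def intensify(killing_inputs, bandwidth=1.0):
            """Sample a new input near values that already killed mutants (Gaussian kernel estimate)."""
            centre = random.choice(killing_inputs)    # pick one previously successful input
            return random.gauss(centre, bandwidth)    # perturb it within the kernel bandwidth

        def diversify(lower=-100.0, upper=100.0):
            """Sample a new input spread evenly over the whole input domain."""
            return random.uniform(lower, upper)

        # Hypothetical history: inputs near a boundary at 10 have killed mutants so far.
        killers = [9.8, 10.2, 10.0]
        print("intensified candidate:", intensify(killers))
        print("diversified candidate:", diversify())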

    Advanced testing techniques for test case generation

    Get PDF
    Software testing is a crucial phase in software development, particularly in contexts such as critical systems, where even minor errors can have severe consequences. The advent of Industry 4.0 brings new challenges, with software present in almost all industrial systems. Overcoming technical limitations, as well as limited development times and budgets, is a major challenge that software testing faces nowadays. Such limitations can result in insufficient attention being paid to it. The Bay of Cadiz's industrial sector is known for its world-leading technological projects, with facilities and staff fully committed to innovation. The close relationship between these companies and the University of Cadiz allows for a constant exchange between industry and academia. This PhD thesis aims to identify the most important elements of software testing in Industry 4.0, based on close industrial experience and the latest state-of-the-art work. This allows us to break down the software testing process in a context where large teams work on large-scale, changing projects with numerous dependencies. It also allows us to estimate the percentage benefit that a solution could provide to test engineers throughout the process. Our results indicate a need for non-commercial, flexible, and adaptable solutions for the automation of software testing, capable of meeting the constantly changing needs of industry projects. This work provides a comprehensive study on the industry's needs and motivates the development of two new solutions using state-of-the-art technologies, which are rarely present in industrial work. These results include a tool, ASkeleTon, which implements a procedure for generating test harnesses based on the Abstract Syntax Tree (AST), and a study examining the ability of the Dynamic Symbolic Execution (DSE) testing technique to generate test data capable of detecting potential faults in software. This study leads to the creation of a novel family of testing techniques, called mutation-inspired symbolic execution (MISE), which combines DSE with mutation testing (MT) to produce test data capable of detecting more potential faults than DSE alone. The findings of this work can serve as a reference for future research on software testing in Industry 4.0. The solutions developed in this PhD thesis are able to automate essential tasks in software testing, resulting in significant potential benefits. These benefits are not only for the industry, but the creation of the new family of testing techniques also represents a promising line of research for the scientific community, benefiting all software projects regardless of their field of application.

    Higher Order Mutation Testing

    Get PDF
    Mutation testing is a fault-based software testing technique that has been studied widely for over three decades. To date, work in this field has focused largely on first order mutants because it is believed that higher order mutation testing is too computationally expensive to be practical. This thesis argues that some higher order mutants are potentially better able to simulate real-world faults and to reveal insights into programming bugs than the restricted class of first order mutants. This thesis proposes a higher order mutation testing paradigm which combines valuable higher order mutants and non-trivial first order mutants together for mutation testing. To overcome the exponential increase in the number of higher order mutants, a search process is proposed that seeks fit mutants (both first and higher order) in the space of all possible mutants. A fault-based higher order mutant classification scheme is introduced. Based on different types of fault interactions, this approach classifies higher order mutants into four categories: expected, worsening, fault masking and fault shifting. A search-based approach is then proposed for locating subsuming and strongly subsuming higher order mutants. These mutants are a subset of the fault masking and fault shifting classes of higher order mutants and are more difficult to kill than their constituent first order mutants. Finally, a hybrid test data generation approach is introduced, which combines dynamic symbolic execution and search-based software testing to generate strongly adequate test data to kill first and higher order mutants.
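
    As a toy illustration of fault interaction in higher order mutants (our own example, not taken from the thesis), the sketch below shows fault masking in its extreme form: two first order mutants that are trivially killed combine into a second order mutant that is equivalent to the original program.

        def original(x: int) -> int:
            return x + 2 - 2                  # computes x

        def fom_plus_to_minus(x: int) -> int:
            return x - 2 - 2                  # first order mutant: first '+' flipped; always off by 4

        def fom_minus_to_plus(x: int) -> int:
            return x + 2 + 2                  # first order mutant: '-' flipped; always off by 4

        def hom_both_changes(x: int) -> int:
            return x - 2 + 2                  # second order mutant: the two faults cancel each other

        # Both first order mutants are killed by every input, but the combined higher order mutant
        # behaves exactly like the original program (extreme fault masking: it cannot be killed).
        assert all(original(x) != fom_plus_to_minus(x) for x in range(-5, 6))
        assert all(original(x) != fom_minus_to_plus(x) for x in range(-5, 6))
        assert all(original(x) == hom_both_changes(x) for x in range(-5, 6))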

    Assessment and Improvement of the Practical Use of Mutation for Automated Software Testing

    Get PDF
    Software testing is the main quality assurance technique used in software engineering. In fact, companies that develop software and open-source communities alike actively integrate testing into their software development life cycle. To guide the software testing process and give it objectives, researchers have designed test adequacy criteria (TACs), which define the properties of a software system that must be covered in order to constitute a thorough test suite. Many TACs have been designed in the literature, among them the widely used statement and branch criteria, as well as the fault-based TAC named mutation. It has been shown in the literature that mutation is effective at revealing faults in software; nevertheless, its adoption in practice is still lagging due to its cost. Ideally, the TACs most likely to lead to high fault revelation are desired for testing, and the fault revelation of test suites is expected to increase as their coverage of TAC test objectives increases. However, the question of which TAC best guides software testing towards fault revelation remains controversial and open, and the relationship between the coverage of TAC test objectives and fault revelation remains unknown. To increase knowledge and provide answers about these issues, this dissertation conducts an empirical study that evaluates the relationship between test objectives' coverage and fault revelation for four TACs (statement, branch, weak mutation and strong mutation). The study showed that fault revelation increases with coverage only beyond a certain coverage threshold, and that the strong mutation TAC has the highest fault revelation. Despite the benefit of higher fault revelation that strong mutation provides for software testing, practitioners are still reluctant to integrate strong mutation into their testing activities. This is mainly because of the high cost of mutation analysis, which stems from the large number of mutants and the limited automation of test generation for strong mutation. Several approaches have been proposed in the literature to tackle the analysis cost of strong mutation. Mutant selection (reduction) approaches aim to reduce the number of mutants used for testing by selecting a small subset of mutation operators to apply during mutant generation, thus reducing the number of analyzed mutants. Nevertheless, those approaches are not more effective, w.r.t. fault revelation, than random mutant sampling (which leads to a high loss in fault revelation). Moreover, there is little work in the literature that addresses cost-effective automated test generation for strong mutation. This dissertation proposes two techniques, FaRM and SEMu, to reduce the cost of mutation testing. FaRM statically selects and prioritizes mutants that lead to faults (fault-revealing mutants) in order to reduce the number of mutants, since fault-revealing mutants represent a very small proportion of the generated mutants. SEMu automatically generates tests that strongly kill mutants and thus increase the mutation score and improve the test suites. First, this dissertation presents an empirical study that evaluates the fault revelation (the ability to lead to tests with high fault revelation) of four TACs, namely statement, branch, weak mutation and strong mutation. The outcome of the study shows evidence that, for all four studied TACs, fault revelation increases with the coverage of TAC test objectives only beyond a certain threshold of coverage.
    This suggests the need to attain higher coverage during testing. Moreover, the study shows that strong mutation is the only studied TAC that leads to tests with significantly the highest fault revelation. Second, in line with mutant reduction, we study the different mutant quality indicators (used to qualify "useful" mutants) proposed in the literature, including fault-revealing mutants. Our study shows that there is a large disagreement between the indicators, suggesting that the fault-revealing mutant set is unique and differs from other mutant sets. Thus, given that testing aims to reveal faults, one should directly target fault-revealing mutants for mutant reduction, as we do in this dissertation. Third, this dissertation proposes FaRM, a mutant reduction technique based on supervised machine learning. To automatically discriminate, before test execution, between useful (valuable) and useless mutants, FaRM builds a machine learning model that classifies mutants. The features of the classification model are static program features of mutants, categorized as mutant types and mutant context (abstract syntax tree, control flow graph and data/control dependency information). FaRM's classification model successfully predicted fault-revealing mutants and killable mutants. Then, to reduce the number of analyzed mutants, FaRM selects and prioritizes fault-revealing mutants based on the aforementioned classification model. An empirical evaluation shows that FaRM outperforms (w.r.t. the accuracy of fault-revealing mutant selection) random mutant sampling and existing mutation operator-based mutant selection techniques. Fourth, this dissertation proposes SEMu, an automated test input generation technique that aims to increase the strong mutation score of test suites. SEMu is based on symbolic execution and leverages multiple cost-reduction heuristics for the symbolic execution. An empirical evaluation shows that, for a limited time budget, SEMu generates tests that successfully increase the strong mutation score and kill more mutants than tests generated by state-of-the-art techniques. Finally, this dissertation proposes Muteria, a framework that enables the integration of FaRM and SEMu into the automated software testing process. Overall, this dissertation provides insights on how to effectively use TACs to test software and shows that strong mutation is the most effective TAC for software testing. It also provides techniques that facilitate the practical use of strong mutation, together with extensive tooling that supports the proposed techniques and enables their extension, towards the practical adoption of strong mutation in software testing.
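
    Since several of these contributions revolve around the mutation score, a minimal sketch of the metric may help (hypothetical kill data, not from the dissertation): the score is the proportion of non-equivalent mutants killed by at least one test.

        def mutation_score(kill_matrix, equivalent):
            """kill_matrix[m][t] is True if test t kills mutant m; `equivalent` holds indices of equivalent mutants."""
            considered = [m for m in range(len(kill_matrix)) if m not in equivalent]
            killed = [m for m in considered if any(kill_matrix[m])]
            return len(killed) / len(considered) if considered else 1.0

        # Hypothetical kill matrix: 4 mutants vs. 3 tests; mutant 3 is known to be equivalent.
        kills = [
            [True,  False, False],   # mutant 0: killed by test 0
            [False, False, True],    # mutant 1: killed by test 2
            [False, False, False],   # mutant 2: survives the whole suite
            [False, False, False],   # mutant 3: equivalent, excluded from the score
        ]
        print(mutation_score(kills, equivalent={3}))   # 2 of 3 killable mutants killed -> 0.66...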

    Mutation Testing Advances: An Analysis and Survey

    Get PDF

    Evolutionary Mutation Testing in Object-Oriented Environments

    Get PDF
    Mutation testing is acknowledged as a powerful method to evaluate the strength of test suites in detecting possible faults in the code. However, its application is expensive, which has traditionally been an obstacle to its broader use in industry. While it is true that several techniques have been shown to greatly reduce the cost without losing much effectiveness, it is also true that those techniques have been evaluated in limited contexts, especially in the scope of traditional operators for procedural programs. To illustrate this fact, Evolutionary Mutation Testing has only been applied to WS-BPEL compositions, despite the positive outcome when selecting a subset of mutants through an evolutionary algorithm with the aim of improving a test suite. As a result, it is unknown whether the same benefits can be extrapolated to other levels and domains. In particular, in this thesis we ask to what extent Evolutionary Mutation Testing is also useful to reduce the number of mutants generated by class mutation operators in object-oriented systems. More specifically, we focus on the C++ programming language, since the development of mutation testing with regard to this widely-used language is clearly immature, judging from the lack of papers in the literature tackling this language. Given that C++ has hardly been addressed in research and practice, we deal with all the phases of mutation testing: from the definition and implementation of mutation operators in a mutation system to the evaluation of those operators and the application of Evolutionary Mutation Testing among other cost reduction techniques.
    We define a set of class mutation operators for C++ and implement them in MuCPP, which allows us to perform experiments with real programs thanks to the facilities incorporated into this mutation tool. These mutation operators are automated following a set of guidelines so that they produce the expected mutations. In general, class-level operators generate far fewer mutants than traditional operators, yield a higher percentage of equivalent mutants, and are applied with varying frequency depending on the features of the tested program. Developing improvement rules in the implementation of several mutation operators helps further reduce the number of mutants, avoiding the creation of uninteresting mutants. Another interesting finding is that the set of class mutants and the set of traditional mutants complement each other to help the tester design more effective test suites. We also develop GiGAn, a new system that connects the mutation tool MuCPP and a genetic algorithm to apply Evolutionary Mutation Testing to C++ object-oriented systems. The genetic algorithm allows reducing the number of mutants that would be generated by MuCPP, as it guides the search towards the selection of those mutants that can induce the generation of new test cases (strong mutants). This technique performs better than a random algorithm, both when trying to find different percentages of strong mutants and when simulating the refinement of the test suite through the mutants selected by each of these techniques. The stability of EMT among different case studies and the good results of the simulation in the programs that lead to the largest set of mutants are additional observations. Finally, we conduct an experiment to assess these mutation operators individually from a double perspective: how useful they are for the evaluation of the test suite (TSE) and for its refinement (TSR). To that end, we rank the operators using two different metrics: degree of redundancy (TSE) and quality in guiding the generation of high-quality test cases (TSR). Based on these rankings, we perform a selective study taking into account that the less valuable operators are at the bottom of the classification. This selective approach reveals that an operator is not necessarily as useful for TSE as for TSR, and that these rankings are appropriate for a selective strategy when compared to other operator rankings or random mutant selection. However, favouring the generation of individual mutants from the best-valued operators is much better than discarding operators completely, because each of the operators targets a particular object-oriented feature. Altogether, these evaluations about class operators suggest that their nature can limit the benefits of any cost reduction technique. This work was funded by research grant PU-EPIF-FPI-PPI-BC 2012-037 from the Universidad de Cádiz, by the DArDOS project (TIN2015-65845-C3-3-R) of the Spanish Ministry of Economy and Competitiveness state programme for R&D&I oriented to societal challenges, and by the SEBASENET Excellence Network (TIN2015-71841-REDT) of the same Ministry's state programme for the promotion of excellent scientific and technical research. Number of pages: 23
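
    As a very rough sketch of the kind of search involved (assumed fitness oracle and selection scheme, not GiGAn or MuCPP themselves), an evolutionary loop can rank and retain the mutants judged likely to induce new test cases instead of generating and analysing the full mutant set.

        import random

        def evolve(mutants, fitness, generations=20, pop_size=10, exploration=0.2):
            """Return mutants ranked by fitness after a simple selection/variation loop."""
            population = random.sample(mutants, pop_size)
            for _ in range(generations):
                survivors = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
                offspring = []
                while len(survivors) + len(offspring) < pop_size:
                    child = random.choice(survivors)
                    if random.random() < exploration:   # occasionally jump elsewhere in the mutant space
                        child = random.choice(mutants)
                    offspring.append(child)
                population = survivors + offspring
            return sorted(set(population), key=fitness, reverse=True)

        # Hypothetical setting: mutant ids 0..99 and an assumed fitness oracle that scores how likely
        # a mutant is to induce a new test case ("strong" mutants score highest).
        strong = {7, 23, 42, 68}
        print(evolve(list(range(100)), fitness=lambda m: 1.0 if m in strong else 0.0)[:5])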

    Guiding Quality Assurance Through Context Aware Learning

    Get PDF
    Software testing is a quality control activity that, in addition to finding flaws or bugs, provides confidence in the software's correctness. The quality of the developed software depends on the strength of its test suite. Mutation testing has been shown to effectively guide the improvement of a test suite's strength. Mutation is a test adequacy criterion in which test requirements are represented by mutants. Mutants are slight syntactic modifications of the original program that aim to introduce semantic deviations (from the original program), requiring testers to design tests that kill these mutants, i.e., that distinguish the observable behavior of a mutant from that of the original program. This process of designing tests to kill a mutant is performed iteratively for the entire mutant set, which augments the test suite and hence improves its strength. Although mutation testing is empirically validated, a key issue is that its application is expensive due to the large number of low-utility mutants it introduces. Some mutants cannot even be killed, as they are functionally equivalent to the original program. To reduce the application cost, it is imperative to limit the number of mutants to those that are actually useful. Since identifying such mutants requires manual analysis and test executions, there is no effective solution to the problem, and it remains unclear how to mutate and test code efficiently. On the other hand, with the advancement of deep learning, several works in the literature have recently focused on applying it to source code to automate many non-trivial tasks, including bug fixing, producing code comments, code completion, and program repair. The increasing utilization of deep learning is due to a combination of factors. The first is the vast availability of data to learn from, specifically source code in open-source repositories. The second is the availability of inexpensive hardware able to efficiently run deep learning infrastructures. The third, and most compelling, is its ability to automatically learn the categorization of data by learning the code context through its hidden-layer architecture, making it especially proficient at identifying features. Thus, we explore the possibility of employing deep learning to identify only useful mutants, in order to achieve a good trade-off between the invested effort and test effectiveness. Hence, as our first contribution, this dissertation proposes Cerebro, a deep learning approach to statically select subsuming mutants based on the mutants' surrounding code context. As subsuming mutants reside at the top of the subsumption hierarchy, test cases designed to kill only this minimal subset of mutants kill all the remaining mutants. Our evaluation of Cerebro demonstrates that it preserves the benefits of mutation testing while limiting the application cost, i.e., reducing all cost factors such as equivalent mutants, mutant executions, and the mutants requiring analysis. Apart from improving test suite strength, mutation testing has proven useful for inferring software specifications. Software specifications aim at describing the software's intended behavior and can be used to distinguish correct from incorrect software behaviors. Specification inference techniques aim at inferring assertions by generating and filtering candidate assertions through dynamic test executions and mutation testing.
    Because such techniques introduce a large number of mutants during mutation testing, they are also computationally expensive, which establishes a need to select the mutants best suited to assertion inference. We refer to such mutants as assertion-inferring mutants. In our analysis, we find that the assertion-inferring mutants are significantly different from the subsuming mutants. Thus, we explored the applicability of deep learning to identify assertion-inferring mutants. Hence, as our second contribution, this dissertation proposes Seeker, a deep learning approach to statically select assertion-inferring mutants. Our evaluation demonstrates that Seeker enables an assertion inference capability comparable to full mutation analysis while significantly limiting the execution cost. In addition to testing software in general, a few works in the literature attempt to employ mutation testing to tackle security-related issues, due to the fault-based nature of the technique. These works propose mutation operators that convert non-vulnerable code into vulnerable code by mimicking common security bugs. However, these pattern-based approaches have two major limitations. Firstly, the design of security-specific mutation operators is not trivial; it requires manual analysis and comprehension of the vulnerability classes. Secondly, these mutation operators can alter the program semantics in a manner that is not convincing for developers and is perceived as unrealistic, thereby hindering the usability of the method. On the other hand, with the release of powerful language models trained on large code corpora, e.g., CodeBERT, a new family of mutation testing tools has arisen with the promise of generating natural mutants. We study the extent to which the mutants produced by language models can semantically mimic the behavior of vulnerabilities, i.e., vulnerability-mimicking mutants. Test cases designed to kill these mutants will also tackle the mimicked vulnerabilities. In our analysis, we found that only a very small subset of mutants is vulnerability-mimicking; however, this set mimics more than half of the vulnerabilities in our dataset. Due to the absence of any defined features for identifying vulnerability-mimicking mutants, as our third contribution, this dissertation introduces Mystique, a deep learning approach that automatically extracts features to identify vulnerability-mimicking mutants. Despite their scarcity, Mystique predicts vulnerability-mimicking mutants with high prediction performance, demonstrating that their features can be learned automatically by deep learning models to predict these mutants statically, without the need to invest any effort in defining features. Since our vulnerability-mimicking mutants cannot mimic all vulnerabilities, we perceive that these mutants are not a complete representation of all vulnerabilities and that there is a need for actual vulnerability prediction approaches. Although many such approaches exist in the literature, their performance is limited by a few factors. Firstly, vulnerabilities are fewer in comparison to software bugs, limiting the information one can learn from, which affects prediction performance. Secondly, the existing approaches learn on both vulnerable and supposedly non-vulnerable components. This introduces unavoidable noise into the training data, i.e., components with no reported vulnerability are considered non-vulnerable during training, which results in the existing approaches performing poorly.
    We employed deep learning to automatically capture features related to vulnerabilities and explored whether we can avoid learning on supposedly non-vulnerable components. Hence, as our final contribution, this dissertation proposes TROVON, a deep learning approach that learns only on components known to be vulnerable, thereby making no assumptions and bypassing the key problem faced by previous techniques. Our comparison of TROVON with existing techniques on security-critical open-source systems, with historical vulnerabilities reported in the National Vulnerability Database (NVD), demonstrates that its prediction capability significantly outperforms the existing techniques.
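
    As a deliberately simplified stand-in for the common pattern behind these contributions (a shallow bag-of-tokens classifier rather than the deep models used by Cerebro, Seeker and Mystique; the code snippets and labels below are invented), the sketch learns from the code context surrounding a mutant whether it is worth keeping, before any test execution.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Each sample is a mutated statement with a little surrounding context; label 1 = useful mutant.
        contexts = [
            "if ( x > 0 ) { total += x ; }    // mutant: > changed to >=",
            "return a + b ;                   // mutant: + changed to -",
            "log . debug ( msg ) ;            // mutant: statement deleted",
            "buf [ i ] = 0 ;                  // mutant: = changed to ==",
        ]
        labels = [1, 1, 0, 1]

        model = make_pipeline(CountVectorizer(token_pattern=r"\S+"), LogisticRegression(max_iter=1000))
        model.fit(contexts, labels)

        new_mutant = "log . info ( state ) ;  // mutant: statement deleted"
        print(model.predict([new_mutant]))    # predicted usefulness of the unseen mutant (1 = keep, 0 = discard)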

    Automated testing for GPU kernels

    Get PDF
    Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current processors (CPUs) in computers. These advantages, along with recent advances in the programmability of GPUs, have made them widely used in various general-purpose computing domains. However, this has also made testing GPU kernels critical to ensure that their behaviour meets the requirements of the design and specification. Despite the advances in programmability, GPU kernels are hard to code and analyse due to the high complexity of memory sharing patterns, striding patterns for memory accesses, implicit synchronisation, and the combinatorial explosion of thread interleavings. The few existing techniques for testing GPU kernels use symbolic execution for test generation, which incurs a high overhead, has limited scalability and does not handle all data types. In this thesis, we present novel approaches to measure test effectiveness and generate tests automatically for GPU kernels. To achieve this, we address significant challenges related to the GPU execution and memory model, and the lack of customised thread scheduling and global synchronisation. We make the following contributions: First, we present a framework, CLTestCheck, for assessing the quality of test suites developed for GPU kernels. The framework can measure code coverage using three different coverage metrics that are inspired by faults found in real kernel code. The framework also measures the fault-finding capability of the test suite by seeding different types of faults into the kernel, reported in the form of a mutation score, the ratio of the number of detected faults to the total number of seeded faults. Second, with the goal of being fast, effective and scalable, we propose a test generation technique, CLFuzz, for GPU kernels that combines mutation-based fuzzing for fast test generation with selective SMT solving to help cover branches left unreached by fuzzing. Fuzz testing for GPU kernels has not been explored previously. Our approach for fuzz testing randomly mutates input kernel argument values with the goal of increasing branch coverage, and supports GPU-specific data types such as images. When fuzz testing is unable to increase branch coverage with random mutations, we gather path constraints for uncovered branch conditions, build additional constraints to represent the context of GPU execution such as the number of threads and the work-group size, and invoke the Z3 constraint solver to generate tests for them. Finally, to help uncover inter-work-group data races and replay these bugs with fixed work-group schedules, we present a schedule amplifier, CLSchedule, that simulates multiple work-group schedules with which to execute each of the generated tests. By reimplementing the OpenCL API, CLSchedule executes the kernel with a fixed work-group schedule rather than the default arbitrary schedule. It also executes the kernel directly, without requiring the developer to manually provide boilerplate host code. The outcome of our research can be summarised as follows: 1. CLTestCheck is applied to 82 publicly available GPU kernels from industry-standard benchmark suites along with their test suites. The experiment reveals that CLTestCheck is capable of automatically measuring the effectiveness of test suites in terms of code coverage, fault-finding capability and revealing data races in real OpenCL kernels. 2.
    CLFuzz can automatically generate tests and achieve close to 100% coverage and mutation score for the majority of the data set of 217 GPU kernels collected from open-source projects and industry-standard benchmarks. 3. CLSchedule is capable of exploring the effect of work-group schedules on the 217 GPU kernels and uncovers data races in 21 of them. The techniques developed in this thesis demonstrate that we can measure the effectiveness of tests developed for GPU kernels with our coverage criteria and fault-seeding methods. The result is useful in highlighting code portions that may need developers' further attention. Our automated test generation and work-group scheduling approaches are also fast, effective and scalable, incurring a small overhead (0.8 seconds on average) and scaling to large kernels with complex data structures.
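
    A rough sketch of the two-step idea behind CLFuzz (not the tool itself; the stand-in kernel branch, argument ranges and work-group constraint below are invented): fuzz kernel argument values by random mutation to grow branch coverage, and fall back to the Z3 solver for a branch the fuzzer never reaches.

        import random
        from z3 import Int, Solver, sat    # pip install z3-solver

        def branch_taken(args):
            # Stand-in for running the kernel: a hard-to-hit branch guarded by an exact relation.
            x, y = args
            return x * 317 == y + 8069

        def fuzz_arguments(seed_args, rounds=1000):
            """Randomly mutate integer kernel arguments, keeping any that cover the target branch."""
            covering = []
            for _ in range(rounds):
                mutated = [a + random.randint(-50, 50) for a in seed_args]
                if branch_taken(mutated):
                    covering.append(mutated)
            return covering

        seeds = [10, 20]
        if not fuzz_arguments(seeds):
            # Fuzzing never reached the branch: hand its path condition, plus constraints describing
            # the GPU execution context (here an assumed work-group size of 256), to the Z3 solver.
            x, y, gid = Int("x"), Int("y"), Int("gid")
            solver = Solver()
            solver.add(x * 317 == y + 8069)    # uncovered branch condition
            solver.add(gid >= 0, gid < 256)    # assumed work-group size constraint
            if solver.check() == sat:
                print("solver-generated test:", solver.model())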

    Model Checking and Model-Based Testing: Improving Their Feasibility by Lazy Techniques, Parallelization, and Other Optimizations

    Get PDF
    This thesis focuses on the lightweight formal method of model-based testing for checking safety properties, and derives a new and more feasible approach. For liveness properties, dynamic testing is impossible, so feasibility is increased by specializing on an important class of properties, livelock freedom, and deriving a more feasible model checking algorithm for it. All mentioned improvements are substantiated by experiments