
    Generating Diverse Test Suites for Gson Through Adaptive Fitness Function Selection

    Many fitness functions, such as those targeting test suite diversity, do not yield sufficient feedback to drive test generation. We propose that diversity can instead be improved through adaptive fitness function selection (AFFS), an approach that varies the fitness functions used throughout the generation process in order to strategically increase diversity. We have evaluated our AFFS framework, EvoSuiteFIT, on a set of 18 real faults from Gson, a JSON (de)serialization library. Ultimately, we find that AFFS creates test suites that are more diverse than those created using static fitness functions. We also observe that increased diversity may lead to small improvements in the likelihood of fault detection.
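    The abstract does not spell out the selection mechanism, so the following is a minimal bandit-style sketch in Java of how fitness functions could be selected adaptively: keep a running reward estimate per function and choose epsilon-greedily. The class names, the epsilon-greedy strategy, and the diversity-gain reward are assumptions, not EvoSuiteFIT's actual implementation.

    import java.util.*;

    // Hypothetical sketch: epsilon-greedy selection among candidate fitness
    // functions, rewarding whichever one most improved suite diversity.
    public class AdaptiveFitnessSelector {
        private final List<String> functions;              // e.g. "branch-coverage"
        private final Map<String, Double> meanReward = new HashMap<>();
        private final Map<String, Integer> uses = new HashMap<>();
        private final Random rng = new Random(42);
        private final double epsilon = 0.1;                // exploration rate (assumed)

        public AdaptiveFitnessSelector(List<String> functions) {
            this.functions = functions;
            for (String f : functions) { meanReward.put(f, 0.0); uses.put(f, 0); }
        }

        // Explore with probability epsilon, otherwise exploit the function
        // with the best average reward observed so far.
        public String select() {
            if (rng.nextDouble() < epsilon) {
                return functions.get(rng.nextInt(functions.size()));
            }
            return functions.stream()
                    .max(Comparator.comparingDouble(meanReward::get))
                    .orElseThrow();
        }

        // Reward = diversity gain observed after one generation step with f.
        public void update(String f, double diversityGain) {
            int n = uses.merge(f, 1, Integer::sum);
            double old = meanReward.get(f);
            meanReward.put(f, old + (diversityGain - old) / n);  // running mean
        }

        public static void main(String[] args) {
            AdaptiveFitnessSelector sel = new AdaptiveFitnessSelector(
                    List.of("branch-coverage", "output-diversity"));
            for (int step = 0; step < 100; step++) {
                String f = sel.select();
                // A real loop would measure the suite; this fake reward
                // simply favours the diversity-targeting function.
                sel.update(f, f.equals("output-diversity") ? 1.0 : 0.2);
            }
            System.out.println(sel.select());  // usually "output-diversity"
        }
    }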

    DSpot: Test Amplification for Automatic Assessment of Computational Diversity

    Context: Computational diversity, i.e., the presence of a set of programs that all perform compatible services but that exhibit behavioral differences under certain conditions, is essential for fault tolerance and security. Objective: We aim at proposing an approach for automatically assessing the presence of computational diversity. In this work, computationally diverse variants are defined as (i) sharing the same API, (ii) behaving the same according to an input-output based specification (a test suite), and (iii) exhibiting observable differences when they run outside the specified input space. Method: Our technique relies on test amplification. We propose source code transformations on test cases to explore the input domain and systematically sense the observation domain. We quantify computational diversity as the dissimilarity between observations on inputs that are outside the specified domain. Results: We run our experiments on 472 variants of 7 classes from large, thoroughly tested open-source Java projects. Our test amplification increases the number of input points in the test suite tenfold and is effective at detecting software diversity. Conclusion: The key insights of this study are: the systematic exploration of the observable output space of a class provides new insights about its degree of encapsulation; the behavioral diversity that we observe originates from areas of the code that are characterized by their flexibility (caching, checking, formatting, etc.).
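    As a rough illustration of the technique, the Java sketch below amplifies integer inputs with simple literal transformations, senses the observation domain by capturing both return values and exception types, and scores two variants by the fraction of amplified inputs on which their observations differ. The transformations and names are assumptions; DSpot's actual transformations operate on test source code.

    import java.util.*;
    import java.util.function.Function;

    // Hypothetical sketch of input amplification and diversity measurement.
    public class AmplifySketch {
        // Derive new inputs from a seed via simple literal transformations.
        static List<Integer> amplify(int seed) {
            return List.of(seed, seed + 1, seed - 1, 0, -seed, Integer.MAX_VALUE);
        }

        // Sense the observation domain: capture the value or the exception type.
        static String observe(Function<Integer, String> variant, int input) {
            try { return variant.apply(input); }
            catch (RuntimeException e) { return "threw:" + e.getClass().getName(); }
        }

        // Dissimilarity = fraction of amplified inputs where observations differ.
        static double dissimilarity(Function<Integer, String> variantA,
                                    Function<Integer, String> variantB,
                                    List<Integer> seeds) {
            int total = 0, differ = 0;
            for (int seed : seeds) {
                for (int input : amplify(seed)) {
                    total++;
                    if (!observe(variantA, input).equals(observe(variantB, input))) differ++;
                }
            }
            return total == 0 ? 0.0 : (double) differ / total;
        }

        public static void main(String[] args) {
            // Two "variants": equal on the specified (non-negative) inputs,
            // but divergent on negative odd inputs outside that space.
            Function<Integer, String> v1 = i -> Integer.toString(i / 2);
            Function<Integer, String> v2 = i -> Integer.toString(i >> 1);
            System.out.println(dissimilarity(v1, v2, List.of(5, 9)));  // > 0
        }
    }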

    Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?

    Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires a significant computational effort and has not been widely adopted in industry. Therefore, we study in this paper whether test effectiveness can be approximated using a more lightweight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to the test case on the call stack than in methods that the test case accesses indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and study its correlation with test effectiveness. We conducted an empirical study with 21 open-source projects, comprising 1.8 million LOC in total, and show that a correlation exists between stack distance and test effectiveness. The correlation reaches a strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can be considered a lightweight alternative to mutation testing or a less costly step preceding it.
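    A minimal sketch of how the proposed measure could be computed: track the current call depth relative to the test method and record, per method, the smallest depth at which it is reached. The hook-based design and the names are assumptions; an actual implementation would instrument the bytecode rather than invoke enter/exit by hand.

    import java.util.*;

    // Hypothetical sketch: minimal stack distance between a running test
    // case and every method it (transitively) reaches.
    public class StackDistanceTracker {
        private int depth = 0;  // current distance from the test method
        private final Map<String, Integer> minDistance = new HashMap<>();

        public void enter(String methodId) {
            depth++;
            // Keep the smallest depth at which this method was ever reached.
            minDistance.merge(methodId, depth, Math::min);
        }

        public void exit() { depth--; }

        public Map<String, Integer> result() { return minDistance; }

        public static void main(String[] args) {
            StackDistanceTracker t = new StackDistanceTracker();
            // test -> a() -> b(); then the test calls b() directly.
            t.enter("a"); t.enter("b"); t.exit(); t.exit();
            t.enter("b"); t.exit();
            System.out.println(t.result());  // {a=1, b=1}: b's minimal distance is 1
        }
    }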

    Assessing the Effectiveness of Defect Prediction-based Test Suites at Localizing Faults

    Debugging a software program is a significant and laborious task for programmers, often consuming a substantial amount of time. The need to identify faulty lines of code further compounds this challenge, leading to decreased overall productivity. Consequently, the development of automated tools for fault detection becomes imperative to streamline the debugging process and enhance programmer productivity. In recent years, the field of automatic test generation has witnessed remarkable advancements, significantly improving the efficacy of automatic tests in detecting faults, and fault localization can be further improved through the use of such tools. This dissertation conducts an experimental study that assembles automatic test generation tools designed to detect faults by estimating the likelihood of code being faulty, and compares them against each other to discern their relative performance and effectiveness. Additionally, the study comprehensively compares developer-written tests with automatically generated tests to evaluate their respective aptitude for fault detection. Through this investigation, we seek to identify the most effective automated test generation tool while providing valuable insights into the relative merits of developer-written and automatically generated tests for fault detection.
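    One common way such likelihood estimates are put to work is to skew a fixed test-generation budget toward the code predicted to be faulty. The Java sketch below is a hypothetical illustration of that idea, not a tool from the study; the proportional allocation rule, the scores, and all names are assumptions.

    import java.util.*;

    // Hypothetical sketch: split a test-generation budget across classes in
    // proportion to a defect-prediction score.
    public class BudgetAllocator {
        static Map<String, Integer> allocate(Map<String, Double> faultLikelihood,
                                             int totalSeconds) {
            double sum = faultLikelihood.values().stream().mapToDouble(d -> d).sum();
            Map<String, Integer> budget = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : faultLikelihood.entrySet()) {
                budget.put(e.getKey(), (int) Math.round(totalSeconds * e.getValue() / sum));
            }
            return budget;
        }

        public static void main(String[] args) {
            Map<String, Double> scores = new LinkedHashMap<>();
            scores.put("Parser", 0.7);  // predicted most fault-prone
            scores.put("Cache", 0.2);
            scores.put("Util", 0.1);
            System.out.println(allocate(scores, 600));  // {Parser=420, Cache=120, Util=60}
        }
    }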

    The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study

    Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based oracles, and expected output oracles. Supervised learning, often based on neural networks, and reinforcement learning, often based on Q-learning, are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/un-)supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work to date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed and how they are applied, benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.
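    To make the reinforcement-learning side concrete, the sketch below is a tabular Q-learning skeleton for test input generation, where the reward would be tied to a testing metric such as coverage gain, matching the evaluation pattern the study reports. It is illustrative only; the state encoding, actions, and hyperparameters are assumptions.

    import java.util.*;

    // Hypothetical sketch: Q-learning over test-generation actions, rewarded
    // by a testing metric (e.g., newly covered branches).
    public class QLearningInputGen {
        private final Map<String, double[]> q = new HashMap<>();
        private final int numActions;
        private final double alpha = 0.5, gamma = 0.9, epsilon = 0.2;
        private final Random rng = new Random(0);

        QLearningInputGen(int numActions) { this.numActions = numActions; }

        int chooseAction(String state) {
            double[] row = q.computeIfAbsent(state, s -> new double[numActions]);
            if (rng.nextDouble() < epsilon) return rng.nextInt(numActions);
            int best = 0;
            for (int a = 1; a < numActions; a++) if (row[a] > row[best]) best = a;
            return best;
        }

        // Standard Q-update; the reward would come from the test executor.
        void update(String state, int action, double reward, String nextState) {
            double[] row = q.computeIfAbsent(state, s -> new double[numActions]);
            double[] next = q.computeIfAbsent(nextState, s -> new double[numActions]);
            double maxNext = Arrays.stream(next).max().orElse(0.0);
            row[action] += alpha * (reward + gamma * maxNext - row[action]);
        }

        public static void main(String[] args) {
            QLearningInputGen gen = new QLearningInputGen(3);
            int a = gen.chooseAction("empty-input");
            gen.update("empty-input", a, 1.0, "one-token");  // pretend coverage grew
            System.out.println("chose action " + a);
        }
    }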

    Causal Consistency Verification in RESTful Systems

    Replicated systems cannot maintain both availability and (strong) consistency when exposed to network partitions. Strong consistency requires every read to return the last written value, which can lead clients to experience high latency or even timeout errors. Replicated applications usually rely on weak consistency, since clients can perform operations by contacting a single replica, leading to decreased latency and increased availability. Causal consistency is a weak consistency model; however, it is the strongest one achievable in highly available systems. Many applications are switching to this particular consistency model, since it ensures users never observe data items before they observe the ones that influenced their creation. Verifying whether applications satisfy the consistency they claim to provide is no easy task. In this dissertation, we propose an algorithm to verify causal consistency in RESTful applications. Our approach adopts black-box testing: multiple concurrent clients execute operations against a service, and the interactions are recorded in a log. This log is then processed to verify whether the results respect causal consistency. The key challenge is to infer causal dependencies among operations executed by different clients without adding any metadata to the data maintained by the service. When considering a particular operation, the algorithm builds a new dependency graph based on one of the possible justifications the operation might have; if this justification leads to a failure later in the processing, another graph must be built using a different justification for that same operation. The algorithm relies on recursion to achieve this backtracking behaviour. If the algorithm is able to build a graph containing every operation present in the log, in which the chosen justifications remain valid until the end of the processing, it reports that the execution corresponding to that log satisfies causal consistency. The evaluation confirms that the algorithm detects violations when fed either small or large logs representing executions of RESTful applications that do not satisfy causal consistency.
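    The backtracking idea can be sketched as follows: assign each operation one of its candidate justifications, keep the induced dependency edges free of cycles, and backtrack when a choice later fails. The Java sketch below is a heavily simplified illustration of the described algorithm; the validity check is reduced to acyclicity, and all names are assumptions.

    import java.util.*;

    // Hypothetical, simplified sketch of justification-based backtracking.
    public class CausalCheckSketch {
        // candidates.get(op) = operations that could justify op's observed value.
        static boolean satisfiesCausality(List<String> ops,
                                          Map<String, List<String>> candidates) {
            return search(ops, 0, candidates, new HashMap<>());
        }

        static boolean search(List<String> ops, int i,
                              Map<String, List<String>> candidates,
                              Map<String, String> chosen) {
            if (i == ops.size()) return true;            // every op justified
            String op = ops.get(i);
            for (String just : candidates.getOrDefault(op, List.of(""))) {
                chosen.put(op, just);
                if (acyclic(chosen) && search(ops, i + 1, candidates, chosen)) return true;
                chosen.remove(op);                       // backtrack: try another justification
            }
            return false;
        }

        // A cycle in "op depends on its justification" edges breaks causality.
        static boolean acyclic(Map<String, String> edges) {
            for (String start : edges.keySet()) {
                Set<String> seen = new HashSet<>();
                String cur = start;
                while (edges.containsKey(cur) && !edges.get(cur).isEmpty()) {
                    if (!seen.add(cur)) return false;    // revisited a node: cycle
                    cur = edges.get(cur);
                }
            }
            return true;
        }

        public static void main(String[] args) {
            // read1 may have observed write1 or write2; read2 observed write2.
            Map<String, List<String>> cand = Map.of(
                    "read1", List.of("write1", "write2"),
                    "read2", List.of("write2"));
            System.out.println(satisfiesCausality(List.of("read1", "read2"), cand));  // true
        }
    }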

    WPI Suite Exemplar Module

    This Major Qualifying Project involved designing and writing a client and an exemplar module for the WPI Suite TNG core server. Our client, named Janeway, was written in Java and provides module developers with a standard method for interacting with the server and the user. The goal for Janeway was to enable software engineering students to accomplish these tasks without needing much knowledge of network protocols or languages other than Java. Our goal for the exemplar module was to provide a useful example for students to reference when building a module for WPI Suite TNG. We also wrote a significant amount of developer documentation to assist students who needed to set up their development environment or use the various software APIs that the exemplar and core teams provided.
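    The shape of such a module contract might look like the following hypothetical Java sketch; the interface and method names are illustrative guesses, not WPI Suite TNG's actual API.

    import javax.swing.JLabel;
    import javax.swing.JPanel;
    import java.util.List;

    // Hypothetical sketch of a contract a Janeway-style client could require
    // from modules; the client handles server communication on their behalf.
    interface ClientModule {
        String getName();        // label shown in the client's module switcher
        List<JPanel> getTabs();  // UI panels the client should display
    }

    // A minimal module: the client discovers it and asks for its name and tabs.
    class HelloModule implements ClientModule {
        public String getName() { return "Hello"; }
        public List<JPanel> getTabs() {
            JPanel panel = new JPanel();
            panel.add(new JLabel("Hello from an exemplar-style module"));
            return List.of(panel);
        }
    }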

    Automated recommendation, reuse, and generation of unit tests for software systems

    This thesis presents a body of work relating to the automated discovery, reuse, and generation of unit tests for software systems, with the goal of improving the efficiency of the software engineering process and the quality of the produced software. We start with a novel approach to test-to-code traceability link establishment, called TCTracer, which utilises multilevel information and an ensemble of static and dynamic techniques to achieve state-of-the-art accuracy when establishing links between tests and tested functions, and between test classes and tested classes. This approach provides the test-to-code traceability links that facilitate multiple other parts of the work. We then move on to test reuse, where we first define an abstract framework, called Rashid, for using connections between artefacts to identify new artefacts for reuse, and utilise this framework in Relatest, an approach for producing test recommendations for new functions. Relatest instantiates Rashid by using TCTracer to establish connections between tests and functions, and code similarity measures to establish connections between similar functions. This information is used to create lists of recommendations for new functions. We then present an investigation into the automated transplantation of tests, which attempts to remove the manual effort required to transform Relatest recommendations and insert them into another project. Finally, we move on to test generation, where we utilise neural networks to generate unit test code by learning from existing function-to-test pairs. The first approach, TestNMT, investigates using recurrent neural networks to generate whole JUnit tests, and the second approach, ReAssert, utilises a transformer-based architecture to generate JUnit asserts. In total, this thesis addresses the problem by developing approaches for the discovery, reuse, and utilisation of existing functions and tests, including the establishment of relationships between these artefacts, developing mechanisms to aid automated test reuse, and learning from existing tests to generate new tests.
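    As an illustration of the ensemble idea behind TCTracer, the Java sketch below combines a static name-similarity signal with a dynamic called-by-the-test signal into a single link score. The similarity measure, the equal weighting, and all names are assumptions rather than the thesis's actual formulae.

    import java.util.*;

    // Hypothetical sketch: score candidate test-to-code links by combining
    // static (name similarity) and dynamic (call) information.
    public class TraceabilityScorer {
        // Longest-common-subsequence ratio as a simple name-similarity proxy.
        static double nameSimilarity(String testName, String methodName) {
            String t = testName.toLowerCase().replace("test", "");
            String m = methodName.toLowerCase();
            int[][] lcs = new int[t.length() + 1][m.length() + 1];
            for (int i = 1; i <= t.length(); i++)
                for (int j = 1; j <= m.length(); j++)
                    lcs[i][j] = t.charAt(i - 1) == m.charAt(j - 1)
                            ? lcs[i - 1][j - 1] + 1
                            : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            return m.isEmpty() ? 0.0 : (double) lcs[t.length()][m.length()] / m.length();
        }

        static double linkScore(String testName, String methodName,
                                Set<String> methodsCalledByTest) {
            double staticScore = nameSimilarity(testName, methodName);
            double dynamicScore = methodsCalledByTest.contains(methodName) ? 1.0 : 0.0;
            return 0.5 * staticScore + 0.5 * dynamicScore;  // assumed equal weights
        }

        public static void main(String[] args) {
            Set<String> called = Set.of("parseJson", "readToken");
            System.out.println(linkScore("testParseJson", "parseJson", called)); // high
            System.out.println(linkScore("testParseJson", "close", called));     // low
        }
    }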