61 research outputs found

    Test Flakiness Prediction Techniques for Evolving Software Systems

    Test Quality Assurance for E2E Web Test Suites: Parallelization of Dependent Test Suites and Test Flakiness Prevention

    Web applications support a wide range of activities today, from e-commerce to health management, and ensuring their quality is a fundamental task. Nevertheless, testing these systems is hard because of their dynamic, asynchronous, and heterogeneous nature. Quality assurance of Web applications is usually performed through testing at different levels of abstraction. At the End-to-end (E2E) level, test scripts interact with the application through the web browser, as a human user would. This kind of testing is usually time consuming, and its execution time can be reduced by running the test suite in parallel. However, the presence of dependencies in the test suite can make test parallelization difficult. Best practices prescribe that test scripts in a test suite should be independent (i.e., they should not assume that the system under test is already in an expected state), but this is not always the case in practice: dependent tests are a serious problem that affects end-to-end Web test suites. Test dependencies are also a problem because they enforce an execution order for the test suite, preventing the use of techniques like test selection, test prioritization, and test parallelization. Another issue that affects E2E Web test suites is test flakiness: a test script is called flaky when it may non-deterministically pass or fail on the same version of the Application Under Test. Test flakiness is usually caused by multiple factors that can be very hard to determine; the most common causes are improper waiting for asynchronous operations, violated test-order dependencies, and concurrency problems (e.g., race conditions, deadlocks, atomicity violations). Test flakiness affects E2E test execution in general, but it can have a greater impact in the presence of dependencies, since (1) if a test script fails due to flakiness, other test scripts that depend on it will probably fail as well, and (2) most dependency-detection approaches and tools rely on multiple executions of test schedules in different orders to detect dependencies. To do that, execution results must be deterministic: if test scripts can pass or fail non-deterministically, those dependency-detection tools cannot work. This thesis proposes to improve quality assurance for E2E Web test suites in two directions: (1) enabling the parallel execution of dependent E2E Web test suites in an optimized, efficient way, and (2) preventing flakiness by automated refactoring of E2E Web test suites so that they adopt proper waiting strategies for page elements. For the first research direction we propose STILE (teST suIte paralLElizer), a tool-based approach that allows parallel execution of E2E Web test suites. Our approach generates a set of test schedules that respect two important constraints: (1) every schedule respects existing test dependencies, and (2) all test scripts in the test suite are executed at least once across the generated schedules. For the second research direction we propose SleepReplacer, a tool-based approach to automatically refactor E2E Web test suites in order to prevent flakiness. Both tool-based approaches have been fully implemented in functioning, publicly available tools and empirically validated on different test suites.
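    To make the waiting-strategy issue concrete, the sketch below shows the kind of refactoring that flakiness-prevention tools such as SleepReplacer aim for: replacing a fixed sleep with an explicit wait on the target page element. It uses Selenium's Python bindings; the URL and the checkout_button locator are invented for the example and are not taken from the thesis.

        # Minimal sketch (not SleepReplacer's actual output): a fixed sleep is
        # replaced by an explicit wait, so the test proceeds as soon as the
        # element is ready and fails only once the timeout is exceeded.
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.support import expected_conditions as EC

        driver = webdriver.Chrome()
        driver.get("https://example.org/shop")   # hypothetical application under test

        # Flaky pattern: guess how long the asynchronous page update takes.
        #   time.sleep(5)
        #   driver.find_element(By.ID, "checkout_button").click()

        # More robust pattern: poll until the element is clickable (up to 10 s).
        checkout = WebDriverWait(driver, timeout=10).until(
            EC.element_to_be_clickable((By.ID, "checkout_button"))
        )
        checkout.click()

        driver.quit()

    The explicit wait removes the arbitrary delay entirely when the element appears quickly, while tolerating slow asynchronous updates up to the timeout.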

    Understanding and Mitigating Flaky Software Test Cases

    A flaky test is a test case that can pass or fail without changes to the test case code or the code under test. Flaky tests are a widespread problem with serious consequences for developers and researchers alike. For developers, flaky tests lead to time wasted debugging spurious failures, tempting them to ignore future failures. While unreliable, flaky tests can still indicate genuine issues in the code under test, so ignoring them can lead to bugs being missed. The non-deterministic behaviour of flaky tests is also a major snag to continuous integration, where a single flaky test can fail an entire build. For researchers, flaky tests challenge the assumption that a test failure implies a bug, an assumption that many fundamental techniques in software engineering research rely upon, including test acceleration, mutation testing, and fault localisation. Despite increasing research interest in the topic, open problems remain. In particular, relatively little attention has been paid to the views and experiences of developers, despite a considerable body of empirical work. Understanding these is essential to guide the focus of research into the areas most likely to benefit the software engineering industry. Furthermore, previous automated techniques for detecting flaky tests are typically based either on exhaustively rerunning test cases or on machine learning classifiers. The prohibitive runtime of the rerunning approach and the demonstrably poor inter-project generalisability of classifiers leave practitioners with a stark choice when it comes to automatically detecting flaky tests. In response to these challenges, I set two high-level goals for this thesis: (1) to enhance the understanding of the manifestation, causes, and impacts of flaky tests; and (2) to develop and empirically evaluate efficient automated techniques for mitigating flaky tests. In pursuit of these goals, this thesis makes five contributions: (1) a comprehensive systematic literature review of 76 published papers; (2) a literature-guided survey of 170 professional software developers; (3) a new feature set for encoding test cases in machine learning-based flaky test detection; (4) a novel approach for reducing the time cost of rerunning-based techniques for detecting flaky tests by combining them with machine learning classifiers; and (5) an automated technique that detects and classifies existing flaky tests in a project and produces reusable project-specific machine learning classifiers able to provide fast and accurate predictions for future test cases in that project.
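    As a rough illustration of contribution (4), combining a classifier with rerunning, the sketch below trains a model on encoded test cases and reruns only the tests the model flags as possibly flaky. This is a generic scikit-learn sketch under assumed inputs (feature vectors, labels, a rerun callback, a 0.1 probability threshold); it is not the feature set or pipeline developed in the thesis.

        # Sketch: use a classifier to target expensive reruns at suspicious tests.
        from sklearn.ensemble import RandomForestClassifier

        def detect_flaky(train_features, train_labels, candidate_features,
                         candidates, rerun, reruns=20, threshold=0.1):
            """candidates: test identifiers; rerun(test, n) -> set of observed outcomes."""
            model = RandomForestClassifier(n_estimators=100, random_state=0)
            model.fit(train_features, train_labels)   # labels: 1 = flaky, 0 = stable

            flaky = []
            for features, test in zip(candidate_features, candidates):
                prob_flaky = model.predict_proba([features])[0][1]
                if prob_flaky < threshold:
                    continue                           # cheap skip: no reruns needed
                outcomes = rerun(test, reruns)         # expensive step, now targeted
                if len(outcomes) > 1:                  # both pass and fail observed
                    flaky.append(test)
            return flaky

    The trade-off is explicit: the threshold controls how much rerunning cost is saved versus how many flaky tests the classifier may cause to be missed.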

    Training development for pavement preservation: chip sealing and fog sealing

    The benefits of pavement preservation are only achieved if treatments are properly selected, designed, and applied. In some cases, a lack of training in one of these steps hampers the objective of applying pavement preservation techniques. Literature on pavement preservation is extensive, but from a training point of view there is no structured approach to training people to select, design, and apply pavement preservation techniques. The objective of this research is to develop a training learning management system that addresses pavement preservation treatments (chip seals, fog seals, slurry systems, and crack seals and fills) as they are dealt with during the phases of selection, design, and construction. This thesis focuses on training for chip seals and fog seals. Although Iowa was used as a case study, the findings can be applied in other locations. To begin the study, it was critical to identify the staff divisions to be trained and the treatments to be included. Through several meetings with the agency, three staff divisions were identified: the maintenance staff (in charge of selection), the design staff, and the construction staff. The treatments mentioned above were identified as the focus of the study due to their common use. Through needs-analysis questionnaires and meetings, the knowledge gap and needs of the agency were identified. The training presentations developed target this gap and these needs as their primary focus. The concepting (selection) training focuses on providing the tools necessary, such as life cycle cost analysis and asset management, to help make proper selections. The design training focuses on providing the information necessary on the properties of the materials (mostly binders and aggregates) and how to make proper material selections. Finally, the construction training focuses on providing equipment calibration procedures, inspection responsibilities, and visual images of poor and best practices. The research showed that it is important to train each staff division (maintenance, design, and construction) separately, as each has its own needs and interests. It is also preferable to approach each pavement preservation treatment on its own. The research also determined that it is critical to create a structured plan aimed at developing a structured learning management system. The results of this research are applicable in many locations across the nation, not just Iowa. Finally, it is recommended to study the performance of pavement preservation treatments pre- and post-training to compare results and verify the effectiveness of the learning management system.

    Logs and Models in Engineering Complex Embedded Production Software Systems

    Trade-Off Exploration for Acceleration of Continuous Integration

    Continuous Integration (CI) is a popular software development practice that allows developers to quickly verify modifications to their projects. To cope with the ever-increasing demand for faster software releases, CI acceleration approaches have been proposed to expedite the feedback that CI provides. However, adoption of CI acceleration is not without cost. The trade-off between duration and trustworthiness of a CI acceleration approach determines the practicality of the CI acceleration process. Indeed, if a CI acceleration approach takes longer to prime than to run the accelerated build, the benefits of acceleration are unlikely to outweigh the costs. Moreover, CI acceleration techniques may mislabel change sets (e.g., a build labelled as failing that passes in an unaccelerated setting, or vice versa) or produce results that are inconsistent with an unaccelerated build (e.g., the underlying reason for failure does not match that of the unaccelerated build). These inconsistencies call into question the trustworthiness of CI acceleration products. We first evaluate the time trade-off of two CI acceleration products: one based on program analysis (PA) and the other on machine learning (ML). After replaying the CI process of 100,000 builds spanning ten open-source projects, we find that the priming costs (i.e., the extra time spent preparing for acceleration) of the program analysis product are substantially less than those of the machine learning product (e.g., an average project-wise median cost difference of 148.25 percentage points). Furthermore, the program analysis product generally provides more time savings than the machine learning product (e.g., an average project-wise median savings improvement of 5.03 percentage points). Given its deterministic nature and our observations about priming costs and benefits, we recommend that organizations consider the adoption of program analysis-based acceleration. Next, we study the trustworthiness of the same PA and ML CI acceleration products. We re-execute 50 failing builds from ten open-source projects in non-accelerated (baseline), program analysis accelerated, and machine learning accelerated settings. We find that, when applied to known failing builds, program analysis accelerated builds align with the non-accelerated build results more often (a 43.83 percentage point difference across the ten projects). Accordingly, we conclude that while there is still room for improvement for both CI acceleration products, the selected program analysis product currently provides a more trustworthy signal of build outcomes than the machine learning product. Finally, we propose a mutation testing approach to systematically evaluate the trustworthiness of CI acceleration. We apply our approach to the deterministic PA-based CI acceleration product and uncover issues that hinder its trustworthiness. Our analysis consists of three parts: we first study how often the same build in accelerated and unaccelerated CI settings produces different mutation testing outcomes. We call mutants with different outcomes in the two settings “gap mutants”. Next, we study the code locations where gap mutants appear. Finally, we inspect gap mutants to understand why acceleration causes them to survive. Our analysis of ten thriving open-source projects uncovers 2,237 gap mutants. We find that: (1) the gap in mutation outcomes between accelerated and unaccelerated settings varies from 0.11% to 23.50%; (2) 88.95% of gap mutants can be mapped to specific source code functions and classes using the dependency representation of the studied CI acceleration product; and (3) 69% of gap mutants survive CI acceleration for deterministic reasons that can be classified into six fault patterns. Our results show that deterministic CI acceleration suffers from trustworthiness limitations and highlight the ways in which trustworthiness could be improved in a pragmatic manner. This thesis demonstrates that CI acceleration techniques, whether PA- or ML-based, present time trade-offs and can reduce software build trustworthiness. Our findings lead us to encourage users of CI acceleration to carefully weigh both the time costs and the trustworthiness of their chosen acceleration technique. This study also demonstrates that the following improvements would make PA-based CI acceleration approaches more trustworthy: (1) depending on the size and complexity of the codebase, it may be necessary to manually refine the dependency graph, especially by concentrating on class properties, global variables, and constructor components; and (2) solutions should be added to detect and bypass flaky tests during CI acceleration to minimize the impact of flakiness.
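    The notion of a gap mutant reduces to a per-mutant comparison of outcomes between the two settings. A minimal sketch of that comparison is shown below; the dictionary format and the outcome labels ("killed"/"survived") are assumptions for illustration, not the representation used by the thesis's tooling.

        # Sketch: report mutants whose outcome differs between unaccelerated and
        # accelerated runs of the same build, plus the gap rate in percent.
        def find_gap_mutants(unaccelerated, accelerated):
            """Both arguments map mutant id -> outcome string for the same build."""
            common = unaccelerated.keys() & accelerated.keys()
            gaps = {m for m in common if unaccelerated[m] != accelerated[m]}
            gap_rate = 100.0 * len(gaps) / len(common) if common else 0.0
            return gaps, gap_rate

        # Example: mutant m2 is killed normally but survives under acceleration.
        unacc = {"m1": "killed", "m2": "killed", "m3": "survived"}
        acc   = {"m1": "killed", "m2": "survived", "m3": "survived"}
        gaps, rate = find_gap_mutants(unacc, acc)
        print(gaps, f"{rate:.2f}% gap")   # {'m2'} 33.33% gap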

    An interview study about the use of logs in embedded software engineering

    Context: Execution logs capture the run-time behavior of software systems. To assist developers in their maintenance tasks, many studies have proposed tools to analyze execution information from logs. However, it is as yet unknown how industry developers use logs in embedded software engineering. Objective: In this study, we aim to understand how developers use logs in an embedded software engineering context. Specifically, we would like to gain insights into the types of logs developers analyze, the purposes for which they analyze logs, the information they need from logs, and their expectations of tool support. Method: To achieve this aim, we conducted two interview studies. First, we interviewed 25 software developers from ASML, a leading company in the development of lithography machines. This exploratory case study provides the preliminary findings. Next, we validated and refined our findings by conducting a replication study involving 14 interviewees from four companies who have different software engineering roles in their daily work. Results: As the result of our first study, we compile a preliminary taxonomy consisting of four types of logs used by developers in practice, 18 purposes of using logs, 13 types of information developers search for in logs, 13 challenges faced by developers in log analysis, and three suggestions for tool support provided by developers. This taxonomy is refined in the replication study with three additional purposes, one additional information need, four additional challenges, and three additional suggestions for tool support. In addition, across these two studies, we observed that text-based editors and self-made scripts are commonly used for tooling in log analysis practice. As indicated by the interviewees, the development of automatic analysis tools is hindered by the quality of the logs, which further suggests several challenges in log instrumentation and management. Conclusions: Based on our study, we provide suggestions for practitioners on logging practices. We provide implications for tool builders on how to further improve tools based on existing techniques. Finally, we suggest some research directions and studies for researchers to further investigate software logging.