    Empirical Evaluation of Mutation-based Test Prioritization Techniques

    We propose a new test case prioritization technique that combines mutation-based and diversity-based approaches. Our diversity-aware mutation-based technique relies on the notion of mutant distinguishment, which aims to distinguish one mutant's behavior from another, rather than from the original program. We empirically investigate the relative cost and effectiveness of mutation-based prioritization techniques (i.e., using both the traditional mutant kill and the proposed mutant distinguishment) with 352 real faults and 553,477 developer-written test cases. The empirical evaluation considers both the traditional and the diversity-aware mutation criteria in various settings: single-objective greedy, hybrid, and multi-objective optimization. The results show that there is no single dominant technique across all the studied faults. To this end, we show when, and why, each of the mutation-based prioritization criteria performs poorly, using a graphical model called the Mutant Distinguishment Graph (MDG) that depicts the distribution of the fault-detecting test cases with respect to mutant kills and distinguishment.
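    The difference between the two mutation criteria can be illustrated with a small sketch (hypothetical data and names, not the paper's tooling): a test kills a mutant when the mutant's output differs from the original program's output, whereas it distinguishes two mutants when their outputs differ from each other.

```python
from itertools import combinations

def kills(test_row, original_out):
    """Mutants whose output on this test differs from the original program."""
    return {m for m, out in enumerate(test_row) if out != original_out}

def distinguishes(test_row):
    """Pairs of mutants whose outputs on this test differ from each other."""
    return {(a, b) for a, b in combinations(range(len(test_row)), 2)
            if test_row[a] != test_row[b]}

# Toy data: outputs of 3 mutants under 3 tests, plus the original program's outputs.
original = ["ok", "ok", "ok"]                 # original output per test
kill_matrix = [                               # one row per test, one column per mutant
    ["ok",   "fail", "fail"],                 # t0
    ["fail", "fail", "ok"],                   # t1
    ["ok",   "ok",   "ok"],                   # t2
]

for i, row in enumerate(kill_matrix):
    print(f"t{i}: kills={kills(row, original[i])}, distinguishes={distinguishes(row)}")
```

    Note that t1 distinguishes the pairs (0, 2) and (1, 2) even though it does not kill mutant 2, which is exactly where the two criteria diverge.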

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rise in importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify opportunities for cross-community collaboration and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations.

    Time-Space Efficient Regression Testing for Configurable Systems

    Configurable systems are those that can be adapted by selecting from a set of options. They are prevalent, and testing them is important and challenging. Existing approaches for testing configurable systems are either unsound (i.e., they can miss fault-revealing configurations) or do not scale. This paper proposes EvoSPLat, a regression testing technique for configurable systems. EvoSPLat builds on our previously developed technique, SPLat, which explores all dynamically reachable configurations from a test. EvoSPLat is tuned for two scenarios of use in regression testing: Regression Configuration Selection (RCS) and Regression Test Selection (RTS). EvoSPLat for RCS prunes configurations (not tests) that are not impacted by changes, whereas EvoSPLat for RTS prunes tests (not configurations) that are not impacted by changes. Handling both scenarios in the context of evolution is important. Experimental results show that EvoSPLat is promising. We observed a substantial reduction in time (22%) and in the number of configurations (45%) for configurable Java programs. In a case study on a large real-world configurable system (GCC), EvoSPLat reduced the running time by 35%. Compared with sampling techniques, 2-wise sampling was the most efficient but missed two bugs, whereas EvoSPLat detected all bugs and was, on average, four times faster than 6-wise sampling.
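    As a rough illustration of the two scenarios (a minimal sketch with hypothetical reachability data, not EvoSPLat itself), both boil down to keeping only the items whose reached code elements overlap the change set; in RCS the items are configurations, in RTS they are tests.

```python
def select_impacted(reached_elements, changed):
    """Keep only items whose reached code elements overlap the change set."""
    return {item for item, elems in reached_elements.items() if elems & changed}

changed = {"Parser.parse", "Cache.get"}        # assumed change set

# RCS: configurations (not tests) are pruned.
config_reach = {
    "FEATURE_A=on":  {"Parser.parse", "Lexer.next"},
    "FEATURE_A=off": {"Lexer.next"},
}
print("configurations to re-test:", select_impacted(config_reach, changed))

# RTS: tests (not configurations) are pruned.
test_reach = {
    "testParse":  {"Parser.parse"},
    "testRender": {"Renderer.draw"},
    "testCache":  {"Cache.get"},
}
print("tests to re-run:", select_impacted(test_reach, changed))
```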

    Feedback driven adaptive combinatorial testing

    The configuration spaces of modern software systems are too large to test exhaustively. Combinatorial interaction testing (CIT) approaches, such as covering arrays, systematically sample the configuration space and test only the selected configurations. The basic justification for CIT approaches is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. We conjecture, however, that in practice many such behaviors are not actually tested because of masking effects: failures that perturb execution so as to prevent some behaviors from being exercised. In this work we present a feedback-driven, adaptive, combinatorial testing approach aimed at detecting and working around masking effects. At each iteration we detect potential masking effects, heuristically isolate their likely causes, and then generate new covering arrays that allow previously masked combinations to be tested in the subsequent iteration. We empirically assess the effectiveness of the proposed approach on two large, widely used open-source software systems. Our results suggest that masking effects do exist and that our approach provides a promising and efficient way to work around them.
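    The following is a minimal sketch, under assumed inputs, of the bookkeeping such a feedback loop needs: which 2-wise option-value combinations a sample covers, and which of them were exercised only in failing configurations, so their coverage may have been masked and they become candidates for re-testing in a new covering array.

```python
from itertools import combinations

def pairs(config):
    """All 2-wise option-value combinations present in one configuration."""
    return {frozenset(c) for c in combinations(sorted(config.items()), 2)}

# Hypothetical configurations with the outcome of running the tests on each.
sample = {
    "c1": ({"opt_x": 0, "opt_y": 1, "opt_z": 0}, "pass"),
    "c2": ({"opt_x": 1, "opt_y": 1, "opt_z": 1}, "fail"),
    "c3": ({"opt_x": 0, "opt_y": 0, "opt_z": 1}, "pass"),
}

all_pairs = set().union(*(pairs(cfg) for cfg, _ in sample.values()))
possibly_masked = {
    p for p in all_pairs
    if all(res == "fail" for cfg, res in sample.values() if p in pairs(cfg))
}

print(f"{len(all_pairs)} pairs covered; possibly masked:")
for p in possibly_masked:
    print("  ", dict(p))
```

    In the toy sample, every pair that appears only in the failing configuration c2 is flagged as possibly masked, so a subsequent iteration would generate new configurations that exercise those pairs again.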

    Visualizing test diversity to support test optimisation

    Diversity has been used as an effective criterion for optimising test suites for cost-effective testing. In particular, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feed diversity information back to developers and testers, since the results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) the trade-offs in using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers with test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions for improvement. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.
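    As a concrete (hypothetical) example of the pair-wise diversity calculation behind such similarity maps, each test can be reduced to a set of features, here test-script tokens, and compared with Jaccard distance; the resulting distance matrix is what a two-dimensional similarity map would then project and visualize.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0.0 means identical feature sets."""
    return 1.0 - len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical tests represented as sets of test-script tokens.
tests = {
    "test_login":  {"open", "login", "assert_title"},
    "test_logout": {"open", "login", "logout", "assert_title"},
    "test_search": {"open", "search", "assert_results"},
}

names = sorted(tests)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: distance = {jaccard_distance(tests[a], tests[b]):.2f}")
```

    Pairs with a distance close to zero (here test_login and test_logout) are the redundancy candidates that practitioners would spot on the map.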

    Multi-Objective Search-Based Software Microbenchmark Prioritization

    Ensuring that software performance does not degrade after a code change is paramount. A potential solution, particularly for libraries and frameworks, is regularly executing software microbenchmarks, a performance testing technique similar to (functional) unit tests. However, this often becomes infeasible due to the extensive runtimes of microbenchmark suites. To address this challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks, or require complex performance models, which drastically reduces their potential application. In this paper, we propose a search-based technique based on multi-objective evolutionary algorithms (MOEAs) to improve the current state of microbenchmark prioritization. The technique utilizes three objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that our technique improves over the best coverage-based, greedy baselines in terms of average percentage of fault-detection on performance (APFD-P) and Top-3 effectiveness by 26 percentage points (pp) and 43 pp (for Additional) and 17 pp and 32 pp (for Total), reaching 0.77 and 0.24, respectively. Employing the Indicator-Based Evolutionary Algorithm (IBEA) as the MOEA leads to the best effectiveness among the six MOEAs studied. Finally, the technique's runtime overhead is acceptable at 19% of the overall benchmark suite runtime, considering that these runtimes often span multiple hours. The added overhead compared to the greedy baselines is minuscule at 1%. These results mark a step forward for universally applicable performance regression testing techniques.
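    A minimal sketch of how the three objective values might be computed for one candidate ordering is shown below (hypothetical data and a simplified formulation, not the paper's implementation; the MOEA, e.g. IBEA, would search over such orderings): coverage and historical change detection are accumulated over prefixes, so orderings that place strong benchmarks early score higher, while overlap with already-covered code is penalized.

```python
def objectives(ordering, coverage, history):
    """Return (coverage area, overlap, change-detection area) for one ordering."""
    covered, hist_sum = set(), 0
    cum_cov = cum_hist = overlap = 0
    for bench in ordering:
        overlap += len(coverage[bench] & covered)   # code already covered earlier
        covered |= coverage[bench]
        hist_sum += history[bench]
        cum_cov += len(covered)                     # rewards covering code early
        cum_hist += hist_sum                        # rewards early change detectors
    return cum_cov, overlap, cum_hist

coverage = {                                        # methods each benchmark executes (assumed)
    "benchParse":  {"m1", "m2", "m3"},
    "benchRender": {"m3", "m4"},
    "benchCache":  {"m5"},
}
history = {"benchParse": 2, "benchRender": 0, "benchCache": 5}   # past changes detected (assumed)

for order in [("benchParse", "benchCache", "benchRender"),
              ("benchRender", "benchParse", "benchCache")]:
    print(order, "->", objectives(order, coverage, history))
```

    The two example orderings cover the same code overall but differ in how early the historically change-prone benchmark runs, which is what the third objective rewards.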

    Preemptive regression test scheduling strategies: a new testing approach to thriving on the volatile service environments

    A workflow-based web service may use ultra-late binding to invoke external web services that concretize its implementation at run time. Nonetheless, such external services, or the availability of recently used external services, may evolve without prior notification, dynamically triggering the workflow-based service to bind to new replacement external services in order to continue the current execution. Any integration mismatch may cause a failure. In this paper, we propose Preemptive Regression Testing (PRT), a novel testing approach that addresses this issue. Whenever such a late change to the service under regression test is detected, PRT preempts the currently executing regression test suite, searches for additional test cases as fixes, runs these fixes, and then resumes the execution of the regression test suite from the preemption point.
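    The preempt-and-resume control flow can be sketched as follows (a minimal, hypothetical illustration; detecting late changes and finding fix test cases are stubbed out as assumed helpers): the suite runs test by test, and when a late change is detected the fixes run first, after which the suite resumes from the preemption point.

```python
from collections import deque

def run_with_preemption(suite, late_change_detected, find_fix_tests, run_test):
    queue = deque(suite)
    while queue:
        if late_change_detected():
            for fix in find_fix_tests():      # extra test cases targeting the change
                run_test(fix)
        run_test(queue.popleft())             # resume the original suite

# Toy usage: a change is "detected" once, just before the second test would run.
events = iter([False, True, False])
run_with_preemption(
    suite=["t1", "t2", "t3"],
    late_change_detected=lambda: next(events, False),
    find_fix_tests=lambda: ["fix_binding"],
    run_test=lambda t: print("running", t),
)
```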

    Applying test case prioritization to software microbenchmarks

    Regression testing comprises techniques that are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner after new versions appear. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
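    For reference, the two greedy strategies compared here can be sketched as follows (a minimal sketch on assumed coverage data): the total strategy orders benchmarks by the overall size of their coverage, while the additional strategy repeatedly picks the benchmark that covers the most not-yet-covered elements, resetting once everything reachable has been covered.

```python
def total_greedy(coverage):
    """Order benchmarks by total coverage size, largest first."""
    return sorted(coverage, key=lambda b: len(coverage[b]), reverse=True)

def additional_greedy(coverage):
    """Repeatedly pick the benchmark adding the most new coverage; reset when none remains."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        if not remaining[best] - covered:     # nothing new left: reset and re-pick
            covered = set()
            best = max(remaining, key=lambda b: len(remaining[b] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

# Hypothetical coverage sets (methods executed by each microbenchmark).
coverage = {
    "benchA": {"m1", "m2", "m3"},
    "benchB": {"m1", "m2"},
    "benchC": {"m4"},
}
print("total:     ", total_greedy(coverage))
print("additional:", additional_greedy(coverage))
```

    On this toy data the strategies diverge: total ranks benchB second because of its absolute coverage, whereas additional prefers benchC, the only benchmark adding new coverage after benchA.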
