An Empirical Comparison of Combinatorial Testing, Random Testing and Adaptive Random Testing
We present an empirical comparison of three test generation techniques, namely, Combinatorial Testing (CT), Random Testing (RT) and Adaptive Random Testing (ART), under different test scenarios. This is the first study in the literature to account for the (more realistic) testing setting in which the tester may not have complete information about the parameters and constraints that pertain to the system, and to account for the challenge posed by faults (in terms of failure rate). Our study was conducted on nine real-world programs under a total of 1683 test scenarios (combinations of available parameter and constraint information and failure rate). The results show significant differences in the techniques' fault detection ability when faults are hard to detect (failure rates are relatively low). CT performs best overall; it is no worse than any other technique in 98% of the scenarios studied. ART enhances RT, and is comparable to CT in 96% of scenarios, but its computational cost can be up to 3.5 times higher than that of CT when the program is highly constrained. Additionally, when constraint information is unavailable for a highly constrained program, a large random test suite is as effective as CT or ART, yet its computational cost of test generation is significantly lower than that of the other techniques.
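As a rough illustration of what separates CT from RT, the sketch below measures the pairwise (2-way) coverage of a test suite, that is, the fraction of all parameter-value pairs exercised by at least one test. The function name `pairwise_coverage`, the three Boolean parameters, and the example suites are assumptions made for illustration; they are not taken from the study, and constraints are ignored.

```python
from itertools import combinations, product


def pairwise_coverage(suite, domains):
    """Return the fraction of all 2-way parameter-value pairs covered by suite.

    suite   -- list of tests, each a tuple with one value per parameter
    domains -- list of value lists, one per parameter (hypothetical model)
    """
    # Every (parameter i, value a, parameter j, value b) pair that must be hit.
    required = set()
    for i, j in combinations(range(len(domains)), 2):
        for a, b in product(domains[i], domains[j]):
            required.add((i, a, j, b))

    # Pairs actually exercised by the tests in the suite.
    covered = set()
    for test in suite:
        for i, j in combinations(range(len(test)), 2):
            covered.add((i, test[i], j, test[j]))

    return len(covered & required) / len(required)


# Three Boolean parameters: 8 exhaustive tests, but 4 tests suffice for
# full pairwise coverage.
domains = [[0, 1], [0, 1], [0, 1]]
ct_suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
small_suite = [(0, 0, 0), (1, 1, 1)]

full = pairwise_coverage(ct_suite, domains)
partial = pairwise_coverage(small_suite, domains)
```

A combinatorial suite is constructed to drive this measure to 1.0 with few tests, whereas a random suite reaches it only with some probability that grows with suite size.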
Fuzzy Adaptive Tuning of a Particle Swarm Optimization Algorithm for Variable-Strength Combinatorial Test Suite Generation
Combinatorial interaction testing is an important software testing technique
that has seen lots of recent interest. It can reduce the number of test cases
needed by considering interactions between combinations of input parameters.
Empirical evidence shows that it effectively detects faults, in particular, for
highly configurable software systems. In real-world software testing, the input
variables may vary in how strongly they interact; variable-strength
combinatorial interaction testing (VS-CIT) can exploit this for higher
effectiveness. The generation of variable-strength test suites is a
non-deterministic polynomial-time (NP) hard computational problem
\cite{BestounKamalFuzzy2017}. Research has shown that stochastic
population-based algorithms such as particle swarm optimization (PSO) can be
efficient compared to alternatives for VS-CIT problems. Nevertheless, they
require detailed control for the exploitation and exploration trade-off to
avoid premature convergence (i.e. being trapped in local optima) as well as to
enhance the solution diversity. Here, we present a new variant of PSO based on
a Mamdani fuzzy inference system
\cite{Camastra2015,TSAKIRIDIS2017257,KHOSRAVANIAN2016280}, to permit adaptive
selection of its global and local search operations. We detail the design of
this combined algorithm and evaluate it through experiments on multiple
synthetic and benchmark problems. We conclude that fuzzy adaptive selection of
global and local search operations is, at least, feasible as it performs only
second-best to a discrete variant of PSO, called DPSO. Concerning obtaining the
best mean test suite size, the fuzzy adaptation even outperforms DPSO
occasionally. We discuss the reasons behind this performance and outline
relevant areas of future work. Comment: 21 pages
Computationally Tractable Algorithms for Finding a Subset of Non-defective Items from a Large Population
In the classical non-adaptive group testing setup, pools of items are tested
together, and the main goal of a recovery algorithm is to identify the
"complete defective set" given the outcomes of different group tests. In
contrast, the main goal of a "non-defective subset recovery" algorithm is to
identify a "subset" of non-defective items given the test outcomes. In this
paper, we present a suite of computationally efficient and analytically
tractable non-defective subset recovery algorithms. By analyzing the
probability of error of the algorithms, we obtain bounds on the number of tests
required for non-defective subset recovery with arbitrarily small probability
of error. Our analysis accounts for the impact of both the additive noise
(false positives) and dilution noise (false negatives). By comparing with the
information theoretic lower bounds, we show that the upper bounds on the number
of tests are order-wise tight up to a factor, where is the number
of defective items. We also provide simulation results that compare the
relative performance of the different algorithms and provide further insights
into their practical utility. The proposed algorithms significantly outperform
the straightforward approaches of testing items one-by-one, and of first
identifying the defective set and then choosing the non-defective items from
the complement set, in terms of the number of measurements required to ensure a
given success rate. Comment: In this revision: Unified some proofs and reorganized the paper,
corrected a small mistake in one of the proofs, added more references
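To make the notion of non-defective subset recovery concrete, here is a minimal sketch of the simplest decoding rule in the noiseless case: every item that appears in a pool with a negative outcome must be non-defective. This is not the paper's suite of algorithms, and it does not handle the additive or dilution noise the abstract refers to (a false negative would invalidate the rule). The function name `certain_non_defectives` and the example pools are hypothetical.

```python
def certain_non_defectives(pools, outcomes):
    """Return items guaranteed non-defective under noiseless group testing.

    pools    -- list of sets of item indices tested together
    outcomes -- list of booleans, True if the pool tested positive
    """
    safe = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            # A negative pool cannot contain any defective item.
            safe.update(pool)
    return safe


# Six items, of which item 2 is the only defective (assumed for the example).
pools = [{0, 1, 2}, {3, 4}, {1, 5}, {2, 3}]
outcomes = [True, False, False, True]

safe_items = certain_non_defectives(pools, outcomes)
```

Here four pooled tests certify four non-defective items, and item 0 remains undetermined because it appears only in a positive pool.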
Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates
The optimization of algorithm (hyper-)parameters is crucial for achieving
peak performance across a wide range of domains, ranging from deep neural
networks to solvers for hard combinatorial problems. The resulting algorithm
configuration (AC) problem has attracted much attention from the machine
learning community. However, the proper evaluation of new AC procedures is
hindered by two key hurdles. First, AC benchmarks are hard to set up. Second
and even more significantly, they are computationally expensive: a single run
of an AC procedure involves many costly runs of the target algorithm whose
performance is to be optimized in a given AC benchmark scenario. One common
workaround is to optimize cheap-to-evaluate artificial benchmark functions
(e.g., Branin) instead of actual algorithms; however, these have different
properties than realistic AC problems. Here, we propose an alternative
benchmarking approach that is similarly cheap to evaluate but much closer to
the original AC problem: replacing expensive benchmarks by surrogate benchmarks
constructed from AC benchmarks. These surrogate benchmarks approximate the
response surface corresponding to true target algorithm performance using a
regression model, and the original and surrogate benchmark share the same
(hyper-)parameter space. In our experiments, we construct and evaluate
surrogate benchmarks for hyperparameter optimization as well as for AC problems
that involve performance optimization of solvers for hard combinatorial
problems, drawing training data from the runs of existing AC procedures. We
show that our surrogate benchmarks capture overall important characteristics of
the AC scenarios, such as high- and low-performing regions, from which they
were derived, while being much easier to use and orders of magnitude cheaper to
evaluate.
A controlled migration genetic algorithm operator for hardware-in-the-loop experimentation
In this paper, we describe the development of an extended migration operator, which combats the negative effects of noise on the effective search capabilities of genetic algorithms. The research is motivated by the need to minimize the number of evaluations during hardware-in-the-loop experimentation, which can carry a significant cost penalty in terms of time or financial expense. The authors build on previous research, where convergence for search methods such as Simulated Annealing and Variable Neighbourhood search was accelerated by the implementation of an adaptive decision support operator. This methodology was found to be effective in searching noisy data surfaces. Provided that noise is not too significant, Genetic Algorithms can prove even more effective in guiding experimentation. It will be shown that with the introduction of a Controlled Migration operator into the GA heuristic, data, which represents a significant signal-to-noise ratio, can be searched with significant beneficial effects on the efficiency of hardware-in-the-loop experimentation, without a priori parameter tuning. The method is tested on an engine-in-the-loop experimental example, and shown to bring significant performance benefits.