
    PopArt: Ranked Testing Efficiency

    Too often, programmers are under pressure to maximize their confidence in the correctness of their code with a tight testing budget. Should they spend some of that budget on finding “interesting” inputs or spend their entire testing budget on test executions? Work on testing efficiency has explored two competing approaches to answer this question: systematic partition testing (ST), which defines a testing partition and tests its parts, and random testing (RT), which directly samples inputs with replacement. A consensus as to which is better when has yet to emerge. We present Probability Ordered Partition Testing (POPART), a new systematic partition-based testing strategy that visits the parts of a testing partition in decreasing probability order and in doing so leverages any non-uniformity over that partition. We show how to construct a homogeneous testing partition, a requirement for systematic testing, by using an executable oracle and the path partition. A program’s path partition is a naturally occurring testing partition that is usually skewed, for the simple reason that some paths execute more frequently than others. To confirm this conventional wisdom, we instrument programs from the Codeflaws repository and find that 80% of them have a skewed path probability distribution. We then compare POPART with RT to characterise the configuration space in which each is more efficient. We show that, when simulating Codeflaws, POPART outperforms RT after 100,000 executions. Our results reaffirm RT’s power for very small testing budgets, but also show that POPART should be preferred for any application requiring high (above 90%) probability-weighted coverage. In such cases, despite paying more for each test execution, we prove that POPART outperforms RT: it traverses parts whose cumulative probability bounds that of random testing, showing that sampling without replacement pays for itself, given a non-uniform probability distribution over a testing partition.
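
    As a rough illustration of the trade-off described above (not the authors' implementation; the part probabilities below are made up), probability-ordered visiting achieves, after k tests, a probability-weighted coverage equal to the sum of the k largest part probabilities, whereas random testing with replacement hits a part of probability p at least once with probability 1 - (1 - p)^n.

```python
# Toy comparison of probability-ordered partition testing (POPART-style)
# versus random testing (RT). The part probabilities below are made up.

def popart_coverage(probs, budget):
    """Probability-weighted coverage after visiting the `budget` most
    probable parts, i.e. sampling parts without replacement in
    decreasing probability order."""
    return sum(sorted(probs, reverse=True)[:budget])

def rt_expected_coverage(probs, budget):
    """Expected probability-weighted coverage of random testing:
    each part of probability p is hit at least once with
    probability 1 - (1 - p)**budget."""
    return sum(p * (1 - (1 - p) ** budget) for p in probs)

if __name__ == "__main__":
    # A skewed path-probability distribution (hypothetical).
    probs = [0.5, 0.25, 0.12, 0.06, 0.04, 0.02, 0.01]
    for n in (1, 3, 5, 7):
        print(n, round(popart_coverage(probs, n), 3),
              round(rt_expected_coverage(probs, n), 3))
```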

    Partition Information and its Transmission over Boolean Multi-Access Channels

    In this paper, we propose a novel partition reservation system to study partition information and its transmission over a noise-free Boolean multi-access channel. The objective of transmission is not message restoration, but to partition active users into distinct groups so that they can subsequently transmit their messages without collision. We first calculate (by mutual information) the amount of information needed for the partitioning without channel effects, and then propose two different coding schemes to obtain achievable transmission rates over the channel. The first is a brute-force method, where the codebook design is based on centralized source coding; the second uses random coding, where the codebook is generated randomly and optimal Bayesian decoding is employed to reconstruct the partition. Both methods shed light on the internal structure of the partition problem. A novel hypergraph formulation is proposed for the random coding scheme, which intuitively describes the information in terms of a strong coloring of a hypergraph induced by a sequence of channel operations and interactions between active users. An extended Fibonacci structure is found for a simple, but non-trivial, case with two active users. A comparison between these methods and group testing is conducted to demonstrate the uniqueness of our problem. (Comment: Submitted to IEEE Transactions on Information Theory; major revision.)
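
    As a minimal sketch of the channel model only (not the paper's brute-force or random-coding schemes; the codebook, user count, and block length below are hypothetical), the snippet simulates a noise-free Boolean (OR) multi-access channel and groups candidate pairs of active users by the output sequence they produce; pairs sharing an output sequence are indistinguishable to any decoder.

```python
# Toy noise-free Boolean multi-access channel: the slot output is the
# logical OR of the bits sent by the active users. Codebook and sizes
# are hypothetical; this only illustrates the channel model, not the
# paper's coding or decoding schemes.
import itertools
import random

random.seed(0)
N, T = 6, 4                      # users, channel uses
codebook = [[random.randint(0, 1) for _ in range(T)] for _ in range(N)]

def or_channel(active):
    """Output sequence when the users in `active` transmit their codewords."""
    return tuple(max(codebook[u][t] for u in active) for t in range(T))

# Group every candidate pair of active users by its output sequence.
# Pairs that share an output sequence look identical to the receiver,
# so no decoder could treat them differently.
by_output = {}
for pair in itertools.combinations(range(N), 2):
    by_output.setdefault(or_channel(pair), []).append(pair)

for out, pairs in by_output.items():
    print(out, pairs)
```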

    Proportional sampling strategy: A compendium and some insights

    There have been numerous studies on the effectiveness of partition and random testing. In particular, the proportional sampling (PS) strategy has been proved, under certain conditions, to be the only form of partition testing that outperforms random testing regardless of where the failure-causing inputs are. This paper provides an integrated synthesis and overview of our recent studies on the PS strategy and its related work. Through this synthesis, we offer a perspective that properly interprets the results obtained so far, and present some of the interesting issues involved and new insights obtained during the course of this research. © 2001 Elsevier Science Inc. All rights reserved.
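
    A minimal sketch of the proportional sampling idea, assuming hypothetical subdomain sizes and a total budget (this is not the formal setting of the cited studies): allocate test cases to each subdomain in proportion to its size, so that the sampling rate is uniform across the input domain.

```python
# Proportional sampling sketch: split a test budget across subdomains
# in proportion to their sizes (largest-remainder rounding).
# Subdomain sizes and the budget are hypothetical.

def proportional_allocation(sizes, budget):
    total = sum(sizes)
    raw = [budget * s / total for s in sizes]
    alloc = [int(r) for r in raw]
    # Hand out the remaining tests to the largest fractional parts.
    leftovers = sorted(range(len(sizes)), key=lambda i: raw[i] - alloc[i],
                       reverse=True)
    for i in leftovers[:budget - sum(alloc)]:
        alloc[i] += 1
    return alloc

if __name__ == "__main__":
    sizes = [5000, 3000, 1500, 500]   # subdomain sizes (made up)
    print(proportional_allocation(sizes, 100))  # -> [50, 30, 15, 5]
```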

    Consistent distribution-free K-sample and independence tests for univariate random variables

    A popular approach for testing whether two univariate random variables are statistically independent consists of partitioning the sample space into bins and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While coarse partitions may be best for detecting simple relationships, a great gain in power can be achieved for complex relationships by considering finer partitions. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove that those are consistent too. We provide polynomial-time algorithms, which are critical for computing the suggested test statistics efficiently. We show that the power of the regularized tests is excellent compared to existing tests, and almost as powerful as the tests based on the optimal (yet unknown in practice) partition size, in simulations as well as on a real-data example. (Comment: arXiv admin note: substantial text overlap with arXiv:1308.155)
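
    As a rough sketch of the binning step only (the tests above aggregate such scores over all partitions of a fixed size, or over all sizes with regularization), the snippet below computes the plug-in mutual information of two samples on a single equal-frequency partition; the partition size and data are illustrative assumptions.

```python
# Plug-in mutual information of two univariate samples on one binned
# partition (equal-frequency bins). A single fixed partition only; the
# paper's statistics aggregate scores over many partitions.
import numpy as np

def binned_mutual_information(x, y, bins=4):
    # Equal-frequency bin edges for each variable.
    qs = np.linspace(0, 1, bins + 1)
    xb = np.digitize(x, np.quantile(x, qs)[1:-1])
    yb = np.digitize(y, np.quantile(y, qs)[1:-1])
    joint = np.zeros((bins, bins))
    for i, j in zip(xb, yb):
        joint[i, j] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(binned_mutual_information(x, x + rng.normal(size=500)))   # dependent
print(binned_mutual_information(x, rng.normal(size=500)))       # independent
```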

    Test case selection with and without replacement

    Previous theoretical studies on the effectiveness of partition testing and random testing have assumed that test cases are selected with replacement. Although this assumption is well known to be less realistic, it has still been used in theoretical work because it renders the analyses more tractable. This paper presents a theoretical investigation aimed at comparing testing effectiveness when test cases are selected with and without replacement, and at exploring the relationships between these two scenarios. We propose a new effectiveness metric for software testing, namely the expected number of distinct failures detected, to re-examine existing partition testing strategies.
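
    The proposed metric can be sketched in the simplest setting of uniform random selection from a finite domain (the domain size, number of failure-causing inputs, and budget below are hypothetical; this is not the paper's analysis of partition testing strategies): with replacement, each of the m failure-causing inputs in a domain of d inputs is detected with probability 1 - (1 - 1/d)^n; without replacement, with probability n/d.

```python
# Expected number of distinct failures detected by n uniformly random
# test cases drawn from a domain of d inputs containing m
# failure-causing inputs. Parameter values are hypothetical.

def expected_distinct_failures_with_replacement(d, m, n):
    return m * (1 - (1 - 1 / d) ** n)

def expected_distinct_failures_without_replacement(d, m, n):
    return m * n / d      # hypergeometric mean, valid for n <= d

d, m = 1000, 10
for n in (100, 500, 1000):
    print(n,
          round(expected_distinct_failures_with_replacement(d, m, n), 3),
          round(expected_distinct_failures_without_replacement(d, m, n), 3))
```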

    Asymptotic Error Free Partitioning over Noisy Boolean Multiaccess Channels

    In this paper, we consider the problem of partitioning active users in a manner that facilitates multi-access without collision. The setting is a noisy, synchronous, Boolean multi-access channel where $K$ active users (out of a total of $N$ users) seek access. A solution to the partition problem places each of the $N$ users in one of $K$ groups (or blocks) such that no two active nodes are in the same block. We consider a simple, but non-trivial and illustrative, case of $K=2$ active users and study the number of steps $T$ needed to solve the partition problem. By random coding and a suboptimal decoding scheme, we show that for any $T \geq (C_1 + \xi_1)\log N$, where $C_1$ and $\xi_1$ are positive constants (independent of $N$) and $\xi_1$ can be arbitrarily small, the partition problem can be solved with error probability $P_e^{(N)} \to 0$ for large $N$. Under the same scheme, we also bound $T$ from the other direction, establishing that, for any $T \leq (C_2 - \xi_2)\log N$, the error probability $P_e^{(N)} \to 1$ for large $N$; again, $C_2$ and $\xi_2$ are constants and $\xi_2$ can be arbitrarily small. These bounds on the number of steps are lower than the tight achievable lower bound $T \geq (C_g + \xi)\log N$ for group testing (in which all active users are identified, rather than just partitioned). Thus, partitioning may prove to be a more efficient approach to multi-access than group testing. (Comment: Submitted in June 2014 to IEEE Transactions on Information Theory; under review.)

    Finding and testing network communities by lumped Markov chains

    Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition that is based on a significance threshold. By means of a lumped Markov chain model of a random walker, a quality measure called the "persistence probability" is associated with a cluster. The cluster is then defined as an "$\alpha$-community" if this probability is not smaller than $\alpha$. Consistently, a partition composed of $\alpha$-communities is an "$\alpha$-partition". These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired $\alpha$-level allows one to immediately select the $\alpha$-partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the significance of each single community. Given its ability to individually assess the quality of each cluster, this approach can also disclose single well-defined communities even in networks which, overall, do not possess a definite clusterized structure.
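
    A minimal sketch of the persistence probability for an undirected graph (toy adjacency matrix and partition; not the authors' code): for a random walk with transition matrix $P = D^{-1}A$ and stationary distribution proportional to the node degrees, the persistence probability of a cluster reduces to the fraction of the cluster's total degree that stays inside the cluster.

```python
# Persistence probability of each cluster of a partition, for an
# undirected graph: the probability that a stationary random walker
# currently inside the cluster is still inside it after one step.
# The graph and partition below are toy examples.
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
partition = [[0, 1, 2], [3, 4, 5]]

def persistence_probabilities(A, partition):
    deg = A.sum(axis=1)
    out = []
    for cluster in partition:
        idx = np.ix_(cluster, cluster)
        # Edge weight staying inside the cluster over the cluster's degree.
        out.append(A[idx].sum() / deg[cluster].sum())
    return out

print(persistence_probabilities(A, partition))   # -> [0.857..., 0.857...]
```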

    Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach

    In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data, used for model identification, and the test data, used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient, and the associated model identification procedure is in general easy to implement. The model obtained from such a once-partitioned single training dataset, however, may lack robustness and generalise poorly to future unseen data, because the performance of the identified model can depend strongly on how the data partition is made. To overcome this drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased, or preferably unbiased, models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets) using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all K datasets, using a newly proposed common model selection approach called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
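
    The workflow can be sketched with a simplified stand-in (random subsampling into K train/validation splits, a naive cross-fold vote over candidate terms in place of the multiple orthogonal search algorithm, and fold-averaged least squares for the parameters; the data, K, and term dictionary are all hypothetical):

```python
# Simplified RSMM-style sketch: K random-subsampling train/validation
# splits, a naive common-structure vote over candidate terms, and
# parameter estimation averaged over the folds. This is NOT the
# multiple orthogonal search algorithm from the paper.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y = 0.8 * x - 0.5 * x**3 + 0.05 * rng.normal(size=300)   # toy system

# Candidate model terms (dictionary) evaluated on the input.
terms = {"x": x, "x^2": x**2, "x^3": x**3, "x^4": x**4}

K, n_train = 5, 200
keep_per_fold = []
for _ in range(K):
    train = rng.choice(len(x), size=n_train, replace=False)
    # Rank terms by absolute correlation with the output on this fold.
    corr = {name: abs(np.corrcoef(col[train], y[train])[0, 1])
            for name, col in terms.items()}
    keep_per_fold.append({n for n, _ in
                          sorted(corr.items(), key=lambda kv: -kv[1])[:2]})

# Common structure: terms selected in every fold.
common = set.intersection(*keep_per_fold)
print("common terms:", common)

# Least-squares parameters for the common-structured model, averaged
# over refits on each fold (multifold parameter estimation, simplified).
names = sorted(common)
Phi = np.column_stack([terms[n] for n in names])
coefs = []
for _ in range(K):
    train = rng.choice(len(x), size=n_train, replace=False)
    coefs.append(np.linalg.lstsq(Phi[train], y[train], rcond=None)[0])
print(dict(zip(names, np.mean(coefs, axis=0).round(3))))
```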