117,598 research outputs found

    Classifier selection with permutation tests

    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of which classifier is likely to perform best is made based on classifier performance over similar known data sets. Similarity is measured according to a data set characterization that includes several state-of-the-art metrics covering physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm can exploit the attributes of a data set to predict class labels; we compare this approach to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments covering 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, amounting to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
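
The paper's data set characterization and recommendation logic are not reproduced in the abstract; the following is a minimal sketch of the underlying label-permutation test for a single classifier on one binary data set, using scikit-learn's permutation_test_score. The synthetic data, the choice of RandomForestClassifier, and the F1 scoring are illustrative assumptions, not the paper's exact setup.

```python
# Permutation test for whether a classifier exploits attribute-label structure:
# compare its cross-validated score on the true labels against the null
# distribution of scores obtained by refitting on permuted labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import permutation_test_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

score, perm_scores, p_value = permutation_test_score(
    clf, X, y, scoring="f1", cv=5, n_permutations=200, random_state=0
)
print(f"F1 on true labels: {score:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value indicates the classifier's score on the true labels is unlikely under the null of no attribute-label association, which is the kind of signal the recommender can use alongside the raw F-score.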

    Improved Error Bounds Based on Worst Likely Assignments

    Error bounds based on worst likely assignments use permutation tests to validate classifiers. Worst likely assignments can produce effective bounds even for data sets with 100 or fewer training examples. This paper introduces a statistic for use in the permutation tests of worst likely assignments that improves error bounds, especially for accurate classifiers, which are typically the classifiers of interest.
    Comment: IJCNN 201
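
The improved statistic introduced in the paper is not described in the abstract; the sketch below only shows the generic permutation machinery such bounds build on, namely a permutation p-value for a trained classifier's accuracy on a small holdout set. The statistic here is plain accuracy and all names are illustrative.

```python
import numpy as np

def permutation_p_value(y_true, y_pred, n_permutations=10_000, seed=0):
    """P-value for the null that predictions are unrelated to the labels."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    observed = np.mean(y_pred == y_true)
    # Null distribution: accuracy of the same predictions against permuted labels.
    null_scores = np.array([
        np.mean(y_pred == rng.permutation(y_true))
        for _ in range(n_permutations)
    ])
    # Proportion of permutations at least as accurate as the observed labels.
    return (1 + np.sum(null_scores >= observed)) / (n_permutations + 1)
```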

    Direction-Projection-Permutation for High Dimensional Hypothesis Tests

    Motivated by the prevalence of high dimensional low sample size datasets in modern statistical applications, we propose a general nonparametric framework, Direction-Projection-Permutation (DiProPerm), for testing high dimensional hypotheses. The method is aimed at rigorous testing of whether lower dimensional visual differences are statistically significant. Theoretical analysis under the non-classical asymptotic regime of dimension going to infinity for fixed sample size reveals that certain natural variations of DiProPerm can have very different behaviors. An empirical power study both confirms the theoretical results and suggests DiProPerm is a powerful test in many settings. Finally, DiProPerm is applied to a high dimensional gene expression dataset.
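
A minimal DiProPerm-style sketch under simple assumptions: the direction is the normalized mean-difference vector, the projection statistic is the gap between projected class means, and the null is built by permuting class labels and recomputing the direction each time. The paper also studies other directions and univariate statistics, which this sketch does not cover.

```python
import numpy as np

def diproperm(X1, X2, n_permutations=1000, seed=0):
    """Two-sample Direction-Projection-Permutation p-value (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    def statistic(A, B):
        # Direction: normalized mean-difference vector.
        w = A.mean(axis=0) - B.mean(axis=0)
        w = w / np.linalg.norm(w)
        # Projection: one score per observation; statistic: gap between
        # the projected class means.
        return (A @ w).mean() - (B @ w).mean()

    observed = statistic(X1, X2)
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(pooled))
        null[i] = statistic(pooled[idx[:n1]], pooled[idx[n1:]])
    # One-sided p-value with the +1 correction for the observed assignment.
    return (1 + np.sum(null >= observed)) / (n_permutations + 1)
```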

    Exact and asymptotically robust permutation tests

    Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level tests when the null hypothesis is P=Q. On the other hand, when comparing or testing particular parameters θ of P and Q, such as their means or medians, permutation tests need not be level α, or even approximately level α in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test holds while retaining the exact rejection probability α in finite samples when the underlying distributions are identical. The ideas are broadly applicable and special attention is given to the k-sample problem of comparing general parameters, whereby a permutation test is constructed which is exact level α under the hypothesis of identical distributions, but has asymptotic rejection probability α under the more general null hypothesis of equality of parameters. A Monte Carlo simulation study is performed as well. A quite general theory is possible based on a coupling construction, as well as a key contiguity argument for the multinomial and multivariate hypergeometric distributions.
    Comment: Published at http://dx.doi.org/10.1214/13-AOS1090 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
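
The two-sample mean comparison illustrates the studentization idea: a permutation test based on a studentized difference of means is exact when P=Q and, per the paper, asymptotically level α under the weaker null of equal means even when the distributions differ. The implementation details below (two-sided p-value, Welch-style standard error) are illustrative assumptions, not the paper's full procedure.

```python
import numpy as np

def studentized_stat(a, b):
    # Difference of sample means divided by its estimated standard error.
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

def studentized_permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for equality of means (sketch)."""
    rng = np.random.default_rng(seed)
    observed = abs(studentized_stat(a, b))
    pooled = np.concatenate([a, b])
    n1 = len(a)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(pooled))
        null[i] = abs(studentized_stat(pooled[idx[:n1]], pooled[idx[n1:]]))
    return (1 + np.sum(null >= observed)) / (n_permutations + 1)
```

Using the unstudentized difference of means in place of studentized_stat would keep exactness under P=Q but can fail to control the level asymptotically when only the means are equal, which is the failure mode the paper addresses.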