Classifier selection with permutation tests
This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, it recommends the classifier likely to perform best, based on classifier performance over similar known data sets. Similarity is measured with a data set characterization that combines several state-of-the-art metrics covering physical structure, statistics, and information theory. A novelty with respect to prior work is a robust approach based on permutation tests that directly assesses whether a given learning algorithm can exploit the attributes of a data set to predict class labels; we compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments spanning 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2,331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
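To make the permutation-test idea concrete, here is a minimal sketch of a generic label-permutation test for whether a classifier can exploit the attribute-label relationship. This is not the paper's exact characterization procedure; the function name, the chosen classifier, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def label_permutation_test(clf, X, y, n_permutations=200, cv=5, seed=0):
    """p-value for H0: the classifier cannot link attributes to labels.

    The observed cross-validated score is compared against scores
    obtained after randomly permuting the class labels, which destroys
    any attribute-label dependence while preserving class priors.
    """
    rng = np.random.default_rng(seed)
    observed = cross_val_score(clf, X, y, cv=cv).mean()
    perm_scores = np.empty(n_permutations)
    for i in range(n_permutations):
        y_perm = rng.permutation(y)
        perm_scores[i] = cross_val_score(clf, X, y_perm, cv=cv).mean()
    # Smoothed permutation p-value: add-one correction avoids p = 0.
    pvalue = (np.sum(perm_scores >= observed) + 1) / (n_permutations + 1)
    return observed, pvalue

# Illustrative run on synthetic binary data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
score, p = label_permutation_test(LogisticRegression(max_iter=1000), X, y)
print(f"accuracy={score:.3f}, permutation p-value={p:.4f}")
```

A small p-value indicates the classifier's score on the real labels is unlikely under the label-permutation null, i.e., the algorithm is genuinely exploiting attribute-label structure rather than overfitting noise.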
Improved Error Bounds Based on Worst Likely Assignments
Error bounds based on worst likely assignments use permutation tests to
validate classifiers. Worst likely assignments can produce effective bounds
even for data sets with 100 or fewer training examples. This paper introduces a
statistic for use in the permutation tests of worst likely assignments that
improves error bounds, especially for accurate classifiers, which are typically
the classifiers of interest.
Direction-Projection-Permutation for High Dimensional Hypothesis Tests
Motivated by the prevalence of high dimensional low sample size datasets in
modern statistical applications, we propose a general nonparametric framework,
Direction-Projection-Permutation (DiProPerm), for testing high dimensional
hypotheses. The method is aimed at rigorous testing of whether lower
dimensional visual differences are statistically significant. Theoretical
analysis under the non-classical asymptotic regime of dimension going to
infinity for fixed sample size reveals that certain natural variations of
DiProPerm can have very different behaviors. An empirical power study both
confirms the theoretical results and suggests DiProPerm is a powerful test in
many settings. Finally, DiProPerm is applied to a high dimensional gene
expression dataset.
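As a concrete illustration, here is a minimal sketch of one natural DiProPerm variant: the mean-difference direction with a difference-of-projected-means statistic. The paper also studies other direction and statistic choices; the function name and toy data below are assumptions for illustration.

```python
import numpy as np

def diproperm(X1, X2, n_permutations=1000, seed=0):
    """DiProPerm sketch: mean-difference Direction, difference of
    projected means as the univariate statistic, Permutation null.

    Tests H0: the two samples come from the same distribution.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([X1, X2])
    n1 = len(X1)

    def stat(A, B):
        # Direction: normalized difference of the sample mean vectors.
        d = A.mean(axis=0) - B.mean(axis=0)
        d = d / np.linalg.norm(d)
        # Projection: compare the groups' mean scores along d.
        return (A @ d).mean() - (B @ d).mean()

    observed = stat(X1, X2)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        # Relabel the pooled sample, then recompute the direction
        # and statistic from scratch for each permutation.
        idx = rng.permutation(len(X))
        null[i] = stat(X[idx[:n1]], X[idx[n1:]])
    pvalue = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, pvalue

# Toy high dimension, low sample size setting: 500 features, 20 + 20 samples.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(20, 500))
X2 = rng.normal(loc=0.3, size=(20, 500))
print(diproperm(X1, X2))
```

Note that the direction is refit inside every permutation; reusing the direction fit on the original labels would bias the null distribution and invalidate the test.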
Exact and asymptotically robust permutation tests
Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level α tests when the null hypothesis is P=Q. On the other hand, when comparing or testing particular parameters of P and Q, such as their means or medians, permutation tests need not be level α, or even approximately level α in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test holds while retaining the exact rejection probability α in finite samples when the underlying distributions are identical. The ideas are broadly applicable and special attention is given to the k-sample problem of comparing general parameters, whereby a permutation test is constructed which is exact level α under the hypothesis of identical distributions, but has asymptotic rejection probability α under the more general null hypothesis of equality of parameters. A Monte Carlo simulation study is performed as well. A quite general theory is possible based on a coupling construction, as well as a key contiguity argument for the multinomial and multivariate hypergeometric distributions.
Published in the Annals of Statistics (http://www.imstat.org/aos/), http://dx.doi.org/10.1214/13-AOS1090, by the Institute of Mathematical Statistics (http://www.imstat.org).
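The role of studentization can be sketched as follows: a two-sample permutation test for equality of means built on a Welch-type studentized statistic, the kind of statistic for which the permutation test retains asymptotic validity even when the two distributions differ in other respects. This is a simplified illustration, not the paper's full k-sample construction; the function name and toy data are assumptions.

```python
import numpy as np

def studentized_perm_test(x, y, n_permutations=10_000, seed=0):
    """Two-sample permutation test for equality of means using a
    studentized difference-of-means statistic.

    With the plain (unstudentized) difference of means, the permutation
    test can fail to be even approximately level α when P != Q but the
    means agree; studentizing restores asymptotic validity while keeping
    exactness when the distributions are identical.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    m = len(x)

    def t_stat(a, b):
        # Welch-type studentization by the estimated standard error.
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        return (a.mean() - b.mean()) / se

    observed = t_stat(x, y)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(pooled)
        null[i] = t_stat(perm[:m], perm[m:])
    # Two-sided permutation p-value with add-one smoothing.
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_permutations + 1)

# Example: equal means but very different shapes, variances, sample sizes.
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=40)        # mean 1
y = rng.normal(loc=1.0, scale=3.0, size=200)   # mean 1
print(studentized_perm_test(x, y))
```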
