8,470 research outputs found

    Resampling-based confidence regions and multiple tests for a correlated random vector

    Get PDF
    We derive non-asymptotic confidence regions for the mean of a random vector whose coordinates have an unknown dependence structure. The random vector is supposed to be either Gaussian or to have a symmetric bounded distribution, and we observe nn i.i.d copies of it. The confidence regions are built using a data-dependent threshold based on a weighted bootstrap procedure. We consider two approaches, the first based on a concentration approach and the second on a direct boostrapped quantile approach. The first one allows to deal with a very large class of resampling weights while our results for the second are restricted to Rademacher weights. However, the second method seems more accurate in practice. Our results are motivated by multiple testing problems, and we show on simulations that our procedures are better than the Bonferroni procedure (union bound) as soon as the observed vector has sufficiently correlated coordinates.Comment: submitted to COL

    Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests

    Get PDF
    We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities.Comment: Published in at http://dx.doi.org/10.1214/08-AOS667; http://dx.doi.org/10.1214/08-AOS668 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Small sample sizes : A big data problem in high-dimensional data analysis

    Get PDF
    Acknowledgements The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript. Funding The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by the German Science Foundation awards number DFG KO 4680/3-2 and PA 2409/3-2.Peer reviewedPublisher PD

    Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

    Get PDF
    Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected pp-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, TT, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that TT is low-rank plus a low-variance residual. This makes TT a good candidate for low-rank matrix completion, where only a very small number of entries of TT (∼0.35%\sim0.35\% of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four varying sized datasets against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred as NaivePT). We find that RapidPT achieves its best runtime performance on medium sized datasets (50≤n≤20050 \leq n \leq 200), with speedups of 1.5x - 38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger datasets (n≥200n \geq 200) RapidPT outperforms NaivePT (6x - 200x) on all datasets, and provides large speedups over SnPM13 when more than 10000 permutations (2x - 15x) are needed. The implementation is a standalone toolbox and also integrated within SnPM13, able to leverage multi-core architectures when available.Comment: 36 pages, 16 figure
    • …
    corecore