8,470 research outputs found
Resampling-based confidence regions and multiple tests for a correlated random vector
We derive non-asymptotic confidence regions for the mean of a random vector
whose coordinates have an unknown dependence structure. The random vector is
assumed to be either Gaussian or to have a symmetric bounded distribution, and
we observe i.i.d. copies of it. The confidence regions are built using a
data-dependent threshold based on a weighted bootstrap procedure. We consider
two approaches: the first based on a concentration argument and the second on a
direct bootstrapped quantile. The first allows us to handle a very large class
of resampling weights, while our results for the second are restricted to
Rademacher weights; however, the second method seems more accurate in practice.
Our results are motivated by multiple testing problems, and we show on
simulations that our procedures outperform the Bonferroni procedure (union
bound) as soon as the observed vector has sufficiently correlated coordinates.
Comment: submitted to COL
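The general idea above can be sketched in a few lines: resample a sup-norm statistic with Rademacher weights to get a data-dependent threshold, and compare it with the Bonferroni threshold. This is a minimal illustrative simulation, not the authors' exact procedure; the sample sizes, correlation level, and weight centering are assumptions for the sketch.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Toy data: n i.i.d. copies of a K-dimensional Gaussian vector with
# strongly correlated coordinates (illustrative, not from the paper).
n, K, rho = 50, 200, 0.9
cov = rho * np.ones((K, K)) + (1 - rho) * np.eye(K)
X = rng.multivariate_normal(np.zeros(K), cov, size=n)

mean = X.mean(axis=0)
B = 1000  # number of resampling draws
alpha = 0.05

# Rademacher-weighted bootstrap of the centered sample: the resampled
# statistic is the sup-norm of the weighted mean of centered observations.
sup_stats = np.empty(B)
for b in range(B):
    w = rng.choice([-1.0, 1.0], size=n)  # Rademacher weights
    sup_stats[b] = np.abs((w[:, None] * (X - mean)).mean(axis=0)).max()

t_boot = np.quantile(sup_stats, 1 - alpha)

# Bonferroni (union bound) threshold for comparison, Gaussian case:
sigma = X.std(axis=0, ddof=1).max()
t_bonf = sigma * NormalDist().inv_cdf(1 - alpha / (2 * K)) / np.sqrt(n)

print(f"bootstrap threshold:  {t_boot:.4f}")
print(f"Bonferroni threshold: {t_bonf:.4f}")
```

With strongly correlated coordinates the resampled threshold adapts to the dependence and comes out noticeably smaller than the Bonferroni one, which treats the K coordinates as if they were independent.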
Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests
We study generalized bootstrap confidence regions for the mean of a random
vector whose coordinates have an unknown dependency structure. The random
vector is supposed to be either Gaussian or to have a symmetric and bounded
distribution. The dimensionality of the vector can possibly be much larger than
the number of observations and we focus on a nonasymptotic control of the
confidence level, following ideas inspired by recent results in learning
theory. We consider two approaches, the first based on a concentration
principle (valid for a large class of resampling weights) and the second on a
resampled quantile, specifically using Rademacher weights. Several intermediate
results established in the approach based on concentration principles are of
interest in their own right. We also discuss the question of accuracy when
using Monte Carlo approximations of the resampled quantities.
Comment: Published at http://dx.doi.org/10.1214/08-AOS667 and
http://dx.doi.org/10.1214/08-AOS668 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
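The "generalized bootstrap" with a large class of resampling weights can be sketched generically: draw exchangeable weights, center them, and resample the sup-norm of the weighted mean of centered data. The three weight schemes below (Rademacher, Gaussian, Efron/multinomial) are standard illustrative choices, not the paper's exact conditions on the weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, B = 40, 100, 500
X = rng.standard_normal((n, K))  # toy data; coordinates independent here
Xc = X - X.mean(axis=0)

def resampled_sup(weights_fn):
    """Sup-norm statistics of the weight-resampled mean of centered data."""
    stats = np.empty(B)
    for b in range(B):
        w = weights_fn(n)
        w = w - w.mean()  # center the weights so the statistic is location-free
        stats[b] = np.abs((w[:, None] * Xc).mean(axis=0)).max()
    return stats

# Illustrative resampling-weight schemes (assumed for this sketch):
schemes = {
    "rademacher": lambda n: rng.choice([-1.0, 1.0], size=n),
    "gaussian":   lambda n: rng.standard_normal(n),
    "efron":      lambda n: rng.multinomial(n, np.full(n, 1.0 / n)).astype(float),
}

quantiles = {}
for name, fn in schemes.items():
    quantiles[name] = np.quantile(resampled_sup(fn), 0.95)
    print(f"{name:10s} 95% resampled quantile: {quantiles[name]:.4f}")
```

In practice the resampled quantile itself is only approximated by B Monte Carlo draws, which is exactly the accuracy question the abstract raises.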
Small sample sizes: A big data problem in high-dimensional data analysis
Acknowledgements: The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript. Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the research is supported by the German Science Foundation awards number DFG KO 4680/3-2 and PA 2409/3-2.
Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control
of false positives. In neuroimaging, however, the computational burden of
running such an algorithm can be significant. We find that by viewing the
permutation testing procedure as the construction of a very large permutation
testing matrix, one can exploit structural properties derived from the
data and the test statistics to reduce the runtime under certain conditions. In
particular, we see that this matrix is low-rank plus a low-variance residual.
This makes it a good candidate for low-rank matrix completion, where only a
very small number of its entries (a small fraction of all entries in our
experiments) have to be computed to obtain a good estimate. Based on this
observation, we present RapidPT, an algorithm that efficiently recovers the max
null distribution commonly obtained through regular permutation testing in
voxel-wise analysis. We present an extensive validation on a synthetic dataset
and four datasets of varying size against two baselines: Statistical
NonParametric Mapping (SnPM13) and a standard permutation testing
implementation (referred to as NaivePT). We find that RapidPT achieves its best
runtime performance on medium-sized datasets, with speedups of 1.5x-38x
(vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger datasets RapidPT
outperforms NaivePT (6x-200x) on all datasets, and provides large speedups over
SnPM13 when more than 10000 permutations are needed (2x-15x). The
implementation is a standalone toolbox and is also integrated within SnPM13,
able to leverage multi-core architectures when available.
Comment: 36 pages, 16 figures
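The baseline that RapidPT accelerates can be sketched as follows: each permutation of the group labels yields one row of voxel-wise test statistics, and the maximum over voxels contributes one sample to the max null distribution. This is a minimal stand-in for what the abstract calls NaivePT, on toy two-group data; the sizes and the t statistic are illustrative assumptions, and the matrix-completion acceleration itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "voxel-wise" two-group data (illustrative sizes, not a real dataset):
# (n1 + n2) subjects x V voxels, with group labels to permute.
n1, n2, V = 20, 20, 500
data = rng.standard_normal((n1 + n2, V))
labels = np.array([0] * n1 + [1] * n2)

def max_t(data, labels):
    """Maximum absolute two-sample t statistic across voxels."""
    a, b = data[labels == 0], data[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs((a.mean(axis=0) - b.mean(axis=0)) / se).max()

observed = max_t(data, labels)

# Naive permutation testing: each permutation contributes one sample of the
# max null distribution. RapidPT instead recovers this distribution after
# computing only a small fraction of the permutation-by-voxel statistic
# matrix, via low-rank matrix completion.
P = 1000
max_null = np.array([max_t(data, rng.permutation(labels)) for _ in range(P)])

# Corrected p-value via the max statistic (multiplicity-adjusted):
p_corr = (1 + np.sum(max_null >= observed)) / (P + 1)
print(f"corrected p-value: {p_corr:.3f}")
```

The cost of the naive loop is P full passes over all voxels, which is why the permutation-by-voxel matrix view, and completing it from few entries, pays off at neuroimaging scale.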