87 research outputs found
Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling
Approximately unbiased tests based on bootstrap probabilities are considered
for the exponential family of distributions with unknown expectation parameter
vector, where the null hypothesis is represented as an arbitrary-shaped region
with smooth boundaries. This problem has been discussed previously in Efron and
Tibshirani [Ann. Statist. 26 (1998) 1687-1718], and a corrected p-value with
second-order asymptotic accuracy is calculated by the two-level bootstrap of
Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. U.S.A. 93 (1996)
13429-13434] based on the ABC bias correction of Efron [J. Amer. Statist.
Assoc. 82 (1987) 171-185]. Our argument is an extension of their asymptotic
theory, where the geometry, such as the signed distance and the curvature of
the boundary, plays an important role. We give another calculation of the
corrected p-value without finding the ``nearest point'' on the boundary to the
observation, which is required in the two-level bootstrap and is an
implementational burden in complicated problems. The key idea is to alter the
sample size of the replicated dataset from that of the observed dataset. The
frequency of the replicates falling in the region is counted for several sample
sizes, and then the p-value is calculated by looking at the change in the
frequencies along the changing sample sizes. This is the multiscale bootstrap
of Shimodaira [Systematic Biology 51 (2002) 492-508], which is third-order
accurate for the multivariate normal model. Here we introduce a newly devised
multistep-multiscale bootstrap, calculating a third-order accurate p-value for
the exponential family of distributions.Comment: Published at http://dx.doi.org/10.1214/009053604000000823 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
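The scale-change recipe sketched in the abstract (resample at several altered sample sizes n', count how often the replicates fall in the region, and extrapolate along the scale) can be illustrated in a few lines. The snippet below is a minimal sketch of the single-step multiscale bootstrap of Shimodaira (2002) only, not the multistep extension developed in the paper; the names multiscale_bootstrap_pvalue, in_region and the scale grid are hypothetical choices, not from the source.

```python
import numpy as np
from scipy.stats import norm

def multiscale_bootstrap_pvalue(data, in_region,
                                scales=(0.5, 0.7, 0.85, 1.0, 1.2, 1.4, 1.7, 2.0),
                                n_boot=2000, rng=None):
    """Sketch: approximately unbiased p-value via the single-step multiscale bootstrap.

    data      : (n, ...) array of observations
    in_region : callable; True if a resampled dataset falls in the null region H
    scales    : target values of sigma^2 = n / n'; each gives a replicate size n'
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    sigma2, z = [], []
    for s2 in scales:
        n_prime = max(int(round(n / s2)), 2)         # altered replicate sample size n'
        hits = 0
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n_prime)   # resample with replacement at this scale
            if in_region(data[idx]):
                hits += 1
        bp = (hits + 0.5) / (n_boot + 1.0)           # smoothed bootstrap probability BP(sigma^2)
        s2_real = n / n_prime                        # realized sigma^2 after rounding n'
        sigma2.append(s2_real)
        z.append(np.sqrt(s2_real) * norm.ppf(1.0 - bp))
    # Fit z(sigma^2) = d + c * sigma^2 (d: signed distance, c: curvature term),
    # then extrapolate formally to sigma^2 = -1:  p_AU = 1 - Phi(d - c).
    c, d = np.polyfit(sigma2, z, 1)
    return 1.0 - norm.cdf(d - c)
```

For example, a caller could pass in_region=lambda d: d.mean() <= 0 to test a half-space hypothesis on the mean; the multistep variant of the paper refines this extrapolation to reach third-order accuracy for exponential families.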
Selective inference after feature selection via multiscale bootstrap
It is common to report confidence intervals or p-values for selected
features, or predictor variables in regression, but these quantities often
suffer from selection bias. The selective inference approach removes this bias by
conditioning on the selection event. Most existing studies of selective
inference consider a specific algorithm, such as Lasso, for feature selection,
and thus they have difficulties in handling more complicated algorithms.
Moreover, existing studies often consider unnecessarily restrictive events,
leading to over-conditioning and lower statistical power. Our novel and
widely-applicable resampling method addresses these issues to compute an
approximately unbiased selective p-value for the selected features. We prove
that the p-value computed by our resampling method is more accurate and more
powerful than existing methods, while its computational cost is of the same
order as that of the classical bootstrap. Numerical experiments demonstrate that our
algorithm works well even for more complicated feature selection methods such
as non-convex regularization.

Comment: The title has changed (the previous title was "Selective inference
after variable selection via multiscale bootstrap"). 23 pages, 11 figures
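As a rough illustration of what "conditioning on the selection event" means computationally, the hypothetical sketch below estimates a conditional frequency by keeping only the bootstrap replicates in which the same feature is selected again. It is a naive toy, not the approximately unbiased procedure of the paper, which additionally varies the replicate sample size as in the multiscale bootstrap above; naive_selective_bootstrap, select and is_null_extreme are assumed names.

```python
import numpy as np

def naive_selective_bootstrap(data, select, is_null_extreme, n_boot=5000, rng=None):
    """Toy conditional-bootstrap frequency, for illustration only.

    select          : callable; returns the set of features selected from a dataset
    is_null_extreme : callable; True if feature j's statistic in the replicate is at
                      least as extreme as in the observed data
    Estimates P(extreme | j selected) by counting only replicates where j is selected
    again; it omits the bias corrections the multiscale bootstrap is designed to give.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    observed = select(data)
    pvals = {}
    for j in observed:
        selected_count, extreme_count = 0, 0
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)
            boot = data[idx]
            if j in select(boot):                    # condition on the selection event
                selected_count += 1
                extreme_count += is_null_extreme(boot, j)
        # conditional frequency; undefined if the feature is never re-selected
        pvals[j] = extreme_count / selected_count if selected_count else float("nan")
    return pvals
```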
- …