
    Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling

    Approximately unbiased tests based on bootstrap probabilities are considered for the exponential family of distributions with unknown expectation parameter vector, where the null hypothesis is represented as an arbitrary-shaped region with smooth boundaries. This problem has been discussed previously in Efron and Tibshirani [Ann. Statist. 26 (1998) 1687-1718], and a corrected p-value with second-order asymptotic accuracy is calculated by the two-level bootstrap of Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. U.S.A. 93 (1996) 13429-13434] based on the ABC bias correction of Efron [J. Amer. Statist. Assoc. 82 (1987) 171-185]. Our argument is an extension of their asymptotic theory, where the geometry, such as the signed distance and the curvature of the boundary, plays an important role. We give another calculation of the corrected p-value without finding the "nearest point" on the boundary to the observation, which is required in the two-level bootstrap and is an implementational burden in complicated problems. The key idea is to alter the sample size of the replicated dataset from that of the observed dataset. The frequency of the replicates falling in the region is counted for several sample sizes, and then the p-value is calculated by looking at the change in the frequencies along the changing sample sizes. This is the multiscale bootstrap of Shimodaira [Systematic Biology 51 (2002) 492-508], which is third-order accurate for the multivariate normal model. Here we introduce a newly devised multistep-multiscale bootstrap, calculating a third-order accurate p-value for the exponential family of distributions.

    Comment: Published at http://dx.doi.org/10.1214/009053604000000823 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
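    A minimal numerical sketch of the multiscale idea described above, assuming a one-dimensional normal model with hypothesis region H = { mu <= 0 }: replicates are drawn at several sample sizes n', the frequency of replicates falling in H is recorded at each scale sigma^2 = n/n', and the signed distance v and curvature c are fit from the change across scales. The scale grid, replicate count, and variable names are illustrative choices, not taken from the paper.

    ```python
    # Sketch of the multiscale bootstrap p-value (illustrative, not the paper's
    # full multistep algorithm). Assumption: 1-D normal data, region H = {mu <= 0}.
    import numpy as np
    from statistics import NormalDist

    rng = np.random.default_rng(0)
    nd = NormalDist()

    x = rng.normal(loc=0.3, scale=1.0, size=100)   # observed sample, n = 100
    n = len(x)

    sigma2, psi = [], []
    for n_prime in (50, 75, 100, 150, 200):        # several replicate sample sizes
        B = 2000
        reps = rng.choice(x, size=(B, n_prime), replace=True).mean(axis=1)
        bp = float(np.mean(reps <= 0.0))            # frequency of falling in H
        bp = min(max(bp, 1.0 / B), 1.0 - 1.0 / B)   # keep away from 0/1 for probit
        s2 = n / n_prime                            # scale sigma^2 = n / n'
        sigma2.append(s2)
        psi.append(np.sqrt(s2) * nd.inv_cdf(1.0 - bp))  # sigma * z-value at this scale

    # Fit sigma * z(sigma^2) ~ v + c * sigma^2 (v: signed distance, c: curvature);
    # the approximately unbiased p-value is then 1 - Phi(v - c).
    A = np.column_stack([np.ones(len(sigma2)), sigma2])
    (v, c), *_ = np.linalg.lstsq(A, np.array(psi), rcond=None)
    au = 1.0 - nd.cdf(v - c)
    print("AU p-value:", round(au, 4))
    ```

    At sigma^2 = 1 the fitted line recovers the ordinary bootstrap z-value v + c, so the sign flip on c is exactly the curvature correction the abstract refers to.
    
    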

    Selective inference after feature selection via multiscale bootstrap

    It is common to show the confidence intervals or p-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach removes this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulty handling more complicated algorithms. Moreover, existing studies often condition on unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method addresses these issues to compute an approximately unbiased selective p-value for the selected features. We prove that the p-value computed by our resampling method is more accurate and more powerful than existing methods, while the computational cost is of the same order as the classical bootstrap method. Numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.

    Comment: The title has changed (the previous title is "Selective inference after variable selection via multiscale bootstrap"). 23 pages, 11 figures.
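    To see what "conditioning on the selection event" means in the simplest case, here is a textbook truncated-normal illustration, not the paper's multiscale-bootstrap algorithm: a single normal mean is reported only because its z-statistic exceeded a threshold c, and the p-value is corrected by conditioning on that event. The function name and the threshold are hypothetical.

    ```python
    # Toy illustration of selective inference by conditioning (assumption: one
    # normal mean, selected because |Z| > c; this is the classical truncated-normal
    # correction, not the resampling method of the paper above).
    from statistics import NormalDist

    nd = NormalDist()

    def selective_p(z, c):
        """Two-sided p-value for H0: mu = 0, conditioned on selection |Z| > c."""
        assert abs(z) > c, "the feature was selected, so |z| must exceed c"
        naive = 2.0 * (1.0 - nd.cdf(abs(z)))              # ignores selection bias
        return min(1.0, naive / (2.0 * (1.0 - nd.cdf(c))))

    z_obs, c = 2.5, 1.96
    naive = 2.0 * (1.0 - nd.cdf(abs(z_obs)))
    print("naive p:    ", round(naive, 4))                 # biased toward small values
    print("selective p:", round(selective_p(z_obs, c), 4)) # larger, bias removed
    ```

    The selective p-value is necessarily larger than the naive one, which is the selection bias the abstract describes; conditioning on a coarser event than necessary (over-conditioning) inflates it further, which is the power loss the paper aims to avoid.
    
    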