Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling
Approximately unbiased tests based on bootstrap probabilities are considered
for the exponential family of distributions with unknown expectation parameter
vector, where the null hypothesis is represented as an arbitrary-shaped region
with smooth boundaries. This problem has been discussed previously in Efron and
Tibshirani [Ann. Statist. 26 (1998) 1687-1718], and a corrected p-value with
second-order asymptotic accuracy is calculated by the two-level bootstrap of
Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. U.S.A. 93 (1996)
13429-13434] based on the ABC bias correction of Efron [J. Amer. Statist.
Assoc. 82 (1987) 171-185]. Our argument is an extension of their asymptotic
theory, where the geometry, such as the signed distance and the curvature of
the boundary, plays an important role. We give another calculation of the
corrected p-value without finding the ``nearest point'' on the boundary to the
observation, which is required in the two-level bootstrap and is an
implementational burden in complicated problems. The key idea is to alter the
sample size of the replicated dataset from that of the observed dataset. The
frequency of the replicates falling in the region is counted for several sample
sizes, and then the p-value is calculated by looking at the change in the
frequencies along the changing sample sizes. This is the multiscale bootstrap
of Shimodaira [Systematic Biology 51 (2002) 492-508], which is third-order
accurate for the multivariate normal model. Here we introduce a newly devised
multistep-multiscale bootstrap, calculating a third-order accurate p-value for
the exponential family of distributions.

Comment: Published at http://dx.doi.org/10.1214/009053604000000823 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
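The abstract's key idea — counting how often replicates fall in the region at several altered sample sizes, then reading the p-value off the trend — can be sketched for the simpler single-step multivariate normal case of Shimodaira [Systematic Biology 51 (2002) 492-508]. This is an illustrative sketch, not the paper's multistep procedure: the half-space region, the grid of scales, and the least-squares fit are assumptions chosen for the demo.

```python
import numpy as np
from scipy.stats import norm

def multiscale_bootstrap_pvalue(y, in_region, scales2=(0.5, 0.75, 1.0, 1.5, 2.0),
                                B=20000, seed=0):
    """Approximately unbiased p-value sketch for the normal model N(mu, I).

    y         : observed point.
    in_region : vectorized membership test, (B, dim) array -> boolean array.
    scales2   : variances sigma^2; sigma^2 = n/n' mimics altering the bootstrap
                sample size n' relative to the observed sample size n.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    s2 = np.asarray(scales2, dtype=float)
    psi = []
    for v in s2:
        reps = y + np.sqrt(v) * rng.standard_normal((B, y.size))
        bp = in_region(reps).mean()                # bootstrap probability at this scale
        bp = min(max(bp, 1.0 / B), 1.0 - 1.0 / B)  # keep the ppf finite
        psi.append(np.sqrt(v) * norm.ppf(1.0 - bp))  # sigma * z-value
    # Fit psi(sigma^2) = d + c * sigma^2 (d: signed distance, c: curvature term).
    X = np.column_stack([np.ones_like(s2), s2])
    d, c = np.linalg.lstsq(X, np.array(psi), rcond=None)[0]
    # Extrapolating the fit to sigma^2 = -1 gives the corrected p-value.
    return 1.0 - norm.cdf(d - c)

# Flat half-space H = {x : x[0] <= 0}; the observation sits at signed distance 1.5
# from the boundary, so the corrected p-value should be near 1 - Phi(1.5).
p = multiscale_bootstrap_pvalue([1.5, 0.0], lambda x: x[:, 0] <= 0)
```

For a flat boundary the curvature term c is essentially zero, so the multiscale answer coincides with the exact one-sided normal test; the method's value shows up when the boundary is curved and c is not negligible.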
Selective inference after feature selection via multiscale bootstrap
It is common to show the confidence intervals or p-values of selected
features, or predictor variables in regression, but they often involve
selection bias. The selective inference approach solves this bias by
conditioning on the selection event. Most existing studies of selective
inference consider a specific algorithm, such as Lasso, for feature selection,
and thus they have difficulties in handling more complicated algorithms.
Moreover, existing studies often consider unnecessarily restrictive events,
leading to over-conditioning and lower statistical power. Our novel and
widely-applicable resampling method addresses these issues to compute an
approximately unbiased selective p-value for the selected features. We prove
that the p-value computed by our resampling method is more accurate and more
powerful than existing methods, while the computational cost is the same order
as the classical bootstrap method. Numerical experiments demonstrate that our
algorithm works well even for more complicated feature selection methods such
as non-convex regularization.

Comment: The title has changed (the previous title was "Selective inference after variable selection via multiscale bootstrap"). 23 pages, 11 figures.
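The selection bias this abstract addresses can be seen in a toy simulation (this illustrates the problem only, not the paper's multiscale resampling method; the sample sizes and the "largest effect" selection rule are made up for the demo). Testing the feature chosen by the data with a classical p-value, as if it had been fixed in advance, grossly inflates the false positive rate:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, n, trials = 50, 25, 2000    # m candidate features, n observations each
rejections = 0
for _ in range(trials):
    X = rng.standard_normal((m, n))   # every feature is truly null (mean 0)
    means = X.mean(axis=1)
    j = np.argmax(np.abs(means))      # "feature selection": keep the largest effect
    z = means[j] * np.sqrt(n)         # classical z-statistic for the chosen feature
    if 2 * norm.sf(abs(z)) < 0.05:    # naive two-sided p-value, ignoring selection
        rejections += 1
print(rejections / trials)            # far above the nominal 0.05 level
```

Because the maximum of 50 null |z|-statistics exceeds 1.96 with probability about 1 - 0.95^50 ≈ 0.92, the naive test rejects almost always; a selective p-value conditions on the event that feature j was selected and restores the nominal level.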
An information criterion for auxiliary variable selection in incomplete data analysis
Statistical inference is considered for variables of interest, called primary
variables, when auxiliary variables are observed along with the primary
variables. We consider the setting of incomplete data analysis, where some
primary variables are not observed. Utilizing a parametric model of the joint
distribution of the primary and auxiliary variables, it is possible to improve
the estimation of the parametric model for the primary variables when the
auxiliary variables are closely related to the primary variables. However, the
estimation accuracy decreases when the auxiliary variables are irrelevant to
the primary variables. To select useful auxiliary variables, we formulate the problem
as model selection, and propose an information criterion for predicting primary
variables by leveraging auxiliary variables. The proposed information criterion
is an asymptotically unbiased estimator of the Kullback-Leibler divergence for
complete data of primary variables under some reasonable conditions. We also
clarify an asymptotic equivalence between the proposed information criterion
and a variant of leave-one-out cross validation. Performance of our method is
demonstrated via a simulation study and a real data example.
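The trade-off the abstract describes — auxiliary variables help when related to the primary variable and not otherwise — can be illustrated with a toy incomplete-data simulation (this sketches the phenomenon only, not the proposed information criterion; the bivariate normal setup, the MCAR missingness, and the regression-imputation estimator are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)

def mse_of_mean_estimates(rho, sims=2000, n=100):
    """MSE of two estimators of E[y] when half of the primary variable y is
    missing and an auxiliary variable x (correlation rho with y) is always
    observed: complete-case mean vs. regression imputation through x."""
    err_cc, err_reg = [], []
    for _ in range(sims):
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # true mean 0
        obs = np.arange(n) < n // 2            # first half of y observed (MCAR)
        err_cc.append(y[obs].mean())           # complete-case estimate
        beta = np.polyfit(x[obs], y[obs], 1)   # fit y ~ x on the observed pairs
        y_imp = np.where(obs, y, np.polyval(beta, x))  # impute missing y from x
        err_reg.append(y_imp.mean())           # regression-imputation estimate
    return np.mean(np.square(err_cc)), np.mean(np.square(err_reg))

cc_rel, reg_rel = mse_of_mean_estimates(rho=0.9)   # relevant auxiliary variable
cc_irr, reg_irr = mse_of_mean_estimates(rho=0.0)   # irrelevant auxiliary variable
```

With a relevant auxiliary variable (rho = 0.9) the imputation estimator has clearly smaller MSE; with an irrelevant one (rho = 0) it offers no gain and pays a small price for estimating a useless regression, which is exactly the situation an information criterion for auxiliary variable selection must adjudicate.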
Frequentist and Bayesian measures of confidence via multiscale bootstrap for testing three regions
A new computation method of frequentist p-values and Bayesian posterior
probabilities based on the bootstrap probability is discussed for the
multivariate normal model with unknown expectation parameter vector. The null
hypothesis is represented as an arbitrary-shaped region. We introduce new
parametric models for the scaling-law of bootstrap probability so that the
multiscale bootstrap method, which was designed for one-sided tests, can also
compute confidence measures of two-sided tests, extending applicability to a
wider class of hypotheses. Parameter estimation is improved by the two-step
multiscale bootstrap and also by including higher-order terms. Model selection
is important not only as a motivating application of our method, but also as an
essential ingredient in the method. A compromise between frequentist and
Bayesian is attempted by showing that the Bayesian posterior probability with
a noninformative prior is interpreted as a frequentist p-value of a
``zero-sided'' test.