Tests of the Kolmogorov-Smirnov type for exponential data with unknown scale, and related problems
SUMMARY Let D̂, D̂⁺ and D̂⁻ denote Kolmogorov-Smirnov type one-sample statistics for testing goodness of fit in the presence of unknown nuisance parameters; the distributions of D̂, D̂⁺ and D̂⁻ then depend on the population sampled and the estimator used. Simulation has been the primary tool for studying these statistics. Recently, Durbin obtained the distributions of D̂, D̂⁺ and D̂⁻ in terms of a Fourier transform for a wide class of underlying populations, and produced explicit results for the exponential case. In this paper, the distribution functions of D̂, D̂⁺ and D̂⁻ for the exponential case are derived from general results for order statistics, and computationally efficient approximations to these distribution functions are obtained. In the course of this derivation, Bonferroni inequalities of Kounias, and of Sobel & Uppuluri, are generalized. Certain problems of goodness-of-fit testing in the presence of nuisance parameters, whose solutions make use of existing tables, are also discussed. These problems involve the Pareto, Rayleigh, power function, and uniform distributions.
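The null distribution described above can also be approximated directly by simulation: because the sample mean is an equivariant scale estimator, the statistic D̂ is scale-free under the exponential model, so its null law can be tabulated once per sample size. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def ks_statistics(x, scale):
    """One-sample Kolmogorov-Smirnov statistics (D, D+, D-) against an
    exponential CDF with the given scale parameter."""
    x = np.sort(np.asarray(x))
    n = len(x)
    cdf = 1.0 - np.exp(-x / scale)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus), d_plus, d_minus

def simulated_null(n, n_sim=2000, rng=None):
    """Simulated null distribution of D-hat when the exponential scale is
    estimated by the sample mean; the estimator makes D-hat scale-free."""
    rng = np.random.default_rng(rng)
    stats = np.empty(n_sim)
    for i in range(n_sim):
        y = rng.exponential(1.0, size=n)
        stats[i] = ks_statistics(y, y.mean())[0]
    return stats

# Example: p-value for a sample tested against the exponential family
rng = np.random.default_rng(0)
sample = rng.exponential(2.5, size=50)
d_hat = ks_statistics(sample, sample.mean())[0]
null = simulated_null(50, rng=1)
p_value = np.mean(null >= d_hat)
```

This is the simulation baseline the paper improves upon with explicit, computationally efficient approximations.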
Faster Family-wise Error Control for Neuroimaging with a Parametric Bootstrap
In neuroimaging, hundreds to hundreds of thousands of tests are performed
across a set of brain regions or all locations in an image. Recent studies have
shown that the most common family-wise error (FWE) controlling procedures in
imaging, which rely on classical mathematical inequalities or Gaussian random
field theory, yield FWE rates that are far from the nominal level. Depending on
the approach used, the FWER can be exceedingly small or grossly inflated. Given
the widespread use of neuroimaging as a tool for understanding neurological and
psychiatric disorders, it is imperative that reliable multiple testing
procedures are available. To our knowledge, only permutation joint testing
procedures have been shown to reliably control the FWER at the nominal level.
However, these procedures are computationally intensive given the increasingly
large sample sizes and image dimensionality now available, and analyses can
take days to complete. Here, we develop a parametric bootstrap joint testing
procedure. The parametric bootstrap procedure works directly with the test
statistics, which leads to much faster estimation of adjusted \emph{p}-values
than resampling-based procedures while reliably controlling the FWER in sample
sizes available in many neuroimaging studies. We demonstrate that the procedure
controls the FWER in finite samples using simulations, and present region- and
voxel-wise analyses to test for sex differences in developmental trajectories
of cerebral blood flow.
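The core idea of working directly with the test statistics can be sketched as a maximum-statistic parametric bootstrap: draw null statistics from a Gaussian with the estimated correlation of the observed statistics, record the maximum absolute value of each draw, and compare every observed statistic against that max distribution. This is a minimal illustration of the joint-testing principle, not the paper's full procedure:

```python
import numpy as np

def parametric_bootstrap_fwer(z, sigma, n_boot=5000, rng=None):
    """Adjusted p-values via a max-statistic parametric bootstrap.

    z     : observed test statistics (one per region/voxel)
    sigma : estimated correlation matrix of the statistics under the null
    Each adjusted p-value is the fraction of bootstrap draws whose maximum
    absolute statistic exceeds the observed |z_j|, which controls the FWER
    jointly across all tests."""
    rng = np.random.default_rng(rng)
    draws = rng.multivariate_normal(np.zeros(len(z)), sigma, size=n_boot)
    max_null = np.abs(draws).max(axis=1)
    return np.array([(max_null >= abs(zj)).mean() for zj in z])

# Toy example: 4 correlated tests, one strong signal
sigma = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
p_adj = parametric_bootstrap_fwer(np.array([0.3, 1.1, 0.8, 5.0]), sigma, rng=0)
```

Because only the test statistics are resampled (not the full images), each bootstrap draw is a single multivariate-normal sample, which is what makes this approach much faster than permuting the raw data.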
Asymptotic Bayes-optimality under sparsity of some multiple testing procedures
Within a Bayesian decision theoretic framework we investigate some asymptotic
optimality properties of a large class of multiple testing rules. A parametric
setup is considered, in which observations come from a normal scale mixture
model and the total loss is assumed to be the sum of losses for individual
tests. Our model can be used for testing point null hypotheses, as well as to
distinguish large signals from a multitude of very small effects. A rule is
defined to be asymptotically Bayes optimal under sparsity (ABOS), if within our
chosen asymptotic framework the ratio of its Bayes risk and that of the Bayes
oracle (a rule which minimizes the Bayes risk) converges to one. Our main
interest is in the asymptotic scheme where the proportion p of "true"
alternatives converges to zero. We fully characterize the class of fixed
threshold multiple testing rules which are ABOS, and hence derive conditions
for the asymptotic optimality of rules controlling the Bayesian False Discovery
Rate (BFDR). We finally provide conditions under which the popular
Benjamini-Hochberg (BH) and Bonferroni procedures are ABOS and show that for a
wide class of sparsity levels, the threshold of the former can be approximated
by a nonrandom threshold.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/10-AOS869.
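For reference, the Benjamini-Hochberg step-up procedure analyzed in the abstract above can be stated in a few lines: sort the p-values, find the largest k with p_(k) <= k * alpha / m, and reject the k hypotheses with the smallest p-values. A standard sketch:

```python
import numpy as np

def benjamini_hochberg(p, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with
    the k smallest p-values, where k is the largest index such that
    p_(k) <= k * alpha / m (controls the false discovery rate at alpha
    under independence)."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Sparse toy example: a few small p-values among many large ones
p = np.array([0.001, 0.009, 0.04, 0.2, 0.5, 0.7, 0.8, 0.9])
reject = benjamini_hochberg(p, alpha=0.05)
```

The paper's result says that in the sparse regime this data-dependent threshold behaves, asymptotically, like a fixed nonrandom threshold.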
Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation
We study a problem of quick detection of top-k Personalized PageRank lists.
This problem has a number of important applications such as finding local cuts
in large graphs, estimation of similarity distance and name disambiguation. In
particular, we apply our results to construct efficient algorithms for the
person name disambiguation problem. We argue that two observations are
important when finding top-k Personalized PageRank lists. Firstly, it is
crucial to detect the top-k most important neighbours of a node quickly,
while the exact order within the top-k list, as well as the exact PageRank
values, are far less crucial. Secondly, a small number of wrong elements in a
top-k list does not really degrade its quality, but tolerating them can lead
to significant computational savings. Based on these two key observations we
propose Monte Carlo methods for fast detection of top-k Personalized PageRank
lists. We provide performance evaluation of the proposed methods and supply
stopping criteria. Then, we apply the methods to the person name disambiguation
problem. The developed algorithm for the person name disambiguation problem
achieved second place in the WePS 2010 competition.
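The Monte Carlo idea sketched in this abstract can be illustrated with the endpoint estimator: run short random walks from the seed node that terminate with probability alpha at each step, count where they end, and return the k most frequent endpoints. The graph encoding and function name below are illustrative, not from the paper:

```python
import numpy as np
from collections import Counter

def topk_ppr_monte_carlo(adj, seed, k, alpha=0.15, n_walks=10000, rng=None):
    """Approximate the top-k Personalized PageRank list for `seed`.

    Each walk starts at the seed, ends with probability alpha at every
    step, and otherwise moves to a uniformly random out-neighbour; the
    endpoint frequencies estimate the PPR vector. Exact ranks and values
    within the top-k are deliberately left imprecise, matching the two
    observations above."""
    rng = np.random.default_rng(rng)
    counts = Counter()
    for _ in range(n_walks):
        node = seed
        while rng.random() > alpha:
            nbrs = adj.get(node)
            if not nbrs:  # dangling node: end the walk here
                break
            node = nbrs[rng.integers(len(nbrs))]
        counts[node] += 1
    return [n for n, _ in counts.most_common(k)]

# Toy graph: node 0's close neighbourhood should dominate its PPR list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 3], 3: [2, 4], 4: [3]}
top3 = topk_ppr_monte_carlo(adj, seed=0, k=3, rng=0)
```

Because only the membership of the top-k set matters, far fewer walks are needed than for accurate PageRank values, which is the source of the computational savings the abstract describes.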
Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests
We study generalized bootstrap confidence regions for the mean of a random
vector whose coordinates have an unknown dependency structure. The random
vector is supposed to be either Gaussian or to have a symmetric and bounded
distribution. The dimensionality of the vector can possibly be much larger than
the number of observations and we focus on a nonasymptotic control of the
confidence level, following ideas inspired by recent results in learning
theory. We consider two approaches, the first based on a concentration
principle (valid for a large class of resampling weights) and the second on a
resampled quantile, specifically using Rademacher weights. Several intermediate
results established in the approach based on concentration principles are of
interest in their own right. We also discuss the question of accuracy when
using Monte Carlo approximations of the resampled quantities.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics; DOIs: http://dx.doi.org/10.1214/08-AOS667, http://dx.doi.org/10.1214/08-AOS668.
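The resampled-quantile approach with Rademacher weights can be sketched as follows: reweight the centred observations with random +/-1 signs (valid under the symmetry assumption on the data) and take a quantile of the resulting sup-norm statistic as the half-width of a simultaneous confidence region. A minimal illustration, ignoring the remainder term handled in the paper's nonasymptotic analysis:

```python
import numpy as np

def rademacher_quantile(Y, alpha=0.05, n_resamples=2000, rng=None):
    """(1 - alpha) resampled quantile of the sup-norm deviation.

    Y is an (n, K) data matrix with K possibly much larger than n. The
    centred rows are reweighted by random Rademacher (+/-1) signs and
    the maximum absolute coordinate of each reweighted mean is recorded;
    its quantile calibrates a simultaneous confidence region for the
    mean vector."""
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    centred = Y - Y.mean(axis=0)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        eps = rng.choice([-1.0, 1.0], size=n)
        stats[b] = np.abs(eps @ centred / n).max()
    return np.quantile(stats, 1 - alpha)

# High dimension, few observations: K=200 coordinates, n=30 samples
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 200))
t = rademacher_quantile(Y)
# Region: all mu with max_j |mu_j - Ybar_j| <= t (up to the remainder term)
```

The sign flips exploit the unknown dependency structure implicitly, since each resample preserves the correlations between coordinates.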