Breaking the curse of dimensionality in regression
Models with many signals, that is, high-dimensional models, often impose
structural assumptions on the signal strengths. The common assumption is that
only a few signals are strong and that most signals are zero or (collectively)
close to zero. However, this requirement may not hold in many real-life
applications. In this article, we are interested in conducting large-scale
inference in models that might have signals of mixed strengths. The key
challenge is that the signals not under testing might be collectively
non-negligible (although individually small) and cannot be accurately learned.
This article develops a new class of tests that arise from a moment-matching
formulation. A virtue of these moment-matching statistics is their ability to
borrow strength across features, adapt to the sparsity size, and adjust for
testing a growing number of hypotheses. The GRoup-level Inference of Parameter
(GRIP) test exploits effective sparsity structures through its hypothesis
formulation to yield an efficient multiple testing procedure. Simulated data
show that GRIP's error control is far better than that of alternative methods.
We develop a minimax theory demonstrating the optimality of GRIP for a broad
range of models, including those where the model is a mixture of sparse and
high-dimensional dense signals.
Comment: 51 pages
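The abstract does not spell out the GRIP statistic, so the following is only a generic illustration of the idea it describes: a group-level statistic that compares an empirical moment with its null value, so that many individually small signals can add up. It is a minimal Python sketch, not GRIP itself; the function name `group_moment_test`, the assumption that z-scores are independent N(0, 1) under the null, and the normal approximation are all assumptions of the sketch.

```python
import math
import random
from statistics import NormalDist


def group_moment_test(z_scores, alpha=0.05):
    """Hypothetical group-level test that matches the second moment of z-scores.

    Assumes that under the null every z_j is an independent N(0, 1) draw, so
    E[z_j^2] = 1 and sum(z_j^2) - g has mean 0 and variance 2g. Many
    individually small signals inflate this sum collectively.
    """
    g = len(z_scores)
    excess = sum(z * z for z in z_scores) - g      # observed minus null second moment
    stat = excess / math.sqrt(2 * g)               # standardized (normal approximation)
    p_value = 1.0 - NormalDist().cdf(stat)         # one-sided: signals raise the moment
    return stat, p_value, p_value < alpha


# Usage: 2000 features, each with a weak mean of 0.5 (hypothetical numbers).
rng = random.Random(0)
g = 2000
null_group = [rng.gauss(0.0, 1.0) for _ in range(g)]
dense_group = [rng.gauss(0.5, 1.0) for _ in range(g)]
null_stat, null_p, null_reject = group_moment_test(null_group)
dense_stat, dense_p, dense_reject = group_moment_test(dense_group)
```

Each individual mean of 0.5 is small relative to the noise, but summed over 2,000 features the second moment moves far from its null value, which is the "borrowing strength across features" effect in its simplest form.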
Adaptive nonparametric confidence sets
We construct honest confidence regions for a Hilbert space-valued parameter
in various statistical models. The confidence sets can be centered at arbitrary
adaptive estimators and have a diameter that adapts optimally to a given
selection of models. The latter adaptation is necessarily limited in scope. We
review the notion of adaptive confidence regions and relate the optimal rates
of the diameter of adaptive confidence regions to the minimax rates for testing
and estimation. Applications include the finite normal mean model, the white
noise model, density estimation, and regression with random design.
Comment: Published at http://dx.doi.org/10.1214/009053605000000877 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
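As a hedged illustration of an honest confidence set centered at an arbitrary estimator, consider the finite normal mean model $y \sim N(\theta, I_n)$. The sketch below assumes the center was computed from data independent of `y` (for example, a separate half-sample), and it uses a normal approximation to the noncentral chi-square law of $T = \|y - \hat\theta\|^2$. The function `ball_radius` and these choices are illustrative assumptions, not the construction of the paper.

```python
import math
import random
from statistics import NormalDist


def ball_radius(y, center, alpha=0.05):
    """Radius r of a confidence ball {theta : ||theta - center|| <= r}.

    Assumes y ~ N(theta, I_n) and `center` independent of y. Then
    T = ||y - center||^2 has mean n + lam and variance 2n + 4*lam, where
    lam = ||theta - center||^2. We keep every lam >= 0 with
    n + lam - z * sqrt(2n + 4*lam) <= T (normal approximation).
    """
    n = len(y)
    t = sum((yi - ci) ** 2 for yi, ci in zip(y, center))
    z = NormalDist().inv_cdf(1 - alpha)
    # Substituting a = lam + n - t turns the boundary into a quadratic in a.
    disc = 4 * z * z + 4 * t - 2 * n
    if disc < 0:                      # no lam >= 0 is compatible with T
        return 0.0
    lam_max = t - n + 2 * z * z + z * math.sqrt(disc)   # larger root
    return math.sqrt(max(lam_max, 0.0))


# Usage: Monte Carlo coverage check with a hypothetical theta and pilot center.
rng = random.Random(1)
n, reps, hits = 400, 300, 0
theta = [math.sin(i / 10) for i in range(n)]
for _ in range(reps):
    center = [th + rng.gauss(0.0, 0.5) for th in theta]   # independent pilot estimate
    y = [th + rng.gauss(0.0, 1.0) for th in theta]
    r = ball_radius(y, center)
    dist = math.sqrt(sum((th - c) ** 2 for th, c in zip(theta, center)))
    hits += dist <= r
coverage = hits / reps
```

The radius is smaller when the center happens to be closer to $\theta$, which is the sense in which the diameter adapts; the independence between center and data is what makes $T - n$ an unbiased estimate of the squared distance.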
A note on an Adaptive Goodness-of-Fit test with Finite Sample Validity for Random Design Regression Models
Given an i.i.d. sample $\{(X_i, Y_i)\}_{i=1}^{n}$ from the random design
regression model $Y = f(X) + \varepsilon$ with […], in this paper we consider the
problem of testing the (simple) null hypothesis $f = f_0$ against the
alternative $f \neq f_0$ for a fixed $f_0 \in L^2(G_X)$, where $G_X$ denotes the
marginal distribution of the design variable $X$. The proposed procedure is an
adaptation to the regression setting of a multiple testing technique introduced
by Fromont and Laurent (2005), and it amounts to considering a suitable
collection of unbiased estimators of the $L^2$-distance
$d_2(f, f_0) = \int [f(x) - f_0(x)]^2 \, dG_X(x)$, rejecting the null hypothesis
when at least one of them is greater than its $(1 - u_\alpha)$ quantile, with
$u_\alpha$ calibrated to obtain a level-$\alpha$ test. To build these
estimators, we use the warped wavelet basis introduced by Picard and
Kerkyacharian (2004). We do not assume that the errors are normally
distributed, and we do not assume that $X$ and $\varepsilon$ are independent
but, mainly for technical reasons, we assume, as in most of the current
literature in learning theory, that […] is uniformly bounded (almost
everywhere). We show that our test is adaptive over a particular collection of
approximation spaces linked to the classical Besov spaces.
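The multiple-testing recipe above (several unbiased estimators of the distance, each compared with its own upper quantile, with one common level $u_\alpha$ tuned so the whole family has level $\alpha$) can be sketched as follows. This is a simplified Python illustration under loudly stated assumptions: the cdf $G$ of the design variable is known, a warped cosine basis $\sqrt{2}\cos(\pi j\,G(x))$ stands in for the warped wavelets of Picard and Kerkyacharian, and the null quantiles come from Monte Carlo simulation under a user-supplied, hypothetical noise model, whereas the paper assumes neither Gaussian nor independent errors. All function names are hypothetical.

```python
import math
import random


def warped_cos(j, u):
    """j-th cosine basis function on [0, 1], evaluated at u = G(x)."""
    return math.sqrt(2.0) * math.cos(math.pi * j * u)


def gof_statistics(x, y, f0, G, dims):
    """One statistic T_D per D in dims, each unbiased for a projected distance.

    psi_j(x) = warped_cos(j, G(x)) is orthonormal in L2(G_X) because G(X) is
    uniform. With r_i = (y_i - f0(x_i)) * psi_j(x_i), the U-statistic
    sum_{i != k} r_i r_k / (n (n - 1)) is unbiased for <f - f0, psi_j>^2
    when E[eps | X] = 0. T_D sums these terms for j = 1..D.
    """
    n = len(x)
    terms = []
    for j in range(1, max(dims) + 1):
        r = [(yi - f0(xi)) * warped_cos(j, G(xi)) for xi, yi in zip(x, y)]
        s, s2 = sum(r), sum(ri * ri for ri in r)
        terms.append((s * s - s2) / (n * (n - 1)))   # drops the i == k terms
    return [sum(terms[:d]) for d in dims]


def multiple_gof_test(x, y, f0, G, dims, simulate_null, alpha=0.05, B=300, seed=0):
    """Reject f = f0 if some T_D exceeds its (1 - u) null quantile, u tuned to level alpha."""
    rng = random.Random(seed)
    null = [gof_statistics(x, simulate_null(x, rng), f0, G, dims) for _ in range(B)]
    cols = [sorted(col) for col in zip(*null)]        # simulated null law of each T_D

    def quantile(col, level):
        return col[min(len(col) - 1, int(level * len(col)))]

    u_alpha = None
    for u in [alpha * k / 50 for k in range(1, 51)]:  # grid over (0, alpha]
        thr = [quantile(c, 1 - u) for c in cols]
        fwer = sum(any(t > h for t, h in zip(row, thr)) for row in null) / B
        if fwer > alpha:                               # rejection rate only grows with u
            break
        u_alpha = u
    thr = [quantile(c, 1 - u_alpha) for c in cols]
    observed = gof_statistics(x, y, f0, G, dims)
    return any(t > h for t, h in zip(observed, thr)), u_alpha, observed


def f0(t):                    # hypothetical null regression function
    return 0.0


def G(t):                     # cdf of X ~ Uniform(0, 1)
    return t


def simulate_null(xs, r, sigma=0.5):   # assumed Gaussian noise, for calibration only
    return [f0(t) + r.gauss(0.0, sigma) for t in xs]


rng = random.Random(1)
x = [rng.random() for _ in range(300)]
y_alt = [math.cos(2 * math.pi * t) + rng.gauss(0.0, 0.5) for t in x]
reject, u_alpha, observed = multiple_gof_test(x, y_alt, f0, G, [1, 2, 4, 8], simulate_null)
```

Reusing the same simulated draws both for the per-statistic quantiles and for choosing $u_\alpha$ keeps the sketch short; a careful implementation would use independent simulations for the two steps.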
Optimal Calibration for Multiple Testing against Local Inhomogeneity in Higher Dimension
Based on two independent samples X_1,...,X_m and X_{m+1},...,X_n drawn from
multivariate distributions with unknown Lebesgue densities p and q,
respectively, we propose an exact multiple test to simultaneously identify
regions of significant deviation between p and q. The construction is built
from randomized nearest-neighbor statistics. It does not require any
preliminary information about the multivariate densities, such as compact
support, strict positivity, or smoothness and shape properties. The properly
adjusted multiple testing procedure is shown to be sharp-optimal for typical
arrangements of the observation values, which appear with probability close to
one. The proof relies on a new coupling Bernstein-type exponential inequality
reflecting the non-subgaussian tail behavior of a combinatorial process. To
investigate the power of the proposed method, a reparametrized minimax setup is
introduced, reducing the composite hypothesis "p=q" to a simple one with the
multivariate mixed density (m/n)p+(1-m/n)q as an infinite-dimensional nuisance
parameter. Within this framework, the test is shown to be spatially and sharply
asymptotically adaptive with respect to uniform loss on isotropic Hölder
classes. The exact minimax risk asymptotics are obtained in terms of solutions
of the optimal recovery …
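The randomized nearest-neighbor construction and its sharp calibration are more refined than anything that fits here. As a simpler, swapped-in illustration of an exact multiple test for local differences between two samples, the Python sketch below counts first-sample labels among the $k$ nearest pooled neighbours of each pooled point and calibrates the maximum standardized count by label permutation. Under $p = q$ the labels are exchangeable given the pooled points, so the permutation p-value $(1 + \#\text{exceedances})/(B + 1)$ yields a test of exact level for the global hypothesis. The function names, $k$, and the single-step max-type adjustment are assumptions of the sketch.

```python
import bisect
import math
import random


def knn_sets(points, k):
    """For every pooled point, the indices of its k nearest pooled points (itself included)."""
    sets = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        sets.append(order[:k])
    return sets


def local_stats(labels, sets, m):
    """Standardized count of first-sample labels in each neighbourhood."""
    n, k = len(labels), len(sets[0])
    frac = m / n                              # weight m/n of the mixed density
    sd = math.sqrt(k * frac * (1 - frac))     # binomial approximation
    return [(sum(labels[j] for j in s) - k * frac) / sd for s in sets]


def nn_multiple_test(sample1, sample2, k=20, B=300, alpha=0.05, seed=0):
    """Flag pooled points whose neighbourhood label count is extreme (max-type adjustment)."""
    rng = random.Random(seed)
    points = list(sample1) + list(sample2)
    m = len(sample1)
    labels = [1] * m + [0] * len(sample2)
    sets = knn_sets(points, k)                # depends only on the pooled sample
    observed = local_stats(labels, sets, m)
    max_null = []
    for _ in range(B):                        # under p = q the labels are exchangeable
        perm = labels[:]
        rng.shuffle(perm)
        max_null.append(max(abs(s) for s in local_stats(perm, sets, m)))
    max_null.sort()

    def adjusted_p(t):                        # share of permutations with max >= t
        exceed = B - bisect.bisect_left(max_null, t)
        return (1 + exceed) / (B + 1)

    flagged = [points[i] for i, s in enumerate(observed) if adjusted_p(abs(s)) <= alpha]
    global_p = adjusted_p(max(abs(s) for s in observed))
    return flagged, global_p


# Usage: the second sample has a hypothetical concentrated bump near the origin.
rng = random.Random(2)
sample1 = [(rng.random(), rng.random()) for _ in range(100)]
sample2 = ([(rng.random(), rng.random()) for _ in range(20)]
           + [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(80)])
flagged, global_p = nn_multiple_test(sample1, sample2)
```

The neighbourhoods are computed once from the pooled sample, so only the labels change across permutations; this is what makes the permutation calibration cheap and exact.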
Detection of an anomalous cluster in a network
We consider the problem of detecting whether or not, in a given sensor
network, there is a cluster of sensors which exhibit an "unusual behavior."
Formally, suppose we are given a set of nodes and attach a random variable to
each node. We observe a realization of this process and want to decide between
the following two hypotheses: under the null, the variables are i.i.d. standard
normal; under the alternative, there is a cluster of variables that are i.i.d.
normal with positive mean and unit variance, while the rest are i.i.d. standard
normal. We also address surveillance settings where each sensor in the network
collects information over time. The resulting model is similar, now with a time
series attached to each node. We again observe the process over time and want
to decide between the null, where all the variables are i.i.d. standard normal,
and the alternative, where there is an emerging cluster of i.i.d. normal
variables with positive mean and unit variance. The growth models used to
represent the emerging cluster are quite general and, in particular, include
cellular automata used in modeling epidemics. In both settings, we consider
classes of clusters that are quite general, for which we obtain a lower bound
on their respective minimax detection rate and show that some form of scan
statistic, by far the most popular method in practice, achieves that same rate
to within a logarithmic factor. Our results are not limited to the normal
location model but generalize to any one-parameter exponential family when the
anomalous clusters are large enough.
Comment: Published at http://dx.doi.org/10.1214/10-AOS839 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
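The scan statistic mentioned above can be sketched for the simplest possible network, a path of sensors, with clusters taken to be intervals of at most `max_len` nodes. For the normal location model, the score of a candidate cluster $K$ is $\sum_{i \in K} x_i / \sqrt{|K|}$, and the scan statistic is the maximum score over the class. The path graph, the interval class, the function names, and Monte Carlo calibration under the i.i.d. N(0, 1) null are illustrative assumptions; the paper treats far more general cluster classes.

```python
import math
import random


def interval_scan(x, max_len):
    """Max over intervals K with |K| <= max_len of sum_{i in K} x_i / sqrt(|K|)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best, best_k = -math.inf, None
    for i in range(len(x)):
        for j in range(i + 1, min(len(x), i + max_len) + 1):
            score = (prefix[j] - prefix[i]) / math.sqrt(j - i)
            if score > best:
                best, best_k = score, (i, j)   # half-open interval [i, j)
    return best, best_k


def scan_test(x, max_len, B=200, seed=0):
    """Monte Carlo p-value of the scan statistic under the i.i.d. N(0, 1) null."""
    rng = random.Random(seed)
    observed, interval = interval_scan(x, max_len)
    exceed = 0
    for _ in range(B):
        null_x = [rng.gauss(0.0, 1.0) for _ in x]
        exceed += interval_scan(null_x, max_len)[0] >= observed
    return (1 + exceed) / (B + 1), interval


# Usage: hypothetical path of 200 sensors with an anomalous cluster at nodes 90..109.
rng = random.Random(4)
N, lo, hi = 200, 90, 110
x = [rng.gauss(2.0 if lo <= i < hi else 0.0, 1.0) for i in range(N)]
p_value, (start, stop) = scan_test(x, max_len=30)
```

Prefix sums make each interval score an O(1) computation, so one scan costs O(N · max_len); the returned interval localizes the detected cluster.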
Parameter tuning in pointwise adaptation using a propagation approach
This paper discusses the problem of adaptive estimation of a univariate
object like the value of a regression function at a given point or a linear
functional in a linear inverse problem. We consider an adaptive procedure
originated from Lepski [Theory Probab. Appl. 35 (1990) 454--466.] that selects
in a data-driven way one estimate out of a given class of estimates ordered by
their variability. A serious problem with this and similar procedures is the
choice of tuning parameters such as thresholds. Numerical results show that
the theoretically recommended choices tend to be too conservative and lead to
strong oversmoothing. A careful choice of these parameters is essential for
obtaining reasonable estimation quality. The main contribution of this paper is
a new approach to choosing the parameters of the procedure: they are set so that
the resulting estimate exhibits a prescribed behavior in the simple parametric
situation. We establish a non-asymptotic "oracle" bound, which shows that the
estimation risk is, up to a logarithmic multiplier, equal to the risk of the
"oracle" estimate that is optimally selected from the given family. A numerical
study demonstrates good performance of the resulting procedure in a number of
simulated examples.
Comment: Published at http://dx.doi.org/10.1214/08-AOS607 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
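The following is a minimal Python sketch of a Lepski-type selection at a single point, with the threshold tuned by simulation in the parametric case f = 0, in the spirit of the propagation idea described above. Local averages over windows of decreasing variability, a known noise level `sigma`, a single threshold `z` instead of a sequence, and the specific condition checked (the mean absolute gap between the adaptive estimate and each candidate is at most `alpha` times that candidate's own mean absolute error) are all assumptions of this sketch, not the paper's exact procedure; the function names are hypothetical.

```python
import math
import random


def local_means(x, y, x0, widths):
    """Local averages of y over |x - x0| <= h, ordered from most to least variable."""
    ests, counts = [], []
    for h in widths:
        vals = [yi for xi, yi in zip(x, y) if abs(xi - x0) <= h]
        ests.append(sum(vals) / len(vals))
        counts.append(len(vals))
    return ests, counts


def lepski_index(ests, sds, z):
    """Accept estimate k while it stays within z * sd_l of every earlier estimate l."""
    for k in range(1, len(ests)):
        if any(abs(ests[k] - ests[l]) > z * sds[l] for l in range(k)):
            return k - 1
    return len(ests) - 1


def propagation_z(x, x0, widths, sigma, alpha=0.1, sims=300, seed=0):
    """Smallest z on a grid meeting a propagation-type condition when f = 0.

    For every k: mean |theta_tilde_k - theta_hat_k| <= alpha * mean |theta_hat_k|,
    where theta_tilde_k is the Lepski estimate restricted to the first k + 1 candidates.
    """
    rng = random.Random(seed)
    draws = [local_means(x, [rng.gauss(0.0, sigma) for _ in x], x0, widths)[0]
             for _ in range(sims)]
    counts = local_means(x, [0.0] * len(x), x0, widths)[1]
    sds = [sigma / math.sqrt(c) for c in counts]
    grid = [0.5 * i for i in range(1, 13)]   # candidate thresholds 0.5, 1.0, ..., 6.0
    for z in grid:
        ok = True
        for k in range(len(widths)):
            gap = sum(abs(e[lepski_index(e[:k + 1], sds[:k + 1], z)] - e[k]) for e in draws)
            risk = sum(abs(e[k]) for e in draws)
            if gap > alpha * risk:
                ok = False
                break
        if ok:
            return z, sds
    return grid[-1], sds


# Usage: calibrate on pure noise, then apply to a hypothetical function with a kink at 0.5.
x = [i / 200 for i in range(201)]
widths = [0.02, 0.05, 0.1, 0.2, 0.35, 0.5]
sigma = 0.5
z, sds = propagation_z(x, 0.5, widths, sigma)
rng = random.Random(5)
y = [10 * abs(t - 0.5) + rng.gauss(0.0, sigma) for t in x]
ests, _ = local_means(x, y, 0.5, widths)
chosen = lepski_index(ests, sds, z)
```

Calibrating on the flat case pushes `z` just high enough that the procedure rarely stops early when no bias is present, instead of using a conservative theoretical constant; on the kinked function, the growing bias of wide windows then forces an earlier stop.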