Minimax and Adaptive Inference in Nonparametric Function Estimation
Since Stein's 1956 seminal paper, shrinkage has played a fundamental role in
both parametric and nonparametric inference. This article discusses minimaxity
and adaptive minimaxity in nonparametric function estimation. Three
interrelated problems, function estimation under global integrated squared
error, estimation under pointwise squared error, and nonparametric confidence
intervals, are considered. Shrinkage is pivotal in the development of both the
minimax theory and the adaptation theory. While the three problems are closely
connected and the minimax theories bear some similarities, the adaptation
theories are strikingly different. For example, in sharp contrast to adaptive
point estimation, in many common settings there do not exist nonparametric
confidence intervals that adapt to the unknown smoothness of the underlying
function. A concise account of these theories is given. The connections as well
as differences among these problems are discussed and illustrated through
examples.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/11-STS355.
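As a concrete illustration of the shrinkage idea at the heart of this survey, here is a minimal numerical sketch of the positive-part James-Stein estimator; the function name and simulation setup are ours, not the article's:

```python
import numpy as np

def james_stein(y, sigma2=1.0):
    """Shrink the observation vector y toward 0 (positive-part James-Stein)."""
    p = len(y)
    norm2 = np.dot(y, y)
    # Shrinkage factor; the positive part keeps it from overshooting past zero.
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return factor * y

rng = np.random.default_rng(0)
theta = np.zeros(50)                      # true means (all zero: shrinkage helps most here)
y = theta + rng.standard_normal(50)       # y_i ~ N(theta_i, 1)
est = james_stein(y)
mse_js = np.mean((est - theta) ** 2)      # shrinkage risk
mse_mle = np.mean((y - theta) ** 2)       # risk of the unshrunk maximum likelihood estimate
```

With all true means equal, the shrinkage factor is driven close to zero and the James-Stein risk falls well below that of the raw observations, which is the phenomenon the minimax theory formalizes.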
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
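The basis-function machinery described above can be sketched for the scalar-on-function case: expand the coefficient function in a small basis, reduce the functional regression to a ridge-regularized linear model on basis scores, and map back. This is a generic illustration under assumed simulated data, not any particular method from the literature:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 101)                    # common observation grid for the functions
n, K = 80, 7

# Fourier-type basis for regularization: beta(t) ~ sum_k b_k * phi_k(t)
basis = np.stack([np.ones_like(t)] +
                 [np.sin(np.pi * k * t) for k in range(1, K)])   # shape (K, 101)

X = rng.standard_normal((n, t.size)).cumsum(axis=1) / 10         # crude functional predictors
beta_true = np.sin(np.pi * t)                                    # true coefficient function
dt = t[1] - t[0]
y = X @ beta_true * dt + 0.05 * rng.standard_normal(n)           # y_i = int X_i(t) beta(t) dt + noise

Z = X @ basis.T * dt                          # design matrix of basis scores, shape (n, K)
lam = 1e-3                                    # small ridge penalty for numerical regularization
b = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ y)
beta_hat = basis.T @ b                        # estimated coefficient function on the grid
```

Replication enters through the n sampled curves, and regularization through the low-dimensional basis plus the penalty, mirroring the two ingredients the article highlights.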
Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels
This article describes a new class of prior distributions for nonparametric
function estimation. The unknown function is modeled as a limit of weighted
sums of kernels or generator functions indexed by continuous parameters that
control local and global features such as their translation, dilation,
modulation and shape. Lévy random fields and their stochastic integrals are
employed to induce prior distributions for the unknown functions or,
equivalently, for the number of kernels and for the parameters governing their
features. Scaling, shape, and other features of the generating functions are
location-specific to allow quite different function properties in different
parts of the space, as with wavelet bases and other methods employing
overcomplete dictionaries. We provide conditions under which the stochastic
expansions converge in specified Besov or Sobolev norms. Under a Gaussian error
model, this may be viewed as a sparse regression problem, with regularization
induced via the Lévy random field prior distribution. Posterior inference
for the unknown functions is based on a reversible jump Markov chain Monte
Carlo algorithm. We compare the Lévy Adaptive Regression Kernel (LARK)
method to wavelet-based methods using some of the standard test functions, and
illustrate its flexibility and adaptability in nonstationary applications.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/11-AOS889.
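The kernel expansions described above can be evaluated directly once the (here finite) set of kernels is given: a weighted sum of Gaussian generator functions whose location-specific scales allow nonstationary behaviour. This sketch shows only the deterministic expansion, not the Lévy prior or the reversible jump sampler, and all numerical values are ours:

```python
import numpy as np

def kernel_expansion(x, weights, centers, scales):
    """f(x) = sum_j w_j * exp(-(x - c_j)^2 / (2 s_j^2)), a Gaussian generator kernel."""
    x = np.asarray(x)[:, None]
    return np.sum(weights * np.exp(-(x - centers) ** 2 / (2 * scales ** 2)), axis=1)

x = np.linspace(0, 1, 200)
weights = np.array([1.0, -0.5, 0.8])
centers = np.array([0.2, 0.5, 0.8])
scales = np.array([0.05, 0.15, 0.02])   # location-specific widths: sharp and smooth features coexist
f = kernel_expansion(x, weights, centers, scales)
```

In the Bayesian formulation, the number of kernels and the triples (weight, center, scale) are random, and the posterior over them is explored by reversible jump MCMC.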
On the Bernstein-von Mises phenomenon for nonparametric Bayes procedures
We continue the investigation of Bernstein-von Mises theorems for
nonparametric Bayes procedures from [Ann. Statist. 41 (2013) 1999-2028]. We
introduce multiscale spaces on which nonparametric priors and posteriors are
naturally defined, and prove Bernstein-von Mises theorems for a variety of
priors in the setting of Gaussian nonparametric regression and in the i.i.d.
sampling model. From these results we deduce several applications where
posterior-based inference coincides with efficient frequentist procedures,
including Donsker- and Kolmogorov-Smirnov theorems for the random posterior
cumulative distribution functions. We also show that multiscale posterior
credible bands for the regression or density function are optimal frequentist
confidence bands.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/14-AOS1246.
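The Donsker and Kolmogorov-Smirnov applications mentioned above have a classical frequentist counterpart: the fixed-width KS confidence band around the empirical CDF, which the matched posterior credible bands asymptotically reproduce. A minimal sketch, where 1.358 is the standard tabulated 95% quantile of the Kolmogorov distribution and the simulation setup is ours:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.standard_normal(400))
n = x.size
ecdf = np.arange(1, n + 1) / n          # empirical CDF evaluated at the order statistics

# Kolmogorov-Smirnov band: F lies in [ecdf - c, ecdf + c] with c = q_alpha / sqrt(n),
# where q_0.05 ~ 1.358 is the 95% quantile of the Kolmogorov distribution.
c = 1.358 / np.sqrt(n)
lower = np.clip(ecdf - c, 0, 1)
upper = np.clip(ecdf + c, 0, 1)
```

The band has width O(n^{-1/2}) uniformly in x, which is the efficiency benchmark the Bernstein-von Mises results allow posterior credible bands to attain.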
General maximum likelihood empirical Bayes estimation of normal means
We propose a general maximum likelihood empirical Bayes (GMLEB) method for
the estimation of a mean vector based on observations with i.i.d. normal
errors. We prove that under mild moment conditions on the unknown means, the
average mean squared error (MSE) of the GMLEB is within an infinitesimal
fraction of the minimum average MSE among all separable estimators which use a
single deterministic estimating function on individual observations, provided
that the risk is of greater order than . We also prove that the
GMLEB is uniformly approximately minimax in regular and weak balls
when the order of the length-normalized norm of the unknown means is between
and . Simulation
experiments demonstrate that the GMLEB outperforms the James--Stein and several
state-of-the-art threshold estimators in a wide range of settings without much
downside.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/08-AOS638.
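The general maximum likelihood empirical Bayes idea can be sketched numerically: estimate the mixing distribution of the means by nonparametric maximum likelihood on a grid (via EM), then apply the resulting posterior-mean rule to each observation. This is a simplified illustration of the idea, not the paper's actual estimator or tuning; all names and constants are ours:

```python
import numpy as np

def gmleb(y, grid_size=100, iters=200):
    """Grid-based NPMLE of the mixing distribution, then the posterior-mean rule."""
    y = np.asarray(y, float)
    grid = np.linspace(y.min(), y.max(), grid_size)         # candidate support for the prior
    w = np.full(grid_size, 1.0 / grid_size)                 # mixing weights, updated by EM
    lik = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2)  # N(u, 1) kernel (constants cancel)
    for _ in range(iters):
        post = lik * w                                      # E-step: posterior over grid points
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                               # M-step: update mixing weights
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid                                      # posterior mean for each theta_i

rng = np.random.default_rng(3)
theta = np.repeat([0.0, 5.0], 250)                          # two-point mixture of true means
y = theta + rng.standard_normal(500)
est = gmleb(y)
```

Because the estimated prior concentrates near the two clusters, the posterior-mean rule shrinks each observation toward the nearer cluster and beats the unshrunk observations in average squared error.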
Component Selection in the Additive Regression Model
Similar to variable selection in the linear regression model, selecting
significant components in the popular additive regression model is of great
interest. However, such components are unknown smooth functions of the
independent variables and are therefore not directly observable, so some approximation is needed. In
this paper, we suggest a combination of penalized regression spline
approximation and group variable selection, called the lasso-type spline method
(LSM), to handle this component selection problem with a diverging number of
strongly correlated variables in each group. It is shown that the proposed
method can simultaneously select significant components and estimate the
nonparametric additive components at an optimal convergence rate. To make the
LSM computationally stable and able to adapt its
estimators to the level of smoothness of the component functions, weighted
power spline bases and projected weighted power spline bases are proposed.
Their performance is examined by simulation studies across two set-ups with
independent predictors and correlated predictors, respectively, and appears
superior to the performance of competing methods. The proposed method is
extended to a partial linear regression model analysis with real data, and
gives reliable results.
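The combination of spline approximation and group variable selection described above can be sketched with a generic group-lasso fit: each covariate contributes a block of truncated power spline basis columns, and a group penalty zeroes out entire blocks. This is a plain proximal-gradient implementation in the spirit of the method, not the authors' LSM; all names, knots, and tuning values are ours:

```python
import numpy as np

def power_spline_basis(x, knots, degree=2):
    """Truncated power basis of the given degree (no intercept column)."""
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def group_lasso(X_groups, y, lam, iters=500):
    """Proximal gradient for 0.5*||y - X b||^2 + lam * sum_j ||b_j||_2."""
    X = np.hstack(X_groups)
    sizes = [g.shape[1] for g in X_groups]
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L with L the top eigenvalue of X'X
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        z = b - step * (X.T @ (X @ b - y))        # gradient step on the squared loss
        parts, i = [], 0
        for s in sizes:                           # group-wise soft-thresholding (the prox)
            v, i = z[i:i + s], i + s
            nrm = np.linalg.norm(v)
            parts.append(np.zeros(s) if nrm <= step * lam else (1 - step * lam / nrm) * v)
        b = np.concatenate(parts)
    return np.split(b, np.cumsum(sizes)[:-1])

rng = np.random.default_rng(6)
n, knots = 200, [0.25, 0.5, 0.75]
xs = [rng.uniform(size=n) for _ in range(3)]
y = np.sin(2 * np.pi * xs[0]) + 0.1 * rng.standard_normal(n)   # only covariate 0 matters
y = y - y.mean()
groups = []
for x in xs:                                      # one centred spline block per covariate
    B = power_spline_basis(x, knots)
    groups.append(B - B.mean(axis=0))
b_groups = group_lasso(groups, y, lam=5.0)
norms = [np.linalg.norm(b) for b in b_groups]     # block norms: selection happens at block level
```

Selecting or dropping a whole block of spline coefficients is exactly what turns group selection into component selection in the additive model.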
Wavelet methods in statistics: Some recent developments and their applications
The development of wavelet theory has in recent years spawned applications in
signal processing, in fast algorithms for integral transforms, and in image and
function representation methods. This last application has stimulated interest
in wavelet applications to statistics and to the analysis of experimental data,
with many successes in the efficient analysis, processing, and compression of
noisy signals and images. This is a selective review article that attempts to
synthesize some recent work on ``nonlinear'' wavelet methods in nonparametric
curve estimation and their role in a variety of applications. After a short
introduction to wavelet theory, we discuss in detail several wavelet shrinkage
and wavelet thresholding estimators, scattered in the literature and developed,
under more or less standard settings, for density estimation from i.i.d.
observations or to denoise data modeled as observations of a signal with
additive noise. Most of these methods are fitted into the general concept of
regularization with appropriately chosen penalty functions. A narrow range of
applications in major areas of statistics is also discussed such as partial
linear regression models and functional index models. The usefulness of all
these methods is illustrated by means of simulations and practical examples.
Comment: Published in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/07-SS014.
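Wavelet shrinkage of the kind surveyed above can be sketched end to end with the simplest wavelet, the Haar basis: transform the noisy signal, soft-threshold the detail coefficients, and invert. The hand-rolled transform, the blocky test signal, and the assumption that the noise level (0.5) is known for the universal threshold are all ours:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar decomposition of a length-2^J signal: (coarse, [d_coarsest, ..., d_finest])."""
    coeffs = []
    a = np.asarray(x, float)
    while len(a) > 1:
        s = (a[0::2] + a[1::2]) / np.sqrt(2)   # local averages
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # local differences (detail coefficients)
        coeffs.append(d)
        a = s
    return a, coeffs[::-1]

def haar_idwt(a, details):
    """Invert haar_dwt: rebuild from coarsest to finest level."""
    for d in details:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def soft(c, t):
    """Soft-thresholding: shrink coefficients toward zero by t, zeroing the small ones."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(4)
n = 256
signal = np.where(np.arange(n) < n // 2, 0.0, 3.0)   # blocky signal, ideal for Haar
y = signal + 0.5 * rng.standard_normal(n)
a, details = haar_dwt(y)
t = 0.5 * np.sqrt(2 * np.log(n))                     # universal threshold sigma*sqrt(2 log n)
den = haar_idwt(a, [soft(d, t) for d in details])
```

Soft thresholding in an orthonormal wavelet basis is the prototypical example of the penalized-regularization view the article takes: it is the exact minimizer of squared error plus an l1 penalty on the coefficients.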
Bayesian linear inverse problems in regularity scales
We obtain rates of contraction of posterior distributions in inverse problems
defined by scales of smoothness classes. We derive abstract results for general
priors, with contraction rates determined by Galerkin approximation. The rate
depends on the amount of prior concentration near the true function and the
prior mass of functions with inferior Galerkin approximation. We apply the
general result to non-conjugate series priors, showing that these priors give
near-optimal and adaptive recovery in some generality, and to Gaussian priors and
mixtures of Gaussian priors, where the latter are also shown to be near-optimal
and adaptive. The proofs are based on general testing and approximation
arguments, without explicit calculations on the posterior distribution. We are
thus not restricted to priors based on the singular value decomposition of the
operator. We illustrate the results with examples of inverse problems resulting
from differential equations.
Comment: 34 pages.
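The role of Galerkin (projection) approximation in such inverse problems can be illustrated in the standard sequence-space model: observations of the true coefficients damped by the operator's singular values plus noise, recovered by inverting only the first J coefficients. The specific decay rates, noise level, and truncation levels below are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000                                 # effective sample size (noise level n^{-1/2})
K = 200
k = np.arange(1, K + 1)
theta = k ** -1.5                          # true coefficients of a Sobolev-type function
kappa = 1.0 / k                            # singular values of the operator (mildly ill-posed)
Y = kappa * theta + rng.standard_normal(K) / np.sqrt(n)

def cutoff_estimate(Y, kappa, J):
    """Galerkin / spectral-cutoff estimator: invert the first J coefficients, zero the rest."""
    est = np.zeros_like(Y)
    est[:J] = Y[:J] / kappa[:J]
    return est

# Squared-error loss at several truncation levels: the bias-variance trade-off in action.
err = {J: np.sum((cutoff_estimate(Y, kappa, J) - theta) ** 2) for J in (5, 20, 200)}
```

Inverting all coefficients amplifies the noise by the growing factors 1/kappa_k, while truncating trades that variance for a small approximation bias, which is the trade-off the contraction-rate analysis quantifies through the quality of the Galerkin approximation.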
Adaptive confidence bands for Markov chains and diffusions: Estimating the invariant measure and the drift
As a starting point we prove a functional central limit theorem for
estimators of the invariant measure of a geometrically ergodic Harris-recurrent
Markov chain in a multi-scale space. This allows us to construct confidence bands
for the invariant density with optimal (up to undersmoothing)
-diameter by using wavelet projection estimators. In addition our
setting applies to the drift estimation of diffusions observed discretely with
fixed observation distance. We prove a functional central limit theorem for
estimators of the drift function and finally construct adaptive confidence
bands for the drift by using a completely data-driven estimator.
Comment: To appear in ESAIM: Probability and Statistics.