Nonparametric regression with homogeneous group testing data
We introduce new nonparametric predictors for homogeneous pooled data in the
context of group testing for rare abnormalities and show that they achieve
optimal rates of convergence. In particular, when the level of pooling is moderate, the method enjoys, despite the cost savings, the same convergence rate as in the case of no pooling. In the setting of "over-pooling", the
convergence rate differs from that of an optimal estimator by no more than a
logarithmic factor. Our approach improves on the random-pooling nonparametric
predictor, which is currently the only nonparametric method available, unless
there is no pooling, in which case the two approaches are identical.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/11-AOS952
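As an illustration of the pooling mechanism described above, the sketch below assumes equal-sized pools of k individuals with similar covariate values and a pooled test that is positive whenever any member is positive, so that E[Z | X ~ x] = 1 - (1 - p(x))^k can be estimated by kernel regression and inverted. All names and settings are illustrative assumptions, not the estimator analysed in the paper.

```python
import numpy as np

def nw(x0, x, y, h):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(0)
n, k = 2000, 5                       # individuals and pool size
x = np.sort(rng.uniform(0, 1, n))    # sorting makes neighbouring x similar
p = 0.02 + 0.10 * x                  # true (rare) abnormality probability
y = rng.binomial(1, p)               # individual statuses, never observed

# Homogeneous pooling: consecutive (hence similar-x) individuals share a pool;
# the pooled test is positive exactly when some member is positive.
xg = x.reshape(-1, k).mean(axis=1)
z = y.reshape(-1, k).max(axis=1)

# Since E[Z | X ~ x] = 1 - (1 - p(x))^k, smooth Z on the pool covariates
# and invert the relation to recover an estimate of p(x).
h = 0.1
q_hat = np.clip([nw(x0, xg, z, h) for x0 in xg], 0.0, 1.0)
p_hat = 1.0 - (1.0 - q_hat) ** (1.0 / k)
```

Sorting before pooling is what makes the pools homogeneous; the random-pooling predictor mentioned above would instead form pools irrespective of the covariate values.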
Methodology and theory for partial least squares applied to functional data
The partial least squares procedure was originally developed to estimate the
slope parameter in multivariate parametric models. More recently it has gained
popularity in the functional data literature. There, the partial least squares
estimator of slope is either used to construct linear predictive models, or as
a tool to project the data onto a one-dimensional quantity that is employed for
further statistical analysis. Although the partial least squares approach is
often viewed as an attractive alternative to projections onto the principal
component basis, its properties are less well known than those of the latter,
mainly because of its iterative nature. We develop an explicit formulation of
partial least squares for functional data, which leads to insightful results
and motivates new theory, demonstrating consistency and establishing
convergence rates.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/11-AOS958
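To make the iterative nature concrete, here is a generic NIPALS-style partial least squares sketch applied to curves discretized on a common grid. It is a stand-in under assumed settings, not the explicit formulation the paper develops; all names and parameters are illustrative.

```python
import numpy as np

def functional_pls(X, y, n_comp):
    """NIPALS-style PLS on curves discretized as rows of X, scalar response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    T, W = [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)                   # weight function on the grid
        t = Xc @ w                               # component scores
        Xc -= np.outer(t, (Xc.T @ t) / (t @ t))  # deflate the curves
        yc = yc - t * (yc @ t) / (t @ t)         # deflate the response
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)

# toy usage: rough random curves with a smooth underlying slope function
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 100)
X = rng.standard_normal((200, grid.size)).cumsum(axis=1) / 10
beta = np.sin(2 * np.pi * grid)
y = X @ beta / grid.size + 0.1 * rng.standard_normal(200)  # Riemann-sum integral
T, W = functional_pls(X, y, n_comp=3)
```

Each iteration depends on the residuals of the previous one, which is exactly why closed-form analysis of the procedure is harder than for projections onto a fixed principal component basis.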
Defining probability density for a distribution of random functions
The notion of probability density for a random function is not as
straightforward as in finite-dimensional cases. While a probability density
function generally does not exist for functional data, we show that it is
possible to develop the notion of density when functional data are considered
in the space determined by the eigenfunctions of principal component analysis.
This leads to a transparent and meaningful surrogate for density defined in
terms of the average value of the logarithms of the densities of the
distributions of principal components for a given dimension. This density
approximation is readily estimable from data. It accurately represents, in a
monotone way, key features of small-ball approximations to density. Our results
on estimators of the densities of principal component scores are also of
independent interest; they reveal interesting shape differences that have not
previously been considered. The statistical implications of these results and
properties are identified and discussed, and practical ramifications are
illustrated in numerical work.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/09-AOS741
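A minimal sketch of the surrogate, assuming curves observed on a common grid: project the centred curves onto the first r empirical eigenfunctions, estimate each score's density (here with a kernel density estimator, an illustrative choice), and average the log-densities at the scores of the target curve. All names below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def log_density_surrogate(X, x0, r):
    """Average log-density of the first r principal component scores,
    evaluated at the scores of the curve x0 (curves stored as rows of X)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # empirical eigenfunctions via SVD of the centred data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:r].T          # n x r matrix of component scores
    s0 = (x0 - mu) @ Vt[:r].T       # scores of the target curve
    logs = [np.log(gaussian_kde(scores[:, j])(s0[j])[0]) for j in range(r)]
    return float(np.mean(logs))

# toy usage on rough random curves observed on a common grid
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 60)).cumsum(axis=1)
print(log_density_surrogate(X, X[0], r=3))
```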
Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier
The data functions that are studied in the course of functional data analysis
are assembled from discrete data, and the level of smoothing used is generally chosen to approximate accurately the conceptually smooth functions that were not actually observed. Existing
literature shows that this approach is effective, and even optimal, when using
functional data methods for prediction or hypothesis testing. However, in the
present paper we show that this approach is not effective in classification
problems. There, a useful rule of thumb is that undersmoothing is often desirable, but there are several surprising qualifications to that rule.
First, the effect of smoothing the training data can be more significant than
that of smoothing the new data set to be classified; second, undersmoothing is
not always the right approach, and in fact in some cases using a relatively
large bandwidth can be more effective; and third, these perverse results are
the consequence of very unusual properties of error rates, expressed as
functions of smoothing parameters. For example, the orders of magnitude of
optimal smoothing parameter choices depend on the signs and sizes of terms in
an expansion of error rate, and those signs and sizes can vary dramatically
from one setting to another, even for the same classifier.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/13-AOS1158
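The interaction between the training and test bandwidths can be probed with a small simulated experiment. The Gaussian smoother, the centroid classifier, and every parameter below are hypothetical choices for illustration, not the paper's setup.

```python
import numpy as np

def smooth(curve_obs, grid, h):
    """Kernel-smooth one discretely observed curve onto its own grid."""
    w = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / h) ** 2)
    return (w @ curve_obs) / w.sum(axis=1)

def centroid_classify(train0, train1, new):
    """Assign `new` to the class (0 or 1) with the nearer mean curve."""
    d0 = np.linalg.norm(new - train0.mean(axis=0))
    d1 = np.linalg.norm(new - train1.mean(axis=0))
    return int(d1 < d0)

rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 50)

def sample(n, shift):
    """Noisy discrete observations of a smooth curve, shifted by class."""
    return np.sin(2 * np.pi * grid) + shift + 0.5 * rng.standard_normal((n, grid.size))

# vary the two bandwidths separately and watch the error rate respond
for h_train in (0.01, 0.05, 0.2):
    for h_test in (0.01, 0.05, 0.2):
        tr0 = np.array([smooth(c, grid, h_train) for c in sample(50, 0.0)])
        tr1 = np.array([smooth(c, grid, h_train) for c in sample(50, 0.3)])
        te0 = [smooth(c, grid, h_test) for c in sample(100, 0.0)]
        te1 = [smooth(c, grid, h_test) for c in sample(100, 0.3)]
        err = (np.mean([centroid_classify(tr0, tr1, c) for c in te0]) +
               np.mean([1 - centroid_classify(tr0, tr1, c) for c in te1])) / 2
        print(f"h_train={h_train:<5} h_test={h_test:<5} error={err:.3f}")
```

Decoupling h_train from h_test in this way is what reveals the asymmetry noted above: the training-data bandwidth and the new-data bandwidth need not be chosen equal, and their effects on the error rate can differ markedly.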
Nonparametric covariate-adjusted regression
We consider nonparametric estimation of a regression curve when the data are
observed with multiplicative distortion which depends on an observed
confounding variable. We suggest several estimators, ranging from a relatively
simple one that relies on restrictive assumptions usually made in the
literature, to a sophisticated piecewise approach that involves reconstructing
a smooth curve from an estimator of a constant multiple of its absolute value,
and which can be applied in much more general scenarios. We show that, although
our nonparametric estimators are constructed from predictors of the unobserved
undistorted data, they have the same first order asymptotic properties as the
standard estimators that could be computed if the undistorted data were
available. We illustrate the good numerical performance of our methods on both
simulated and real datasets.
32 pages, 4 figures.
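A sketch of the simple end of that range, under the restrictive assumptions common in the literature (multiplicative distortions with unit mean, confounder independent of the undistorted variables); all names and settings below are illustrative, and the paper's piecewise approach is considerably more general.

```python
import numpy as np

def nw(t0, t, v, h):
    """Nadaraya-Watson estimate of E[v | t = t0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((t0 - t) / h) ** 2)
    return np.sum(w * v) / np.sum(w)

def covariate_adjust(v_obs, u, h):
    """Undo a multiplicative distortion phi(U), assuming E[phi(U)] = 1 and
    U independent of the undistorted variable, via v_obs / phi_hat(U)."""
    m = np.array([nw(ui, u, v_obs, h) for ui in u])   # E[v_obs | U = u]
    return v_obs * v_obs.mean() / m                   # since phi_hat = m / mean

# toy distorted regression data (all settings illustrative)
rng = np.random.default_rng(4)
n = 500
u = rng.uniform(0, 1, n)                      # observed confounder
x = rng.uniform(0.5, 1.5, n)                  # undistorted, unobserved
y = 1 + x ** 2 + 0.1 * rng.standard_normal(n)
x_obs = (0.5 + u) * x                         # distortions with unit mean
y_obs = (1.5 - u) * y
x_adj = covariate_adjust(x_obs, u, h=0.1)
y_adj = covariate_adjust(y_obs, u, h=0.1)
# final step: ordinary nonparametric regression on the adjusted data
g_hat = [nw(x0, x_adj, y_adj, h=0.1) for x0 in np.linspace(0.6, 1.4, 5)]
```

Note that the adjustment step divides by an estimate of E[v_obs | U], which is only sensible when the relevant means are bounded away from zero; this is one of the restrictive assumptions the abstract alludes to.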
Density estimation with heteroscedastic error
It is common, in deconvolution problems, to assume that the measurement
errors are identically distributed. In many real-life applications, however,
this condition is not satisfied and the deconvolution estimators developed for
homoscedastic errors become inconsistent. In this paper, we introduce a kernel
estimator of a density in the case of heteroscedastic contamination. We
establish consistency of the estimator and show that it achieves optimal rates
of convergence under quite general conditions. We study the limits of application of the procedure in some extreme situations, showing that in some cases our estimator remains consistent even when the scaling parameter of the error is unbounded. We suggest a modified estimator for the problem where the
distribution of the errors is unknown, but replicated observations are
available. Finally, an adaptive procedure for selecting the smoothing parameter
is proposed and its finite-sample properties are investigated on simulated
examples.
Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm). DOI: http://dx.doi.org/10.3150/08-BEJ121
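One plausible form of such an estimator, sketched under the assumption of Gaussian errors with known, observation-specific standard deviations and a sinc kernel (whose Fourier transform is the indicator of [-1, 1]); the paper's estimator and conditions may differ. Since E[e^{itW_j}] equals the characteristic function of X times that of U_j, a ratio of suitably weighted sums recovers the characteristic function of X, which is then Fourier-inverted.

```python
import numpy as np

def hetero_deconv_density(x_grid, w, sigmas, h, n_t=512):
    """Deconvolution kernel density estimate of f_X from W_j = X_j + U_j,
    with heteroscedastic Gaussian errors U_j ~ N(0, sigmas[j]^2), known."""
    t = np.linspace(-1.0 / h, 1.0 / h, n_t)           # support of phi_K(h t)
    phi_U = np.exp(-0.5 * np.outer(t, sigmas) ** 2)   # n_t x n error cf values
    emp = np.exp(1j * np.outer(t, w))                 # e^{i t W_j}
    # weighted empirical cf over pooled error cf: estimates phi_X(t)
    psi = (emp * phi_U).sum(axis=1) / (phi_U ** 2).sum(axis=1)
    dt = t[1] - t[0]
    f = np.real(np.exp(-1j * np.outer(x_grid, t)) @ psi) * dt / (2 * np.pi)
    return np.maximum(f, 0.0)

# toy check: X ~ N(0, 1), error scale varying across observations
rng = np.random.default_rng(5)
n = 1000
sig = rng.uniform(0.1, 0.5, n)
w = rng.standard_normal(n) + sig * rng.standard_normal(n)
f_hat = hetero_deconv_density(np.linspace(-3, 3, 61), w, sig, h=0.4)
```

Replacing phi_U by a single common error characteristic function recovers the usual homoscedastic deconvolution estimator, which, as the abstract notes, becomes inconsistent when the errors actually vary.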
On deconvolution with repeated measurements
In a large class of statistical inverse problems it is necessary to suppose
that the transformation that is inverted is known. Although, in many
applications, it is unrealistic to make this assumption, the problem is often
insoluble without it. However, if additional data are available, then it is
possible to estimate the unknown error density consistently. Data are seldom available directly on the transformation, but repeated, or replicated, measurements are increasingly becoming available. Such data consist of "intrinsic" values that are measured several times, with errors that are
generally independent. Working in this setting we treat the nonparametric
deconvolution problems of density estimation with observation errors, and
regression with errors in variables. We show that, even if the number of
repeated measurements is quite small, it is possible for modified kernel
estimators to achieve the same level of performance as they would if the error
distribution were known. Indeed, density and regression estimators can be
constructed from replicated data so that they have the same first-order
properties as conventional estimators in the known-error case, without any
replication, but with sample size equal to the sum of the numbers of
replicates. Practical methods for constructing estimators with these properties
are suggested, involving empirical rules for smoothing-parameter choice.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/009053607000000884
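A sketch of the replicated-data idea for density estimation, under assumed symmetric errors: differences of two replicates carry only error, so |phi_U(t)|^2 = E cos(t(U_1 - U_2)) can be estimated from those differences and plugged into a standard deconvolution formula in place of the unknown error characteristic function. The sinc kernel, the ridge constant, and all other settings are illustrative assumptions.

```python
import numpy as np

def deconv_density_replicates(x_grid, w1, w2, h, n_t=512):
    """Density deconvolution from two replicates W_jk = X_j + U_jk when the
    error law is unknown (sketch; errors assumed symmetric about zero)."""
    t = np.linspace(-1.0 / h, 1.0 / h, n_t)      # sinc kernel: phi_K(h t) = 1 here
    d = w1 - w2                                  # differences carry only error
    # |phi_U(t)|^2 = E cos(t (U_1 - U_2)), estimated from the differences
    phi_U = np.sqrt(np.abs(np.cos(np.outer(t, d)).mean(axis=1)))
    w = np.concatenate([w1, w2])                 # pool both replicates
    emp_cf = np.exp(1j * np.outer(t, w)).mean(axis=1)
    psi = emp_cf / np.maximum(phi_U, 1e-3)       # ridge to stabilise division
    dt = t[1] - t[0]
    f = np.real(np.exp(-1j * np.outer(x_grid, t)) @ psi) * dt / (2 * np.pi)
    return np.maximum(f, 0.0)

# toy usage: Laplace measurement errors, unknown to the estimator
rng = np.random.default_rng(6)
n = 1000
x = rng.standard_normal(n)
w1 = x + rng.laplace(scale=0.3, size=n)
w2 = x + rng.laplace(scale=0.3, size=n)
f_hat = deconv_density_replicates(np.linspace(-3, 3, 61), w1, w2, h=0.4)
```

Pooling both replicates in the empirical characteristic function uses an effective sample size equal to the total number of measurements, loosely mirroring the first-order equivalence described in the abstract.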