
    Nonparametric regression with homogeneous group testing data

    We introduce new nonparametric predictors for homogeneous pooled data in the context of group testing for rare abnormalities and show that they achieve optimal rates of convergence. In particular, when the level of pooling is moderate then, despite the cost savings, the method enjoys the same convergence rate as in the case of no pooling. In the setting of "over-pooling", the convergence rate differs from that of an optimal estimator by no more than a logarithmic factor. Our approach improves on the random-pooling nonparametric predictor, which is currently the only nonparametric method available, unless there is no pooling, in which case the two approaches are identical. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/11-AOS952.)
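
    A minimal sketch of one predictor of this type, assuming equal pool sizes k, homogeneous pooling (units grouped by similar covariate values) and a pooled response that is positive whenever any unit in the pool is; the back-transformation p(x) = 1 - (1 - mu(x))^(1/k) and all names below are illustrative assumptions, not the authors' exact construction.

        import numpy as np

        def nw_smooth(x0, x, y, h):
            """Nadaraya-Watson estimate at points x0, Gaussian kernel."""
            w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
            return (w @ y) / w.sum(axis=1)

        def pooled_predictor(x0, x_pool, ystar, k, h):
            """Estimate p(x) = P(positive | x) from pooled test results.

            x_pool: covariate value attached to each pool (e.g. its mean),
            ystar : 1 if the pool tested positive, 0 otherwise,
            k     : common pool size.  Uses mu(x) = 1 - (1 - p(x))^k,
            valid when units within a pool share the covariate value."""
            mu = np.clip(nw_smooth(x0, x_pool, ystar, h), 0.0, 1.0)
            return 1.0 - (1.0 - mu) ** (1.0 / k)

        rng = np.random.default_rng(0)
        n, k = 2000, 5
        x = np.sort(rng.uniform(size=n))
        y = rng.uniform(size=n) < 0.1 * (1 + np.sin(2 * np.pi * x))  # rare abnormality
        # homogeneous pooling: consecutive (covariate-sorted) units share a pool
        ystar = y.reshape(-1, k).any(axis=1).astype(float)
        x_pool = x.reshape(-1, k).mean(axis=1)
        p_hat = pooled_predictor(np.linspace(0, 1, 50), x_pool, ystar, k, h=0.05)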

    Methodology and theory for partial least squares applied to functional data

    The partial least squares procedure was originally developed to estimate the slope parameter in multivariate parametric models. More recently it has gained popularity in the functional data literature. There, the partial least squares estimator of slope is either used to construct linear predictive models, or as a tool to project the data onto a one-dimensional quantity that is employed for further statistical analysis. Although the partial least squares approach is often viewed as an attractive alternative to projections onto the principal component basis, its properties are less well known than those of the latter, mainly because of its iterative nature. We develop an explicit formulation of partial least squares for functional data, which leads to insightful results and motivates new theory, demonstrating consistency and establishing convergence rates. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/11-AOS958.)
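
    A sketch of the generic iterative (NIPALS-type) partial least squares construction applied to curves discretized on a common grid; this is the standard algorithm whose iterative nature the paper analyses, not the authors' explicit formulation.

        import numpy as np

        def functional_pls(X, y, n_comp):
            """PLS weight functions for curves X (n x p grid), scalar response y.

            Each weight maximizes empirical covariance with the response;
            the deflation step, which makes the procedure iterative, removes
            the fitted component before computing the next weight."""
            Xc = X - X.mean(axis=0)
            yc = y - y.mean()
            W, T = [], []
            for _ in range(n_comp):
                w = Xc.T @ yc                    # empirical cross-covariance
                w /= np.linalg.norm(w)
                t = Xc @ w                       # scores of the new component
                W.append(w)
                T.append(t)
                Xc = Xc - np.outer(t, (Xc.T @ t) / (t @ t))   # deflate the curves
                yc = yc - t * (yc @ t) / (t @ t)              # deflate the response
            return np.column_stack(W), np.column_stack(T)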

    Defining probability density for a distribution of random functions

    The notion of probability density for a random function is not as straightforward as in finite-dimensional cases. While a probability density function generally does not exist for functional data, we show that it is possible to develop the notion of density when functional data are considered in the space determined by the eigenfunctions of principal component analysis. This leads to a transparent and meaningful surrogate for density defined in terms of the average value of the logarithms of the densities of the distributions of principal components for a given dimension. This density approximation is readily estimable from data. It accurately represents, in a monotone way, key features of small-ball approximations to density. Our results on estimators of the densities of principal component scores are also of independent interest; they reveal interesting shape differences that have not previously been considered. The statistical implications of these results and properties are identified and discussed, and practical ramifications are illustrated in numerical work. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/09-AOS741.)
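
    A sketch of the surrogate itself, assuming curves recorded on a common grid: estimate the eigenfunctions by principal component analysis, estimate the density of each of the first r component scores by a kernel method, and average the log-densities; scipy's gaussian_kde is an assumed convenience here, not the authors' score-density estimator.

        import numpy as np
        from scipy.stats import gaussian_kde

        def log_density_surrogate(X, x_new, r):
            """Average log-density of the first r PC scores at the curves x_new.

            X     : n x p matrix of observed curves on a common grid,
            x_new : m x p matrix of curves at which to evaluate the surrogate."""
            mu = X.mean(axis=0)
            Xc = X - mu
            _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # eigenfunctions
            scores = Xc @ Vt[:r].T                  # n x r training scores
            scores_new = (x_new - mu) @ Vt[:r].T
            logdens = [np.log(gaussian_kde(scores[:, j])(scores_new[:, j]))
                       for j in range(r)]           # one density per dimension
            return np.mean(logdens, axis=0)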

    Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier

    The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually smooth functions that were not actually observed. Existing literature shows that this approach is effective, and even optimal, when using functional data methods for prediction or hypothesis testing. However, in the present paper we show that this approach is not effective in classification problems. There, a useful rule of thumb is that undersmoothing is often desirable, but it comes with several surprising qualifications. First, the effect of smoothing the training data can be more significant than that of smoothing the new data set to be classified; second, undersmoothing is not always the right approach, and in fact in some cases using a relatively large bandwidth can be more effective; and third, these perverse results are the consequence of very unusual properties of error rates, expressed as functions of smoothing parameters. For example, the orders of magnitude of optimal smoothing parameter choices depend on the signs and sizes of terms in an expansion of error rate, and those signs and sizes can vary dramatically from one setting to another, even for the same classifier. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/13-AOS1158.)
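
    A sketch of the setup the paper studies, assuming a simple centroid classifier: discrete noisy observations of curves are kernel-smoothed with one bandwidth for the training sample and a possibly different bandwidth for the new curves, so that the two smoothing effects discussed above can be separated; the classifier and bandwidth choices are illustrative.

        import numpy as np

        def smooth_curves(obs, grid, h):
            """Kernel-smooth discrete observations (n x p, on grid) into curves."""
            w = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / h) ** 2)
            w /= w.sum(axis=1, keepdims=True)
            return obs @ w.T

        def centroid_classify(train, labels, new, grid, h_train, h_new):
            """Assign each smoothed new curve to the nearest class mean in L2."""
            sm_train = smooth_curves(train, grid, h_train)   # training bandwidth
            sm_new = smooth_curves(new, grid, h_new)         # test bandwidth
            groups = np.unique(labels)
            means = np.stack([sm_train[labels == g].mean(axis=0) for g in groups])
            d = ((sm_new[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            return groups[d.argmin(axis=1)]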

    Nonparametric covariate-adjusted regression

    We consider nonparametric estimation of a regression curve when the data are observed with multiplicative distortion which depends on an observed confounding variable. We suggest several estimators, ranging from a relatively simple one that relies on restrictive assumptions usually made in the literature, to a sophisticated piecewise approach that involves reconstructing a smooth curve from an estimator of a constant multiple of its absolute value, and which can be applied in much more general scenarios. We show that, although our nonparametric estimators are constructed from predictors of the unobserved undistorted data, they have the same first order asymptotic properties as the standard estimators that could be computed if the undistorted data were available. We illustrate the good numerical performance of our methods on both simulated and real datasets.
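
    A sketch of the first, "relatively simple" kind of estimator, under assumptions common in this literature: the distortion is a smooth function phi of the confounder U with E[phi(U)] = 1, and U is independent of (X, Y), so that phi can be divided out before regressing; the names and identifiability conditions below are illustrative assumptions.

        import numpy as np

        def nw_smooth(x0, x, y, h):
            """Nadaraya-Watson estimator with a Gaussian kernel."""
            w = np.exp(-0.5 * ((np.asarray(x0)[:, None] - x[None, :]) / h) ** 2)
            return (w @ y) / w.sum(axis=1)

        def covariate_adjusted_regression(x0, x, y_tilde, u, hx, hu):
            """Regress the undistorted Y on X when only y_tilde = phi(U) * Y is seen.

            With U independent of (X, Y) and E[phi(U)] = 1,
            E[y_tilde | U = u] = phi(u) * E[Y] and E[y_tilde] = E[Y],
            which identifies phi and lets us reconstruct the responses."""
            phi_hat = nw_smooth(u, u, y_tilde, hu) / y_tilde.mean()
            y_hat = y_tilde / phi_hat        # approximately undistorted responses
            return nw_smooth(x0, x, y_hat, hx)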

    Density estimation with heteroscedastic error

    It is common, in deconvolution problems, to assume that the measurement errors are identically distributed. In many real-life applications, however, this condition is not satisfied and the deconvolution estimators developed for homoscedastic errors become inconsistent. In this paper, we introduce a kernel estimator of a density in the case of heteroscedastic contamination. We establish consistency of the estimator and show that it achieves optimal rates of convergence under quite general conditions. We study the limits of application of the procedure in some extreme situations, where we show that, in some cases, our estimator is consistent, even when the scaling parameter of the error is unbounded. We suggest a modified estimator for the problem where the distribution of the errors is unknown, but replicated observations are available. Finally, an adaptive procedure for selecting the smoothing parameter is proposed and its finite-sample properties are investigated on simulated examples. (Published in Bernoulli, http://dx.doi.org/10.3150/08-BEJ121.)
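
    For orientation, the classical deconvolution kernel estimator for identically distributed errors, which the heteroscedastic construction generalizes: with contaminated observations W_j = X_j + U_j, error characteristic function phi_U, kernel K with Fourier transform phi_K, and bandwidth h, the standard estimator is (a reference form, not the paper's estimator, which must replace the single phi_U by observation-specific error characteristic functions):

        \hat f_X(x) = \frac{1}{nh} \sum_{j=1}^{n} K_U\!\left(\frac{x - W_j}{h}\right),
        \qquad
        K_U(u) = \frac{1}{2\pi} \int e^{-itu}\, \frac{\varphi_K(t)}{\varphi_U(t/h)}\, dt.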

    On deconvolution with repeated measurements

    In a large class of statistical inverse problems it is necessary to suppose that the transformation that is inverted is known. Although, in many applications, it is unrealistic to make this assumption, the problem is often insoluble without it. However, if additional data are available, then it is possible to estimate consistently the unknown error density. Data are seldom available directly on the transformation, but repeated, or replicated, measurements are increasingly becoming available. Such data consist of "intrinsic" values that are measured several times, with errors that are generally independent. Working in this setting we treat the nonparametric deconvolution problems of density estimation with observation errors, and regression with errors in variables. We show that, even if the number of repeated measurements is quite small, it is possible for modified kernel estimators to achieve the same level of performance as they would if the error distribution were known. Indeed, density and regression estimators can be constructed from replicated data so that they have the same first-order properties as conventional estimators in the known-error case, without any replication, but with sample size equal to the sum of the numbers of replicates. Practical methods for constructing estimators with these properties are suggested, involving empirical rules for smoothing-parameter choice. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/009053607000000884.)
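
    A sketch of the identity that makes replicated measurements informative about the error distribution: with replicates W_{jk} = X_j + U_{jk} and independent, identically distributed errors, the intrinsic value X_j cancels from within-unit differences, so

        \mathrm{E}\, e^{it(W_{j1} - W_{j2})} = \varphi_U(t)\,\overline{\varphi_U(t)} = |\varphi_U(t)|^2,
        \qquad
        |\hat\varphi_U(t)|^2 = \frac{1}{n} \sum_{j=1}^{n} \cos\{ t (W_{j1} - W_{j2}) \}.

    This recovers |phi_U|, which suffices, for example, for symmetric error distributions; estimators of this kind plug the estimate into a deconvolution formula, typically with some regularization to guard against small values of the denominator.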