
    Spatial adaptation in heteroscedastic regression: Propagation approach

    The paper concerns the problem of pointwise adaptive estimation in regression when the noise is heteroscedastic and incorrectly specified. The local approximation method, which includes local polynomial smoothing as a particular case, leads to a finite family of estimators corresponding to different degrees of smoothing, so a data-driven choice of the localization degree amounts to selection from this family. This task can be performed by the FLL technique suggested in Katkovnik and Spokoiny (2008), which is based on Lepski's method. An important issue with procedures of this type, the choice of certain tuning parameters, was addressed in Spokoiny and Vial (2009), who called their approach to parameter calibration "propagation". In the present paper the propagation approach is developed and justified for the heteroscedastic case in the presence of noise misspecification. Our analysis shows that the adaptive procedure tolerates misspecification of the covariance matrix with a relative error of order 1/log(n), where n is the sample size. (47 pages. Published in the Electronic Journal of Statistics, http://dx.doi.org/10.1214/08-EJS180, by the Institute of Mathematical Statistics.)

    Deconvolution Estimation in Measurement Error Models: The R Package decon

    Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors in variables are two important topics in measurement error models. In this paper, we present a new software package for R, decon, which contains a collection of functions that use deconvolution kernel methods to deal with measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.
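
The core of the deconvolution kernel approach can be sketched in a few lines. Below is a hedged Python illustration (not the decon package itself, which is written in R): for Laplace(0, b) measurement error combined with a Gaussian kernel, the deconvoluting kernel has the closed form K_U(z) = phi(z)[1 + (b/h)^2 (1 - z^2)], so for a small grid no FFT is even needed. All parameter choices are illustrative.

```python
import numpy as np

def deconv_kde_laplace(w, grid, h, b):
    """Deconvolution KDE for data contaminated with Laplace(0, b) error.

    Uses a Gaussian kernel; for Laplace errors the deconvoluting kernel
    has the closed form phi(z) * (1 + (b/h)^2 * (1 - z^2)).
    """
    z = (grid[:, None] - w[None, :]) / h
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    Ku = phi * (1.0 + (b / h) ** 2 * (1.0 - z**2))
    return Ku.mean(axis=1) / h

# Simulated example: X ~ N(0,1) observed with Laplace measurement error.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
u = rng.laplace(0.0, 0.3, 500)      # measurement error
w = x + u                           # contaminated observations
grid = np.linspace(-3.0, 3.0, 61)
fhat = deconv_kde_laplace(w, grid, h=0.4, b=0.3)
```

Because the error characteristic function appears in the denominator in the Fourier domain, the deconvoluting kernel can dip below zero; the estimator is not guaranteed pointwise nonnegative, which is one reason practical smoothing-parameter selection matters.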

    Pivotal estimation via square-root Lasso in nonparametric regression

    We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite-variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm{Lasso}}$, including its prediction-norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding the prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by $\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm{Lasso}}$ is as good as $\sqrt{\mathrm{Lasso}}$'s rate. As an application, we consider the use of $\sqrt{\mathrm{Lasso}}$ and OLS post $\sqrt{\mathrm{Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/14-AOS1204, by the Institute of Mathematical Statistics.)
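
One practical route to a $\sqrt{\mathrm{Lasso}}$-type estimate is the scaled-Lasso alternation: iterate between estimating the noise scale from the current residuals and solving an ordinary Lasso with penalty proportional to that scale. The NumPy sketch below is a minimal illustration of that idea under illustrative tuning choices, not the authors' implementation; `lasso_cd` is a plain coordinate-descent helper written for this example.

```python
import numpy as np

def lasso_cd(X, y, lam, beta, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    col_ss = (X**2).sum(axis=0) / n
    r = y - X @ beta                       # running residual
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_ss[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r += X[:, j] * (beta[j] - new)  # keep residual in sync
            beta[j] = new
    return beta

def sqrt_lasso(X, y, lam, n_outer=15):
    """Scaled-Lasso alternation: Lasso with a self-tuned noise scale."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_outer):
        sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if sigma < 1e-10:                   # noiseless fit: stop
            break
        beta = lasso_cd(X, y, lam * sigma, beta)
    return beta

# Sparse toy example with two strong signals.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[-1] = 3.0, -2.0
y = X @ beta_true + 0.5 * rng.normal(size=n)
lam = np.sqrt(2 * np.log(p) / n)            # a common theoretical choice
beta_hat = sqrt_lasso(X, y, lam)
```

The key self-tuning property is visible in the update: the effective Lasso penalty `lam * sigma` rescales itself with the residual size, so no separate estimate of the noise standard deviation is required.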

    Semiparametric Estimation of Heteroscedastic Binary Sample Selection Model

    Binary choice sample selection models are widely used in applied economics with large cross-sectional data, where heteroscedasticity is typically a serious concern. Existing parametric and semiparametric estimators for the binary selection equation and the outcome equation in such models suffer from serious drawbacks in the presence of heteroscedasticity of unknown form in the latent errors. In this paper we propose new estimators that overcome these drawbacks under a symmetry condition and are robust to both nonnormality and general heteroscedasticity. The estimators are shown to be $\sqrt{n}$-consistent and asymptotically normal. We also indicate that our approaches may be extended to other important models.

    Partially linear additive quantile regression in ultra-high dimension

    We consider a flexible semiparametric quantile regression model for analyzing high dimensional heterogeneous data. This model has several appealing features: (1) By considering different conditional quantiles, we may obtain a more complete picture of the conditional distribution of a response variable given high dimensional covariates. (2) The sparsity level is allowed to be different at different quantile levels. (3) The partially linear additive structure accommodates nonlinearity and circumvents the curse of dimensionality. (4) It is naturally robust to heavy-tailed distributions. In this paper, we approximate the nonlinear components using B-spline basis functions. We first study estimation under this model when the nonzero components are known in advance and the number of covariates in the linear part diverges. We then investigate a nonconvex penalized estimator for simultaneous variable selection and estimation. We derive its oracle property for a general class of nonconvex penalty functions in the presence of ultra-high dimensional covariates under relaxed conditions. To tackle the challenges posed by the nonsmooth loss function, the nonconvex penalty function and the presence of nonlinear components, we combine a recently developed convex-differencing method with modern empirical process techniques. Monte Carlo simulations and an application to a microarray study demonstrate the effectiveness of the proposed method. We also discuss how the method for a single quantile of interest can be extended to simultaneous variable selection and estimation at multiple quantiles. (Published in the Annals of Statistics, http://dx.doi.org/10.1214/15-AOS1367, by the Institute of Mathematical Statistics.)
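
Two of the building blocks here, the quantile check loss and a spline basis for the nonlinear part, can be illustrated without the nonconvex penalty. The sketch below fits an unpenalized quantile regression via its standard linear-programming formulation, and uses a truncated-power cubic spline basis as a simple stand-in for the paper's B-splines; all data, knots and parameter choices are illustrative, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_fit(X, y, tau):
    """Quantile regression as the standard LP:
    min tau * 1'u + (1 - tau) * 1'v  s.t.  X beta + u - v = y, u, v >= 0,
    so u and v are the positive and negative parts of the residual."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * p + [(0, None)] * (2 * n))
    return res.x[:p]

def spline_basis(x, knots):
    """Truncated-power cubic spline basis (stand-in for B-splines)."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Partially linear additive toy model: y = 2 z + sin(2 pi x) + noise.
rng = np.random.default_rng(2)
n = 300
z = rng.normal(size=n)                       # linear covariate
x = rng.uniform(0.0, 1.0, n)                 # nonlinear covariate
y = 2.0 * z + np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), z, spline_basis(x, knots=[0.25, 0.5, 0.75])])
beta = quantile_fit(X, y, tau=0.5)           # median regression
```

Minimizing this LP is exactly minimizing the sum of check losses, since the objective equals the sum of tau-weighted positive and (1 - tau)-weighted negative residuals; changing `tau` traces out different conditional quantiles of the same design.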

    Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases

    We investigate the optimality for model selection of the so-called slope heuristics, $V$-fold cross-validation and $V$-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models, which we call strongly localized bases, that generalizes histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities that prove the asymptotic optimality of the slope heuristics (when the optimal penalty shape is known) and of $V$-fold penalization. Furthermore, $V$-fold cross-validation appears to be suboptimal for a fixed value of $V$, since it asymptotically recovers the oracle learned from a fraction $1-V^{-1}$ of the original amount of data. Our results are based on genuine concentration inequalities for the true and empirical excess risks that are of independent interest. Our experiments show the good behavior of the slope heuristics for the selection of linear wavelet models; furthermore, $V$-fold cross-validation and $V$-fold penalization have comparable efficiency.
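
As a point of reference for what $V$-fold cross-validation does in a regression model-selection problem, here is a generic Python sketch that selects a polynomial degree by $V$-fold CV under squared loss. It illustrates only the selection principle, and the $1-V^{-1}$ effective sample fraction behind the suboptimality remark; it is not the paper's $V$-fold penalization, and the strongly localized bases are replaced by plain polynomials for brevity.

```python
import numpy as np

def vfold_cv_select(x, y, degrees, V=5, seed=0):
    """Select a polynomial degree by V-fold cross-validation (squared loss).

    Each model is trained on (V-1)/V of the data, i.e. a fraction 1 - 1/V,
    and evaluated on the held-out fold; risks are averaged over folds.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    folds = rng.permutation(n) % V           # balanced random fold labels
    risks = []
    for d in degrees:
        err = 0.0
        for v in range(V):
            tr, te = folds != v, folds == v
            coef = np.polyfit(x[tr], y[tr], d)
            err += np.sum((y[te] - np.polyval(coef, x[te])) ** 2)
        risks.append(err / n)
    return degrees[int(np.argmin(risks))], risks

# Toy example: the true regression function is a cubic.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 200)
y = x**3 - x + 0.1 * rng.normal(size=200)
degrees = list(range(7))
best, risks = vfold_cv_select(x, y, degrees)
```

For a fixed small $V$ the training sets stay a constant fraction below the full sample size, which is exactly the source of the asymptotic suboptimality the abstract describes; $V$-fold penalization avoids this by always refitting the selected model on the full sample.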

    Local Linear Multivariate Regression with Variable Bandwidth in the Presence of Heteroscedasticity

    We present a local linear estimator with variable bandwidth for multivariate nonparametric regression. We prove its consistency and asymptotic normality in the interior of the observed data and obtain its rates of convergence. This result is used to obtain practical direct plug-in bandwidth selectors for heteroscedastic regression in one and two dimensions. We show that the local linear estimator with variable bandwidth has better goodness-of-fit properties than the local linear estimator with constant bandwidth, in the presence of heteroscedasticity. Keywords: heteroscedasticity; kernel smoothing; local linear regression; plug-in bandwidth; variable bandwidth.
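
For intuition, a minimal one-dimensional Python sketch of a local linear fit with a variable bandwidth follows. The k-nearest-neighbour bandwidth rule used here is a simple stand-in for the paper's plug-in selectors, and all tuning choices (kernel, k, grid) are illustrative assumptions.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[Y | X = x0] with a Gaussian kernel.

    Solves the kernel-weighted least squares problem in (intercept, slope)
    around x0; the fitted intercept is the regression estimate at x0.
    """
    u = x - x0
    w = np.exp(-0.5 * (u / h) ** 2)
    X = np.column_stack([np.ones_like(u), u])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]

def knn_bandwidth(x0, x, k):
    """Variable bandwidth: distance from x0 to its k-th nearest design point,
    so the bandwidth shrinks where the design is dense and grows where sparse."""
    return np.sort(np.abs(x - x0))[k]

# Toy example: recover sin(x) from noisy observations.
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 3.0, 500)
y = np.sin(x) + 0.1 * rng.normal(size=500)
points = np.array([0.5, 1.5, 2.5])
fits = np.array([local_linear(p, x, y, knn_bandwidth(p, x, k=50))
                 for p in points])
```

Letting the bandwidth vary with location is what allows the estimator to adapt locally; the heteroscedastic plug-in selectors in the paper choose h(x) from estimated local variance and curvature rather than from nearest-neighbour distances.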

    Quantile Regression in the Presence of Sample Selection

    Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any (by definition non-existent) heterogeneity. However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that they are powerful, and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics. Keywords: sample selection; quantile regression; independence; test.