Spatial adaptation in heteroscedastic regression: Propagation approach
The paper concerns the problem of pointwise adaptive estimation in regression
when the noise is heteroscedastic and incorrectly known. The use of the local
approximation method, which includes the local polynomial smoothing as a
particular case, leads to a finite family of estimators corresponding to
different degrees of smoothing. Data-driven choice of localization degree in
this case can be understood as the problem of selection from this family. This
task can be performed by the FLL technique suggested in Katkovnik and Spokoiny
(2008), which is based on Lepski's method. An important issue with this type of
procedure - the choice of certain tuning parameters - was addressed in
Spokoiny and Vial (2009), who called their approach to parameter calibration
"propagation". In the present paper the propagation approach is developed and
justified for the heteroscedastic case in the presence of noise
misspecification. Our analysis shows that the adaptive procedure allows a
misspecification of the covariance matrix with a relative error of order
1/log(n), where n is the sample size.
Comment: 47 pages. This is the final version of the paper, published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/, http://dx.doi.org/10.1214/08-EJS180) by the Institute of Mathematical Statistics (http://www.imstat.org).
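The selection-from-a-family idea above can be sketched generically. The snippet below is a hypothetical Python illustration of a Lepski-type rule at a single point, using Nadaraya-Watson estimates with a Gaussian kernel and an assumed known noise level `sigma`; it is not the paper's calibrated FLL procedure, whose thresholds come from the propagation approach.

```python
import numpy as np

def lepski_select(x0, x, y, bandwidths, sigma, z=2.5):
    """Pointwise Lepski-type bandwidth selection: keep enlarging the
    bandwidth while the new estimate stays within a noise band of every
    estimate built with a smaller bandwidth."""
    bandwidths = sorted(bandwidths)
    ests, sds = [], []
    for h in bandwidths:
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        w = w / w.sum()
        ests.append(float(np.sum(w * y)))                 # kernel estimate at x0
        sds.append(sigma * float(np.sqrt(np.sum(w**2))))  # its standard deviation
    chosen = 0
    for k in range(1, len(bandwidths)):
        # accept h_k only if it agrees with all smaller-bandwidth estimates
        if all(abs(ests[k] - ests[j]) <= z * sds[j] for j in range(k)):
            chosen = k
        else:
            break
    return bandwidths[chosen], ests[chosen]
```

Larger bandwidths reduce variance but add bias; the rule stops at the largest bandwidth whose estimate is still statistically compatible with all less-smoothed ones.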
Deconvolution Estimation in Measurement Error Models: The R Package decon
Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors in variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.
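Although decon is an R package, the Fourier-inversion idea behind the deconvolution kernel estimator can be sketched in a few lines. The snippet below is a hypothetical Python illustration assuming homoscedastic Gaussian errors and a kernel whose Fourier transform has compact support; it is a direct numerical inversion, not the package's FFT implementation.

```python
import numpy as np

def deconv_kde(w, x_grid, h, sigma_err):
    """Deconvolution kernel density estimate from data w = x + eps with
    eps ~ N(0, sigma_err^2): divide the empirical characteristic function
    of w by that of the error, damp with a kernel of compact Fourier
    support, and invert the Fourier transform on a grid."""
    t = np.linspace(-1.0 / h, 1.0 / h, 801)
    dt = t[1] - t[0]
    phi_K = (1.0 - (t * h) ** 2) ** 3                  # kernel Fourier transform
    phi_W = np.exp(1j * np.outer(t, w)).mean(axis=1)   # empirical char. function
    phi_eps = np.exp(-0.5 * (sigma_err * t) ** 2)      # error char. function
    integrand = phi_K * phi_W / phi_eps
    f = np.array([np.real(np.sum(np.exp(-1j * t * x) * integrand)) * dt
                  for x in x_grid]) / (2.0 * np.pi)
    return np.maximum(f, 0.0)                          # clip small negative lobes
```

The compactly supported phi_K keeps the division by phi_eps well behaved, which is why deconvolution methods avoid the Gaussian kernel used in ordinary density estimation.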
Pivotal estimation via square-root Lasso in nonparametric regression
We propose a self-tuning square-root Lasso method that simultaneously
resolves three important practical problems in high-dimensional regression
analysis, namely it handles the unknown scale, heteroscedasticity and (drastic)
non-Gaussianity of the noise. In addition, our analysis allows for badly
behaved designs, for example, perfectly collinear regressors, and generates
sharp bounds even in extreme cases, such as the infinite variance case and the
noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds
for the square-root Lasso, including the prediction norm rate and sparsity. Our
analysis is based on new impact factors that are tailored for bounding
prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely
on moderate deviation theory for self-normalized sums to achieve Gaussian-like
results under weak conditions. Moreover, we derive bounds on the performance of
ordinary least squares (OLS) applied to the model selected by the square-root
Lasso, accounting for possible misspecification of the selected model. Under
mild conditions, the rate of convergence of OLS post square-root Lasso is as
good as the square-root Lasso's rate. As an application, we consider the use of
the square-root Lasso and OLS post square-root Lasso as estimators of nuisance
parameters in a generic semiparametric problem (nonlinear moment condition or
Z-problem), resulting in a construction of √n-consistent and asymptotically
normal estimators of the main parameters.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/, http://dx.doi.org/10.1214/14-AOS1204) by the Institute of Mathematical Statistics (http://www.imstat.org).
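The pivotal character of the estimator is visible in its objective, ||y - Xb||_2/√n + λ||b||_1, where λ can be chosen without knowing the noise level. The sketch below solves it with plain proximal gradient descent and soft-thresholding; this is an illustrative solver under assumed step-size and data choices, not the authors' algorithm.

```python
import numpy as np

def sqrt_lasso(X, y, lam, step=0.2, n_iter=3000):
    """Square-root Lasso: minimize ||y - Xb||_2 / sqrt(n) + lam * ||b||_1
    by proximal (sub)gradient iterations (ISTA).  Unlike the ordinary
    Lasso, lam here need not scale with the unknown noise level."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ b
        nr = np.linalg.norm(r)
        if nr < 1e-10:                     # exact fit: gradient undefined
            break
        b = b - step * (-(X.T @ r) / (np.sqrt(n) * nr))   # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft threshold
    return b
```

Because the loss is the root of the residual sum of squares, the score is self-normalized by the residual norm, which is what removes the noise scale from the tuning.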
Semiparametric Estimation of Heteroscedastic Binary Sample Selection Model
Binary choice sample selection models are widely used in applied economics with large cross-sectional data, where heteroscedasticity is typically a serious concern. Existing parametric and semiparametric estimators for the binary selection equation and the outcome equation in such models suffer from serious drawbacks in the presence of heteroscedasticity of unknown form in the latent errors. In this paper we propose some new estimators that overcome these drawbacks under a symmetry condition and are robust to both nonnormality and general heteroscedasticity. The estimators are shown to be √n-consistent and asymptotically normal. We also indicate that our approaches may be extended to other important models.
Partially linear additive quantile regression in ultra-high dimension
We consider a flexible semiparametric quantile regression model for analyzing
high dimensional heterogeneous data. This model has several appealing features:
(1) By considering different conditional quantiles, we may obtain a more
complete picture of the conditional distribution of a response variable given
high dimensional covariates. (2) The sparsity level is allowed to be different
at different quantile levels. (3) The partially linear additive structure
accommodates nonlinearity and circumvents the curse of dimensionality. (4) It
is naturally robust to heavy-tailed distributions. In this paper, we
approximate the nonlinear components using B-spline basis functions. We first
study estimation under this model when the nonzero components are known in
advance and the number of covariates in the linear part diverges. We then
investigate a nonconvex penalized estimator for simultaneous variable selection
and estimation. We derive its oracle property for a general class of nonconvex
penalty functions in the presence of ultra-high dimensional covariates under
relaxed conditions. To tackle the challenges of nonsmooth loss function,
nonconvex penalty function and the presence of nonlinear components, we combine
a recently developed convex-differencing method with modern empirical process
techniques. Monte Carlo simulations and an application to a microarray study
demonstrate the effectiveness of the proposed method. We also discuss how the
method for a single quantile of interest can be extended to simultaneous
variable selection and estimation at multiple quantiles.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/, http://dx.doi.org/10.1214/15-AOS1367) by the Institute of Mathematical Statistics (http://www.imstat.org).
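The parametric building block of the model above is quantile regression with the check (pinball) loss, which minimizes Σ ρ_τ(y_i - x_i'β) with ρ_τ(u) = u(τ - 1{u<0}). The sketch below solves this via its standard linear-programming formulation; it is a minimal illustration only, without the paper's B-spline additive terms or nonconvex penalty.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, tau):
    """Linear quantile regression via the LP form of the pinball loss:
    split the residual into u - v with u, v >= 0 and minimize
    tau * sum(u) + (1 - tau) * sum(v)."""
    n, p = X.shape
    # variables: [beta_plus (p), beta_minus (p), u (n), v (n)], all >= 0
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])   # X beta + u - v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]                  # recombine beta
```

Varying tau traces out the conditional distribution of the response, which is point (1) of the abstract's list of appealing features.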
Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases
We investigate the optimality for model selection of the so-called slope
heuristics, V-fold cross-validation and V-fold penalization in a
heteroscedastic regression context with random design. We consider a new class
of linear models that we call strongly localized bases and that generalize
histograms, piecewise polynomials and compactly supported wavelets. We derive
sharp oracle inequalities that prove the asymptotic optimality of the slope
heuristics---when the optimal penalty shape is known---and of V-fold
penalization. Furthermore, V-fold cross-validation seems to be suboptimal for
a fixed value of V, since it recovers asymptotically the oracle learned from a
sample size equal to a fraction 1-1/V of the original amount of data. Our
results are based on genuine concentration inequalities for the true and
empirical excess risks that are of independent interest. We show in our
experiments the good behavior of the slope heuristics for the selection of
linear wavelet models. Furthermore, V-fold cross-validation and V-fold
penalization have comparable efficiency.
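The model-selection setting can be made concrete with the simplest strongly localized basis, the histogram. The snippet below is a toy Python sketch of V-fold cross-validation over regressogram dimensions; it illustrates the selection problem only, not the paper's slope-heuristic calibration or V-fold penalization.

```python
import numpy as np

def vfold_select(x, y, dims, V=5, seed=0):
    """Select a regressogram (piecewise-constant) model dimension on
    [0, 1] by V-fold cross-validation on squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % V
    cv = []
    for D in dims:
        edges = np.linspace(0.0, 1.0, D + 1)
        err = 0.0
        for v in range(V):
            tr, te = folds != v, folds == v
            bin_tr = np.clip(np.searchsorted(edges, x[tr], side="right") - 1, 0, D - 1)
            bin_te = np.clip(np.searchsorted(edges, x[te], side="right") - 1, 0, D - 1)
            means = np.full(D, y[tr].mean())      # fallback for empty bins
            for d in range(D):
                m = bin_tr == d
                if m.any():
                    means[d] = y[tr][m].mean()    # bin-wise training mean
            err += np.sum((y[te] - means[bin_te]) ** 2)
        cv.append(err / len(x))
    best = int(np.argmin(cv))
    return dims[best], cv
```

Each fold fits on a fraction (V-1)/V of the data, which is exactly the source of the suboptimality-for-fixed-V phenomenon the abstract describes.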
Local Linear Multivariate Regression with Variable Bandwidth in the Presence of Heteroscedasticity
We present a local linear estimator with variable bandwidth for multivariate nonparametric regression. We prove its consistency and asymptotic normality in the interior of the observed data and obtain its rates of convergence. This result is used to obtain practical direct plug-in bandwidth selectors for heteroscedastic regression in one and two dimensions. We show that the local linear estimator with variable bandwidth has better goodness-of-fit properties than the local linear estimator with constant bandwidth in the presence of heteroscedasticity.
Keywords: heteroscedasticity; kernel smoothing; local linear regression; plug-in bandwidth; variable bandwidth.
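A one-dimensional local linear fit makes the construction concrete: calling the function below with an x0-dependent bandwidth h(x0), e.g. larger where the local noise level is higher, gives the variable-bandwidth estimator. This is a minimal sketch under assumed Gaussian kernel weights, not the paper's multivariate plug-in procedure.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear regression fit at x0 with bandwidth h: weighted
    least squares of y on (1, x - x0); the intercept is the fit at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)     # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(x), x - x0])
    A = Z.T @ (w[:, None] * Z)                 # Z' W Z
    b = np.linalg.solve(A, Z.T @ (w * y))      # solve the normal equations
    return b[0]
```

Unlike the local constant (Nadaraya-Watson) estimator, the linear term removes the design-density bias, which is why local linear smoothing is the default building block here.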
Quantile Regression in the Presence of Sample Selection
Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any (by definition non-existing) heterogeneity. However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that they are powerful, and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics.
Keywords: sample selection; quantile regression; independence; test.
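The intuition behind the test is that under independence all conditional quantile curves share the same slope. The sketch below compares slope estimates across a few quantiles as a naive finite-grid stand-in for the paper's tests on the entire quantile process, and ignores the selection correction entirely; the quantile fits use the standard pinball-loss LP.

```python
import numpy as np
from scipy.optimize import linprog

def qr_slope(X, y, tau):
    """Slope of a linear quantile regression via the pinball-loss LP."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])   # X beta + u - v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    beta = res.x[:p] - res.x[p:2 * p]
    return beta[1]

def slope_spread(x, y, taus=(0.25, 0.5, 0.75)):
    """Range of slopes across quantiles: near zero when the errors are
    independent of x (parallel quantile curves), large otherwise."""
    X = np.column_stack([np.ones_like(x), x])
    slopes = [qr_slope(X, y, t) for t in taus]
    return max(slopes) - min(slopes)
```

A large spread is evidence against independence; a formal version would compare it to a critical value from the limiting quantile process, as the paper does.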