Estimation of the density of regression errors is a fundamental issue in
regression analysis and it is typically explored via a parametric approach.
This article uses a nonparametric approach with the mean integrated squared
error (MISE) criterion. It solves a long-standing problem, formulated two
decades ago by Mark Pinsker, about estimation of a nonparametric error density
in a nonparametric regression setting with the accuracy of an oracle that knows
the underlying regression errors. The solution implies that, under a mild
assumption on the differentiability of the design density and regression
function, the MISE of a data-driven error density estimator attains minimax
rates and sharp constants known for the case of directly observed regression
errors. The result holds for error densities with finite and infinite supports.
Some extensions of this result to more general heteroscedastic models with
possibly dependent errors and predictors are also obtained; in the latter case
the marginal error density is estimated. In all considered cases a
blockwise-shrinking Efromovich--Pinsker density estimate, based on plugged-in
residuals, is used. The results provide a theoretical justification for the
customary practice in applied regression analysis of treating residuals as
proxies for the underlying regression errors. Numerical and real examples are
presented and discussed, and the S-PLUS software is available.

Comment: Published at http://dx.doi.org/10.1214/009053605000000435 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
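
The idea summarized in the abstract, plugging residuals into a blockwise-shrinking series density estimate and comparing against an oracle that sees the true errors, can be illustrated with a small simulation. The following Python sketch is not the article's S-PLUS software and not its exact estimator. It makes several assumptions for illustration only: a Nadaraya-Watson regression fit with a fixed bandwidth, a cosine basis on a fixed interval [lo, hi], dyadic blocks, a simple shrinkage weight, and a homoscedastic model with normal errors. All function names and parameter values are hypothetical.

```python
import numpy as np


def cosine_basis(x, n_terms):
    """Cosine basis on [0, 1]: phi_0 = 1, phi_j(x) = sqrt(2) cos(pi j x)."""
    j = np.arange(n_terms)[:, None]
    basis = np.sqrt(2.0) * np.cos(np.pi * j * x[None, :])
    basis[0] = 1.0
    return basis


def blockwise_shrinkage_density(z, grid, lo, hi, n_terms=31):
    """Simplified blockwise-shrinkage cosine-series density estimate.

    Assumption: dyadic blocks and the weight signal / (signal + 1/n);
    this only mimics the flavor of an Efromovich-Pinsker estimate.
    """
    n = z.size
    u = (np.clip(z, lo, hi) - lo) / (hi - lo)      # rescale data to [0, 1]
    theta = cosine_basis(u, n_terms).mean(axis=1)  # empirical coefficients
    weights = np.zeros(n_terms)
    weights[0] = 1.0                               # constant term is kept as is
    start, length = 1, 1
    while start < n_terms:
        stop = min(start + length, n_terms)
        # Estimated mean squared coefficient in the block, minus noise level 1/n.
        signal = max(np.mean(theta[start:stop] ** 2) - 1.0 / n, 0.0)
        weights[start:stop] = signal / (signal + 1.0 / n)
        start, length = stop, 2 * length
    g = (grid - lo) / (hi - lo)
    density = (weights * theta) @ cosine_basis(g, n_terms) / (hi - lo)
    return np.maximum(density, 0.0)                # truncate negative values


def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson regression estimate."""
    d = (x_eval[:, None] - x_train[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2)
    return (k @ y_train) / k.sum(axis=1)


def demo(seed=0, n=1000, sigma=0.5):
    """Return (oracle ISE, residual-based ISE) for one simulated sample."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    errors = rng.normal(0.0, sigma, n)             # unobserved in practice
    y = np.sin(2.0 * np.pi * x) + errors
    residuals = y - nadaraya_watson(x, y, x, bandwidth=0.05)

    lo, hi = -4.0 * sigma, 4.0 * sigma             # assumed estimation interval
    grid = np.linspace(lo, hi, 401)
    step = grid[1] - grid[0]
    true_density = np.exp(-0.5 * (grid / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    oracle = blockwise_shrinkage_density(errors, grid, lo, hi)
    plug_in = blockwise_shrinkage_density(residuals, grid, lo, hi)
    # Integrated squared errors, approximated by a Riemann sum on the grid.
    ise_oracle = np.sum((oracle - true_density) ** 2) * step
    ise_resid = np.sum((plug_in - true_density) ** 2) * step
    return ise_oracle, ise_resid


if __name__ == "__main__":
    ise_oracle, ise_resid = demo()
    print(f"ISE, oracle (true errors): {ise_oracle:.5f}")
    print(f"ISE, plug-in (residuals):  {ise_resid:.5f}")
```

Averaging the two integrated squared errors over many seeds gives a Monte Carlo approximation of the corresponding MISE values, which is the criterion used in the abstract. In this sketch the bandwidth and the interval [lo, hi] are fixed by hand, whereas a data-driven procedure would choose them from the sample.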