Bandwidth Selection for Level Set Estimation in the Context of Regression and a Simulation Study for Non Parametric Level Set Estimation When the Density Is Log-Concave
Bandwidth selection is critical for kernel estimation because it controls the amount of smoothing in a function's estimator. Traditional methods for bandwidth selection involve optimizing a global loss function (e.g. least-squares cross-validation, asymptotic mean integrated squared error). Nevertheless, a global loss function becomes suboptimal for the level set estimation problem, which is local in nature. For a function g and a level λ, the level set is the set LS_λ = {x : g(x) ≥ λ}.
In the first part of this thesis we study optimal bandwidth selection for the Nadaraya-Watson kernel estimator in one dimension. We present a local loss function as an alternative to these global metrics and derive an asymptotic approximation of its corresponding risk. The level set optimal bandwidth is the argument that minimizes this asymptotic approximation. We show that its rate of convergence coincides with the rate from traditional global bandwidth selectors. We then derive an algorithm to obtain the practical bandwidth and study its performance through simulations. Our simulation results show that, in general, for small samples and small levels, the level set optimal bandwidth improves the level set estimate when compared to the cross-validation bandwidth or the local polynomial kernel estimator. We illustrate this new bandwidth selector on a decompression sickness study of the effects of duration and pressure on mortality during a dive.
In the second part, motivated by our simulation findings and the relationship of level set estimation to the highest density region (HDR) problem, we study via simulations the properties of a plug-in estimator in which the density is estimated with a log-concave mixture model. We focus in particular on univariate densities and compare this method against a kernel plug-in estimator whose bandwidth is chosen optimally for the HDR problem. We observe through simulations that when the number of components in the model is correctly specified, the log-concave plug-in estimator performs better than the kernel estimator for lower levels and similarly for the rest of the levels considered. We conclude with an analysis of the daily maximum temperatures in Melbourne, Australia.
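The plug-in idea underlying the abstract above can be sketched in a few lines: estimate the regression function with a Nadaraya-Watson smoother, then take the grid points where the estimate exceeds the level. This is an illustrative sketch only; the bandwidth `h`, the toy data, and the grid are assumptions, not the thesis's actual selector or data.

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel of bandwidth h."""
    # Pairwise kernel weights K((x_grid_i - x_j) / h); the Gaussian kernel is
    # never exactly zero, so the denominator below is always positive.
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def level_set(x_grid, g_hat, lam):
    """Plug-in level set estimate: grid points where the estimate is >= lambda."""
    return x_grid[g_hat >= lam]

# Toy example: noisy observations of g(x) = sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 200)

x_grid = np.linspace(0.0, 1.0, 101)
g_hat = nadaraya_watson(x_grid, x, y, h=0.05)   # h chosen by hand here
ls = level_set(x_grid, g_hat, lam=0.5)           # estimate of {x : g(x) >= 0.5}
```

In this sketch the bandwidth is fixed by hand; the point of the thesis is precisely that `h` should instead minimize a local, level-set-specific risk rather than a global criterion.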
Probit transformation for kernel density estimation on the unit interval
Kernel estimation of a probability density function supported on the unit
interval has proved difficult, because of the well known boundary bias issues a
conventional kernel density estimator would necessarily face in this situation.
Transforming the variable of interest into a variable whose density has
unconstrained support, estimating that density, and obtaining an estimate of
the density of the original variable through back-transformation, seems a
natural idea to easily get rid of the boundary problems. In practice, however,
a simple and efficient implementation of this methodology is far from
immediate, and the few attempts found in the literature have been reported not
to perform well. In this paper, the main reasons for this failure are
identified and an easy way to correct them is suggested. It turns out that
combining the transformation idea with local likelihood density estimation
produces viable density estimators, mostly free from boundary issues. Their
asymptotic properties are derived, and a practical cross-validation bandwidth
selection rule is devised. Extensive simulations demonstrate the excellent
performance of these estimators compared to their main competitors for a wide
range of density shapes. In fact, they turn out to be the best choice overall.
Finally, they are used to successfully estimate a density of non-standard shape
supported on the unit interval from a small real data sample.
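The transformation idea described above can be sketched directly: map the data through the probit function Φ⁻¹ to the real line, run an ordinary Gaussian KDE there, and back-transform with the change-of-variables Jacobian. This is only the basic transformation estimator that the paper takes as its starting point, not the local-likelihood correction it proposes; the bandwidth and the Beta-distributed toy sample are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

def probit_kde(x_eval, data, h):
    """Basic probit-transformation KDE for a density supported on (0, 1).

    Transforms the sample with Phi^{-1}, applies a Gaussian KDE on the real
    line, then back-transforms, dividing by phi(Phi^{-1}(x)) (the Jacobian).
    """
    s = norm.ppf(data)      # transformed sample on the real line
    t = norm.ppf(x_eval)    # evaluation points in transformed space
    # Gaussian KDE in the transformed space, bandwidth h
    f_t = np.exp(-0.5 * ((t[:, None] - s[None, :]) / h) ** 2).mean(axis=1)
    f_t /= h * np.sqrt(2 * np.pi)
    return f_t / norm.pdf(t)  # Jacobian of the back-transformation

# Toy example: Beta(2, 5) data, which live on (0, 1).
rng = np.random.default_rng(1)
data = rng.beta(2.0, 5.0, 500)
x_eval = np.linspace(0.001, 0.999, 999)
dens = probit_kde(x_eval, data, h=0.3)
```

By construction the back-transformed estimate integrates to (approximately) one over (0, 1), and it cannot leak mass outside the interval, which is how the transformation sidesteps the boundary bias of a conventional KDE.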
Bandwidth selection in kernel empirical risk minimization via the gradient
In this paper, we deal with the data-driven selection of multidimensional and
possibly anisotropic bandwidths in the general framework of kernel empirical
risk minimization. We propose a universal selection rule, which leads to
optimal adaptive results in a large variety of statistical models such as
nonparametric robust regression and statistical learning with errors in
variables. These results are stated in the context of smooth loss functions,
where the gradient of the risk appears as a good criterion to measure the
performance of our estimators. The selection rule consists of a comparison of
gradient empirical risks. It can be viewed as a nontrivial extension of the
so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one
main advantage of our selection rule is that it does not depend on the Hessian
matrix of the risk, which is usually involved in standard adaptive procedures.

Comment: Published at http://dx.doi.org/10.1214/15-AOS1318 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)