
    Bandwidth Selection for Level Set Estimation in the Context of Regression and a Simulation Study for Non Parametric Level Set Estimation When the Density Is Log-Concave

    Bandwidth selection is critical for kernel estimation because it controls the amount of smoothing in a function's estimator. Traditional methods for bandwidth selection optimize a global loss function (e.g., least-squares cross-validation, asymptotic mean integrated squared error). A global loss function, however, becomes suboptimal for the level set estimation problem, which is local in nature. For a function $g$, the level set is $LS_\lambda = \{x : g(x) \geq \lambda\}$. In the first part of this thesis we study optimal bandwidth selection for the Nadaraya-Watson kernel estimator in one dimension. We present a local loss function as an alternative to the $L_2$ metric and derive an asymptotic approximation of its corresponding risk. The level-set-optimal bandwidth $h_{opt}$ is the minimizer of this asymptotic approximation. We show that the rate of $h_{opt}$ coincides with the rate from traditional global bandwidth selectors. We then derive an algorithm to obtain the practical bandwidth and study its performance through simulations. Our simulation results show that, in general, for small samples and small levels, the level-set-optimal bandwidth improves level set estimation compared to cross-validation bandwidth selection or the local polynomial kernel estimator. We illustrate this new bandwidth selector on a decompression sickness study of the effects of duration and pressure on mortality during a dive.

    In the second part, motivated by our simulation findings and the relationship of level set estimation to the highest density region (HDR) problem, we study via simulations the properties of a plug-in estimator in which the density is estimated with a log-concave mixture model. We focus in particular on univariate densities and compare this method against a kernel plug-in estimator whose bandwidth is chosen optimally for the HDR problem. We observe through simulations that when the number of components in the model is correctly specified, the log-concave plug-in estimator performs better than the kernel estimator at lower levels and comparably at the remaining levels considered. We conclude with an analysis of daily maximum temperatures in Melbourne, Australia.
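    The plug-in construction behind this work is simple to state: estimate the regression function with the Nadaraya-Watson estimator at some bandwidth $h$, then threshold the estimate at the level $\lambda$. The Python sketch below shows only that generic construction; the thesis's level-set-optimal bandwidth algorithm is not reproduced here, and the bandwidth, level, and toy data are illustrative placeholders.

        # Plug-in level set estimation with a Nadaraya-Watson regression estimator.
        # Minimal sketch: `h` is a user-supplied bandwidth, not the
        # level-set-optimal bandwidth derived in the thesis.
        import numpy as np

        def nadaraya_watson(x_grid, x, y, h):
            """Gaussian-kernel Nadaraya-Watson estimate of g on a grid."""
            u = (x_grid[:, None] - x[None, :]) / h   # scaled pairwise distances
            w = np.exp(-0.5 * u**2)                  # Gaussian kernel weights
            return (w @ y) / w.sum(axis=1)           # locally weighted mean of y

        def plugin_level_set(x_grid, x, y, h, lam):
            """Plug-in estimate of LS_lambda = {x : g(x) >= lambda}."""
            return x_grid[nadaraya_watson(x_grid, x, y, h) >= lam]

        # Toy illustration (data and parameter values are purely illustrative).
        rng = np.random.default_rng(0)
        x = rng.uniform(0.0, 1.0, 200)
        y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
        grid = np.linspace(0.0, 1.0, 500)
        print(plugin_level_set(grid, x, y, h=0.05, lam=0.5))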

    Probit transformation for kernel density estimation on the unit interval

    Kernel estimation of a probability density function supported on the unit interval has proved difficult, because of the well-known boundary bias issues a conventional kernel density estimator would necessarily face in this situation. Transforming the variable of interest into a variable whose density has unconstrained support, estimating that density, and obtaining an estimate of the density of the original variable through back-transformation, seems a natural idea to easily get rid of the boundary problems. In practice, however, a simple and efficient implementation of this methodology is far from immediate, and the few attempts found in the literature have been reported not to perform well. In this paper, the main reasons for this failure are identified and an easy way to correct them is suggested. It turns out that combining the transformation idea with local likelihood density estimation produces viable density estimators, mostly free from boundary issues. Their asymptotic properties are derived, and a practical cross-validation bandwidth selection rule is devised. Extensive simulations demonstrate the excellent performance of these estimators compared to their main competitors for a wide range of density shapes. In fact, they turn out to be the best choice overall. Finally, they are used to successfully estimate a density of non-standard shape supported on $[0,1]$ from a small-size real data sample.
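    The transform-estimate-back-transform recipe the abstract starts from is easy to write down: if $S = \Phi^{-1}(X)$ has density $g$, then the density of $X$ on $(0,1)$ is $f(x) = g(\Phi^{-1}(x)) / \varphi(\Phi^{-1}(x))$. A minimal sketch, assuming the probit transform and an off-the-shelf Gaussian KDE from SciPy; the paper's local-likelihood correction and its cross-validation bandwidth rule are not implemented here.

        # Probit back-transformation density estimate on (0,1).
        # Minimal sketch of the basic transformation idea only; the paper's
        # local-likelihood refinement is not implemented.
        import numpy as np
        from scipy import stats

        def probit_kde(x_eval, data, bw_method=None):
            """KDE on (0,1) via probit transform and back-transformation."""
            s = stats.norm.ppf(data)              # probit-transformed sample
            kde = stats.gaussian_kde(s, bw_method=bw_method)
            t = stats.norm.ppf(x_eval)            # evaluation points on the real line
            # f_hat(x) = g_hat(Phi^{-1}(x)) / phi(Phi^{-1}(x))
            return kde(t) / stats.norm.pdf(t)

        # Toy illustration with a sample supported on (0,1).
        rng = np.random.default_rng(1)
        data = rng.beta(2.0, 5.0, size=300)
        grid = np.linspace(0.01, 0.99, 99)
        print(probit_kde(grid, data))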

    Bandwidth selection in kernel empirical risk minimization via the gradient

    In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models such as nonparametric robust regression and statistical learning with errors in variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion to measure the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures.

    Comment: Published at http://dx.doi.org/10.1214/15-AOS1318 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
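    For orientation, the generic Goldenshluger-Lepski rule that this procedure refines can be written schematically as follows. This is a sketch of the standard method only, under the common simplification that the auxiliary estimator uses the larger of two bandwidths; in the paper, the distance between estimators is replaced by a comparison of gradient empirical risks, with its own majorant.

        % Schematic Goldenshluger-Lepski bandwidth selection:
        % compare each estimator against its more-smoothed versions,
        % penalized by a majorant maj(h') of the stochastic error;
        % [x]_+ denotes max(x, 0).
        \[
          B(h) = \sup_{h' \in \mathcal{H}}
            \Bigl[\, d\bigl(\hat g_{h \vee h'},\, \hat g_{h'}\bigr) - \mathrm{maj}(h') \,\Bigr]_+ ,
          \qquad
          \hat h = \arg\min_{h \in \mathcal{H}} \bigl\{ B(h) + \mathrm{maj}(h) \bigr\}.
        \]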