796 research outputs found

    Bandwidth choice for nonparametric classification

    It is shown that, for kernel-based classification with univariate distributions and two populations, optimal bandwidth choice has a dichotomous character. If the two densities cross at just one point, where their curvatures have the same signs, then minimum Bayes risk is achieved using bandwidths which are an order of magnitude larger than those which minimize pointwise estimation error. On the other hand, if the curvature signs are different, or if there are multiple crossing points, then bandwidths of conventional size are generally appropriate. The range of different modes of behavior is narrower in multivariate settings. There, the optimal size of bandwidth is generally the same as that which is appropriate for pointwise density estimation. These properties motivate empirical rules for bandwidth choice. Comment: Published at http://dx.doi.org/10.1214/009053604000000959 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
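    As a rough illustration of the setting this abstract describes, the sketch below builds a two-population, univariate kernel classifier: each class density is estimated with a Gaussian kernel, and a point is assigned to the class with the larger prior-weighted estimate. The function names, the Gaussian kernel, the simulated normal samples, and the bandwidth value 0.4 are all assumptions made for the example; none of them comes from the paper, and the sketch does not implement the paper's bandwidth rules.

```python
import math
import random


def gaussian_kde(sample, h):
    """Return x -> kernel density estimate (Gaussian kernel, bandwidth h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def kernel_classifier(sample0, sample1, h0, h1, prior0=0.5):
    """Assign x to class 1 when the prior-weighted density estimate
    of class 1 exceeds that of class 0, otherwise to class 0."""
    f0 = gaussian_kde(sample0, h0)
    f1 = gaussian_kde(sample1, h1)

    def classify(x):
        return 1 if (1.0 - prior0) * f1(x) > prior0 * f0(x) else 0

    return classify


# Hypothetical data: two normal populations whose densities cross once, at 0.
rng = random.Random(0)
sample0 = [rng.gauss(-1.0, 1.0) for _ in range(200)]
sample1 = [rng.gauss(1.0, 1.0) for _ in range(200)]

# The bandwidths h0 and h1 are the tuning parameters the abstract is about;
# 0.4 is an arbitrary illustrative value, not a recommended choice.
classify = kernel_classifier(sample0, sample1, h0=0.4, h1=0.4)
labels = [classify(-3.0), classify(3.0)]
```

    Rerunning the classifier with larger or smaller values of h0 and h1 and comparing error rates near the crossing point is one simple way to explore the bandwidth sensitivity that the abstract discusses.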

    Bias in nearest-neighbor hazard estimation

    In nonparametric curve estimation, the smoothing parameter is critical for performance. In order to estimate the hazard rate, we compare nearest neighbor selectors that minimize the quadratic, the Kullback-Leibler, and the uniform loss. These measures result in a rule of thumb, a cross-validation, and a plug-in selector. A Monte Carlo simulation within the three-parameter exponentiated Weibull distribution indicates that a counter-factual normal distribution, as an input to the selector, does provide a good rule of thumb. If bias is the main concern, minimizing the uniform loss yields the best results, but at the cost of very high variability. Cross-validation has a similar bias to the rule of thumb, but also with high variability. Keywords: hazard rate, kernel smoothing, bandwidth selection, nearest neighbor bandwidth, rule of thumb, plug-in, cross-validation, credit risk

    Functional limit laws for the increments of the quantile process; with applications

    We establish a functional limit law of the logarithm for the increments of the normed quantile process based upon a random sample of size $n\to\infty$. We extend a limit law obtained by Deheuvels and Mason (12), showing that their results hold uniformly over the bandwidth $h$, restricted to vary in $[h'_n,h''_n]$, where $\{h'_n\}_{n\geq 1}$ and $\{h''_n\}_{n\geq 1}$ are appropriate non-random sequences. We treat the case where the sample observations follow possibly non-uniform distributions. As a consequence of our theorems, we provide uniform limit laws for nearest-neighbor density estimators, in the spirit of those given by Deheuvels and Mason (13) for kernel-type estimators. Comment: Published at http://dx.doi.org/10.1214/07-EJS099 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
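    For readers unfamiliar with nearest-neighbor density estimators, the sketch below shows the common textbook form in one dimension: the estimate at x is k / (2 n R_k(x)), where R_k(x) is the distance from x to its k-th nearest sample point. This is an assumed, generic form chosen for illustration; the abstract does not say which variant the paper analyses, and the function name and sample values are hypothetical.

```python
def knn_density(sample, x, k):
    """Textbook k-nearest-neighbor density estimate at x in one dimension.

    Uses k / (2 * n * R_k), where R_k is the distance from x to its k-th
    nearest sample point, so 2 * R_k is the length of the smallest interval
    centred at x that contains k observations.
    """
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    distances = sorted(abs(x - xi) for xi in sample)
    r_k = distances[k - 1]
    if r_k == 0.0:
        raise ValueError("k-th neighbor distance is zero; estimate is undefined")
    return k / (2.0 * n * r_k)


# Hypothetical sample: the 2nd nearest neighbor of 1.5 lies at distance 0.5,
# so the estimate is 2 / (2 * 4 * 0.5).
estimate = knn_density([0.0, 1.0, 2.0, 3.0], x=1.5, k=2)
```

    Here k plays the role of the smoothing parameter: the implied local bandwidth R_k(x) adapts to how dense the sample is around x.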

    Population Synthesis via k-Nearest Neighbor Crossover Kernel

    The recent development of multi-agent simulations brings about a need for population synthesis: the task of reconstructing an entire population from a sampling survey of limited size (1% or so), which supplies the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the huge number of degrees of freedom required to model high-dimensional, nonlinearly correlated datasets: the crossover kernel, the k-nearest neighbor restriction of the kernel construction set, and the bagging of kernels. Its performance as a statistical estimator is examined on real and synthetic datasets. We provide an "optimization-free" parameter selection rule for our method, a theory of how our method works, and a computational cost analysis. To demonstrate its usefulness as a population synthesizer, our method is applied to a household synthesis task for an urban micro-simulator. Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201

    Regression Discontinuity Designs Using Covariates

    We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additive separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and characterize the potential for estimation and inference improvements. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.