Bandwidth choice for nonparametric classification
It is shown that, for kernel-based classification with univariate
distributions and two populations, optimal bandwidth choice has a dichotomous
character. If the two densities cross at just one point, where their curvatures
have the same signs, then minimum Bayes risk is achieved using bandwidths which
are an order of magnitude larger than those which minimize pointwise estimation
error. On the other hand, if the curvature signs are different, or if there are
multiple crossing points, then bandwidths of conventional size are generally
appropriate. The range of different modes of behavior is narrower in
multivariate settings. There, the optimal size of bandwidth is generally the
same as that which is appropriate for pointwise density estimation. These
properties motivate empirical rules for bandwidth choice.
(Published at http://dx.doi.org/10.1214/009053604000000959 in the Annals of Statistics, http://www.imstat.org/aos/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
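The classification rule the abstract studies can be sketched as follows: estimate each population's density with a kernel estimator and assign a point to the population with the larger estimated (prior-weighted) density. A minimal sketch with a Gaussian kernel; the function names, bandwidths, and the equal-prior default are illustrative, not the paper's:

```python
import numpy as np

def kde(x, data, h):
    """Gaussian kernel density estimate at points x (1-D data, bandwidth h)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    u = (x - data.reshape(1, -1)) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def classify(x, sample0, sample1, h0, h1, prior0=0.5):
    """Assign each point to the population with the larger estimated posterior."""
    p0 = prior0 * kde(x, sample0, h0)
    p1 = (1 - prior0) * kde(x, sample1, h1)
    return (p1 > p0).astype(int)

rng = np.random.default_rng(0)
a = rng.normal(-1.0, 1.0, 500)   # population 0
b = rng.normal(+1.0, 1.0, 500)   # population 1
labels = classify([-2.0, 2.0], a, b, h0=0.4, h1=0.4)
```

The abstract's point is that the bandwidths `h0`, `h1` minimizing the Bayes risk of this rule can differ in order of magnitude from those minimizing pointwise density-estimation error, depending on how the two densities cross.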
Bias in nearest-neighbor hazard estimation
In nonparametric curve estimation, the smoothing parameter is critical for performance. To estimate the hazard rate, we compare nearest-neighbor selectors that minimize the quadratic, the Kullback-Leibler, and the uniform loss. These measures result in a rule of thumb, a cross-validation, and a plug-in selector. A Monte Carlo simulation within the three-parameter exponentiated Weibull distribution indicates that a counterfactual normal distribution, as an input to the selector, provides a good rule of thumb. If bias is the main concern, minimizing the uniform loss yields the best results, but at the cost of very high variability. Cross-validation has a similar bias to the rule of thumb, but also with high variability.
Keywords: hazard rate, kernel smoothing, bandwidth selection, nearest-neighbor bandwidth, rule of thumb, plug-in, cross-validation, credit risk
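The object being smoothed here is the hazard rate, which can be estimated as a kernel density estimate divided by the empirical survival function, with the local bandwidth set by a nearest-neighbor rule. A minimal sketch under simplifying assumptions (Gaussian kernel, no censoring; the function names and the choice k = 200 are illustrative, not the selectors compared in the paper):

```python
import numpy as np

def nn_bandwidth(t, data, k):
    """Nearest-neighbor bandwidth: distance from t to its k-th nearest observation."""
    return np.sort(np.abs(data - t))[k - 1]

def hazard_nn(t, data, k):
    """Hazard estimate f_hat(t) / (1 - F_hat(t)) with a k-NN local bandwidth."""
    n = len(data)
    h = nn_bandwidth(t, data, k)
    u = (t - data) / h
    f = np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2 * np.pi))  # Gaussian KDE
    surv = (data > t).mean()            # empirical survival function
    return f / max(surv, 1.0 / n)       # guard against division by zero

rng = np.random.default_rng(1)
data = rng.exponential(1.0, 2000)       # true hazard is constant, equal to 1
est = hazard_nn(1.0, data, k=200)
```

The bandwidth selectors compared in the abstract differ in how they choose `k`: by a normal-reference rule of thumb, by cross-validation, or by a plug-in criterion.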
Functional limit laws for the increments of the quantile process; with applications
We establish a functional limit law of the logarithm for the increments of
the normed quantile process based upon a random sample of size $n$. We
extend a limit law obtained by Deheuvels and Mason (12), showing that their
results hold uniformly over the bandwidth $h$, restricted to vary in
$[a_n, b_n]$, where $a_n$ and $b_n$ are
appropriate non-random sequences. We treat the case where the sample
observations follow possibly non-uniform distributions. As a consequence of our
theorems, we provide uniform limit laws for nearest-neighbor density
estimators, in the spirit of those given by Deheuvels and Mason (13) for
kernel-type estimators.
(Published at http://dx.doi.org/10.1214/07-EJS099 in the Electronic Journal of Statistics, http://www.i-journals.org/ejs/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
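The nearest-neighbor density estimators that the limit laws cover have a very simple one-dimensional form: $\hat f(x) = k / (2 n R_k(x))$, where $R_k(x)$ is the distance from $x$ to its $k$-th nearest observation. A minimal sketch (function name and parameter choices are illustrative):

```python
import numpy as np

def knn_density(x, data, k):
    """1-D k-nearest-neighbor density estimate: f_hat(x) = k / (2 n R_k(x)),
    where R_k(x) is the distance from x to the k-th nearest observation."""
    n = len(data)
    r = np.sort(np.abs(data - x))[k - 1]
    return k / (2.0 * n * r)

rng = np.random.default_rng(2)
u = rng.uniform(0.0, 1.0, 10000)   # Uniform(0,1): true density is 1 on (0,1)
est = knn_density(0.5, u, k=1000)
```

The random bandwidth here is $R_k(x)$ itself, which is why uniform-in-bandwidth limit laws of the kind established in the paper are needed to analyze such estimators.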
Population Synthesis via k-Nearest Neighbor Crossover Kernel
The recent development of multi-agent simulations brings about a need for
population synthesis. It is a task of reconstructing the entire population from
a sampling survey of limited size (1% or so), supplying the initial conditions
from which simulations begin. This paper presents a new kernel density
estimator for this task. Our method is an analogue of the classical
Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the
huge degree of freedom which is required to model high-dimensional nonlinearly
correlated datasets: the crossover kernel, the k-nearest neighbor restriction
of the kernel construction set and the bagging of kernels. The performance as a
statistical estimator is examined through real and synthetic datasets. We
provide an "optimization-free" parameter selection rule for our method, a
theory of how our method works and a computational cost analysis. To
demonstrate the usefulness as a population synthesizer, our method is applied
to a household synthesis task for an urban micro-simulator.
(10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201.)
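The crossover idea can be illustrated loosely: new records are produced by mixing coordinates drawn from the nearest neighbors of a randomly chosen seed record. This is a hypothetical small-data sketch of that idea only, not the paper's estimator (which additionally uses kernel construction and bagging); all names and the choice k = 5 are illustrative:

```python
import numpy as np

def synthesize(sample, n_out, k=5, rng=None):
    """Sketch of crossover-style synthesis: each output record takes each of
    its coordinates from one of the k nearest neighbors (Euclidean distance)
    of a randomly chosen seed record."""
    rng = rng if rng is not None else np.random.default_rng()
    sample = np.asarray(sample, dtype=float)
    n, d = sample.shape
    out = np.empty((n_out, d))
    for i in range(n_out):
        seed = rng.integers(n)
        dist = np.linalg.norm(sample - sample[seed], axis=1)
        nbrs = np.argsort(dist)[:k]         # k nearest records (incl. the seed)
        donors = rng.choice(nbrs, size=d)   # pick one donor record per coordinate
        out[i] = sample[donors, np.arange(d)]
    return out

rng = np.random.default_rng(3)
survey = rng.normal(size=(100, 3))          # stand-in for a small sample survey
pop = synthesize(survey, n_out=1000, k=5, rng=rng)
```

Restricting donors to nearest neighbors keeps synthetic records close to the observed joint distribution, which is the role the k-NN restriction plays in the method described above.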
Regression Discontinuity Designs Using Covariates
We study regression discontinuity designs when covariates are included in the
estimation. We examine local polynomial estimators that include discrete or
continuous covariates in an additive separable way, but without imposing any
parametric restrictions on the underlying population regression functions. We
recommend a covariate-adjustment approach that retains consistency under
intuitive conditions, and characterize the potential for estimation and
inference improvements. We also present new covariate-adjusted mean squared
error expansions and robust bias-corrected inference procedures, with
heteroskedasticity-consistent and cluster-robust standard errors. An empirical
illustration and an extensive simulation study are presented. All methods are
implemented in \texttt{R} and \texttt{Stata} software packages.
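The covariate-adjusted estimator described above can be sketched as a kernel-weighted local linear regression: fit intercept, treatment indicator, and running-variable slopes on each side of the cutoff, with the covariates entering additively and linearly. A minimal sketch under those assumptions (triangular kernel, sharp design; function name, bandwidth, and simulated data are illustrative, not the packages' implementation):

```python
import numpy as np

def rd_estimate(y, x, z, h, c=0.0):
    """Sharp RD jump at cutoff c: weighted least squares with a triangular
    kernel of bandwidth h, local linear in the running variable x on each
    side, covariates z entering additively and linearly."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    z = np.asarray(z, float)
    if z.ndim == 1:
        z = z[:, None]
    d = (x >= c).astype(float)                        # treatment indicator
    w = np.clip(1.0 - np.abs(x - c) / h, 0.0, None)   # triangular kernel weights
    keep = w > 0
    X = np.column_stack([np.ones(len(x)), d, x - c, d * (x - c), z])[keep]
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[1]                                    # coefficient on d = jump at c

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(-1, 1, n)
z = rng.normal(size=n)
y = 1.0 + 2.0 * (x >= 0) + 0.5 * x + z + 0.3 * rng.normal(size=n)
tau = rd_estimate(y, x, z, h=0.5)   # true jump is 2
```

Including `z` absorbs covariate variation in the outcome, which is the source of the precision gains the paper characterizes; the paper's bandwidth selection and robust bias-corrected inference are not sketched here.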