8,390 research outputs found
Kernel density classification and boosting: an L2 analysis
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is "boosting", and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
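To make the classification-via-density-estimation idea concrete, here is a minimal Python sketch (not the paper's boosted estimator): one kernel density estimate per class, with a point assigned to the class whose prior-weighted density estimate is larger. The bandwidths h0 and h1 are illustrative choices, not the optimal smoothing parameters derived in the paper.

```python
# Minimal kernel density classifier sketch: fit one KDE per class and assign
# each point to the class with the larger prior-weighted density estimate.
import numpy as np

def kde_1d(train, h):
    """Return a 1-D Gaussian kernel density estimate with bandwidth h."""
    def f(x):
        x = np.atleast_1d(x)
        u = (x[:, None] - train[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return f

def kde_classify(x, x0, x1, h0=0.5, h1=0.5):
    """Classify points x into class 0 or 1 by comparing prior-weighted KDEs."""
    n0, n1 = len(x0), len(x1)
    f0, f1 = kde_1d(x0, h0), kde_1d(x1, h1)
    prior0, prior1 = n0 / (n0 + n1), n1 / (n0 + n1)
    return (prior1 * f1(x) > prior0 * f0(x)).astype(int)

# toy usage: two Gaussian classes
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 200)
x1 = rng.normal(1.5, 1.0, 200)
print(kde_classify(np.array([-1.0, 0.7, 2.0]), x0, x1))
```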
On boosting kernel regression
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast and the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
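A minimal sketch of the idea, assuming a Gaussian kernel and a fixed bandwidth: each L2-boosting step fits the Nadaraya-Watson smoother to the current residuals and adds the fit to the running estimate. The bandwidth and step count below are arbitrary illustrations, not the tuned values studied in the paper.

```python
# L2 boosting of a Nadaraya-Watson smoother: repeatedly smooth the residuals
# and add the result to the current fit.
import numpy as np

def nw_smoother_matrix(x, h):
    """Nadaraya-Watson hat matrix S with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def l2_boost_nw(x, y, h=0.3, n_steps=5):
    """Fitted values after n_steps of L2 boosting of the NW estimator."""
    S = nw_smoother_matrix(x, h)
    fit = S @ y                    # first boosting step: plain NW fit
    for _ in range(n_steps - 1):
        fit = fit + S @ (y - fit)  # smooth the residuals and update
    return fit

# toy usage on a noisy sine curve
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(0, 0.3, 100)
print(l2_boost_nw(x, y)[:5])
```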
Demystifying Fixed k-Nearest Neighbor Information Estimators
Estimating mutual information from i.i.d. samples drawn from an unknown joint
density function is a basic statistical problem of broad interest with
multitudinous applications. The most popular estimator is one proposed by
Kraskov, Stögbauer and Grassberger (KSG) in 2004, and is nonparametric and
based on the distance of each sample to its k-th nearest neighboring
sample, where k is a fixed small integer. Despite its widespread use (part of
scientific software packages), theoretical properties of this estimator have
been largely unexplored. In this paper we demonstrate that the estimator is
consistent and also identify an upper bound on the rate of convergence of the
bias as a function of the number of samples. We argue that the superior
performance of the KSG estimator stems from a curious "correlation boosting"
effect and build on this intuition to modify the KSG estimator in novel ways to
construct a superior estimator. As a byproduct of our investigations, we obtain
nearly tight rates of convergence of the error of the well-known fixed
k-nearest neighbor estimator of differential entropy by Kozachenko and Leonenko.
Comment: 55 pages, 8 figures
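For reference, a small Python sketch of the standard KSG estimator (algorithm 1) for one-dimensional X and Y, using the usual formula I ≈ ψ(k) + ψ(N) − mean_i[ψ(n_x(i)+1) + ψ(n_y(i)+1)] with max-norm neighborhoods. This is the baseline estimator discussed in the abstract, not the modified estimator proposed in the paper.

```python
# KSG mutual information estimator (1-D X and Y), fixed k nearest neighbors.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=4):
    n = len(x)
    xy = np.column_stack([x, y])
    tree_xy = cKDTree(xy)
    # distance to the k-th nearest neighbor in the joint space (max-norm);
    # k+1 because the closest point is the sample itself
    eps = tree_xy.query(xy, k=k + 1, p=np.inf)[0][:, -1]
    tree_x, tree_y = cKDTree(x[:, None]), cKDTree(y[:, None])
    # count strictly closer neighbors in each marginal space (excluding the point itself)
    nx = np.array([len(tree_x.query_ball_point([x[i]], eps[i] - 1e-12)) - 1 for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point([y[i]], eps[i] - 1e-12)) - 1 for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# toy usage: correlated Gaussians, true MI = -0.5 * log(1 - rho^2)
rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = 0.8 * x + 0.6 * rng.normal(size=2000)
print(ksg_mi(x, y))
```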
Probability density estimation with tunable kernels using orthogonal forward regression
A generalized or tunable-kernel model is proposed for probability density function estimation based on an orthogonal forward regression procedure. Each stage of the density estimation process determines a tunable kernel, namely its center vector and diagonal covariance matrix, by minimizing a leave-one-out test criterion. The kernel mixing weights of the constructed sparse density estimate are finally updated using the multiplicative nonnegative quadratic programming algorithm to ensure the nonnegativity and unity constraints, and this weight-updating process additionally has the desired ability to further reduce the model size. The proposed tunable-kernel model has advantages, in terms of model generalization capability and model sparsity, over the standard fixed-kernel model that restricts kernel centers to the training data points and employs a single common kernel variance for every kernel. On the other hand, it does not optimize all the model parameters together and thus avoids the high-dimensional, ill-conditioned nonlinear optimization problems associated with the conventional finite mixture model. Several examples are included to demonstrate the ability of the proposed tunable-kernel model to construct very compact and accurate density estimates.
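The following sketch illustrates only the flavour of a sparse kernel mixture density estimate: Gaussian kernels at a handful of candidate centres, with mixing weights kept nonnegative and summing to one. The EM-style multiplicative weight update used here is a simple stand-in, not the paper's orthogonal forward regression with leave-one-out selection or its MNQP update.

```python
# Fit only the mixing weights of a Gaussian kernel mixture density; the
# EM-style update keeps the weights nonnegative and summing to one.
import numpy as np

def fit_mixture_weights(x, centres, sigma=0.5, n_iter=100):
    """Nonnegative, sum-to-one weights for a fixed set of Gaussian kernels."""
    # component densities evaluated at the data: shape (n_samples, n_kernels)
    phi = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / sigma) ** 2)
    phi /= sigma * np.sqrt(2 * np.pi)
    w = np.full(len(centres), 1.0 / len(centres))
    for _ in range(n_iter):
        resp = w * phi
        resp /= resp.sum(axis=1, keepdims=True)  # responsibilities
        w = resp.mean(axis=0)                    # multiplicative weight update
    return w

# toy usage: bimodal data, five candidate kernel centres
rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
centres = np.array([-3.0, -2.0, 0.0, 2.0, 3.0])
print(fit_mixture_weights(x, centres).round(3))
```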
Kullback-Leibler aggregation and misspecified generalized linear models
In a regression setup with deterministic design, we study the pure
aggregation problem and introduce a natural extension from the Gaussian
distribution to distributions in the exponential family. While this extension
bears strong connections with generalized linear models, it does not require
identifiability of the parameter or even that the model on the systematic
component is true. It is shown that this problem can be solved by constrained
and/or penalized likelihood maximization and we derive sharp oracle
inequalities that hold both in expectation and with high probability. Finally,
all the bounds are proved to be optimal in a minimax sense.
Comment: Published at http://dx.doi.org/10.1214/11-AOS961 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
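As a toy illustration of aggregation by constrained likelihood maximization in an exponential-family model, the sketch below takes a Bernoulli (logistic) likelihood and maximizes it over convex combinations of candidate linear predictors. The function name and data are hypothetical, and the paper's setting and guarantees are considerably more general.

```python
# Convex aggregation of candidate linear predictors by maximizing a
# Bernoulli (logistic) log-likelihood over the simplex.
import numpy as np
from scipy.optimize import minimize

def aggregate_logistic(F, y):
    """F: (n, M) candidate linear predictors; y: binary responses in {0, 1}."""
    n, M = F.shape

    def neg_loglik(theta):
        eta = F @ theta
        # Bernoulli log-likelihood with the canonical (logit) link
        return -np.sum(y * eta - np.log1p(np.exp(eta)))

    cons = ({"type": "eq", "fun": lambda t: t.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * M
    res = minimize(neg_loglik, np.full(M, 1.0 / M), bounds=bounds, constraints=cons)
    return res.x

# toy usage with three candidate predictors
rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-2 * x))).astype(float)
F = np.column_stack([2.5 * x, 0.5 * x, np.ones_like(x)])
print(aggregate_logistic(F, y))
```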
Selective machine learning of doubly robust functionals
While model selection is a well-studied topic in parametric and nonparametric
regression or density estimation, selection of possibly high-dimensional
nuisance parameters in semiparametric problems is far less developed. In this
paper, we propose a selective machine learning framework for making inferences
about a finite-dimensional functional defined on a semiparametric model, when
the latter admits a doubly robust estimating function and several candidate
machine learning algorithms are available for estimating the nuisance
parameters. We introduce two new selection criteria for bias reduction in
estimating the functional of interest, each based on a novel definition of
pseudo-risk for the functional that embodies the double robustness property and
thus is used to select the pair of learners that is nearest to fulfilling this
property. We establish an oracle property for a multi-fold cross-validation
version of the new selection criteria which states that our empirical criteria
perform nearly as well as an oracle with a priori knowledge of the pseudo-risk
for each pair of candidate learners. We also describe a smooth approximation to
the selection criteria which allows for valid post-selection inference.
Finally, we apply the approach to model selection of a semiparametric estimator
of average treatment effect given an ensemble of candidate machine learners to
account for confounding in an observational study.
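For context, the doubly robust (AIPW) estimating function for the average treatment effect, which the selection criteria are built around, looks as follows once a candidate pair of nuisance estimates is in hand. This sketch shows only how one pair (a propensity score and two outcome regressions) is turned into an estimate, not the pseudo-risk selection itself.

```python
# Augmented inverse probability weighting (AIPW) estimate of the average
# treatment effect, given already-fitted nuisance estimates.
import numpy as np

def aipw_ate(y, a, e_hat, m1_hat, m0_hat):
    """Doubly robust ATE from outcome y, treatment a in {0,1}, propensity
    e_hat, and outcome-model predictions m1_hat (treated), m0_hat (control)."""
    psi1 = m1_hat + a * (y - m1_hat) / e_hat
    psi0 = m0_hat + (1 - a) * (y - m0_hat) / (1 - e_hat)
    return np.mean(psi1 - psi0)

# toy usage with crude nuisance estimates (constant propensity, linear outcome models)
rng = np.random.default_rng(4)
x = rng.normal(size=1000)
a = (rng.uniform(size=1000) < 0.5).astype(float)
y = 1.0 * a + x + rng.normal(size=1000)
print(aipw_ate(y, a, e_hat=np.full(1000, 0.5), m1_hat=x + 1.0, m0_hat=x))
```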
Boosted Beta regression.
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
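A hedged sketch of the core idea: gradient boosting the mean of a beta regression with a logit link and a fixed precision parameter phi, using component-wise linear base learners. The gradient is that of the beta log-likelihood in the mean-precision parameterization; the paper's method additionally models the precision and supports flexible nonlinear base learners, so this is an illustration only.

```python
# Component-wise gradient boosting for the mean of a beta regression
# (logit link, fixed precision phi).
import numpy as np
from scipy.special import digamma, expit

def boost_beta_mean(X, y, phi=10.0, nu=0.1, n_steps=200):
    n, p = X.shape
    eta = np.zeros(n)          # additive predictor for logit(mu)
    coefs = np.zeros(p)
    for _ in range(n_steps):
        mu = expit(eta)
        # gradient of the beta log-likelihood w.r.t. eta under the logit link
        grad = phi * mu * (1 - mu) * (np.log(y / (1 - y))
                                      - digamma(mu * phi) + digamma((1 - mu) * phi))
        # component-wise least squares: pick the covariate that fits grad best
        betas = X.T @ grad / (X ** 2).sum(axis=0)
        sse = ((grad[:, None] - X * betas) ** 2).sum(axis=0)
        j = np.argmin(sse)
        coefs[j] += nu * betas[j]
        eta += nu * betas[j] * X[:, j]
    return coefs

# toy usage: only the first covariate matters
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))
mu = expit(1.0 * X[:, 0])
y = rng.beta(mu * 10.0, (1 - mu) * 10.0)
print(boost_beta_mean(X, y))
```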
Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, that is, rates faster than n^{-1/2}. The work on this subject has
suggested the following two conjectures: (i) the best achievable fast rate is
of the order n^{-1}, and (ii) the plug-in classifiers generally converge more
slowly than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, that is,
rates faster than n^{-1}. We establish minimax lower bounds showing that the
obtained rates cannot be improved.
Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
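A plug-in classifier in this sense simply thresholds a nonparametric estimate of the regression function eta(x) = P(Y=1 | X=x) at 1/2. The sketch below uses a Nadaraya-Watson smoother as a simple stand-in for the local polynomial estimators analyzed in the paper.

```python
# Plug-in classifier: estimate P(Y=1 | X=x) nonparametrically and threshold at 1/2.
import numpy as np

def plug_in_classifier(x_train, y_train, x_new, h=0.3):
    """Kernel plug-in classifier for 1-D features and binary labels."""
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    eta_hat = (w @ y_train) / w.sum(axis=1)   # estimate of P(Y=1 | X=x)
    return (eta_hat >= 0.5).astype(int)

# toy usage
rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, 500)
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-3 * x))).astype(float)
print(plug_in_classifier(x, y, np.array([-1.0, 0.0, 1.0])))
```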
- …