8,390 research outputs found

    Kernel density classification and boosting: an L2 sub analysis

    Get PDF
    Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
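
    As a rough illustration of the baseline kernel density classifier discussed in this abstract, the sketch below fits one kernel density estimate per class and assigns a point to the class with the larger prior-weighted density. The bandwidth rule, class means, and function names are illustrative assumptions; the paper's smoothing-parameter analysis and boosting step are not reproduced here.

```python
# A minimal sketch of kernel density classification: fit one KDE per class and
# assign each point to the class with the larger prior-weighted density.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x0 = rng.normal(loc=-1.0, scale=1.0, size=200)   # class 0 training sample
x1 = rng.normal(loc=+1.0, scale=1.0, size=300)   # class 1 training sample

kde0 = gaussian_kde(x0)                # bandwidth chosen by Scott's rule here
kde1 = gaussian_kde(x1)
pi0 = len(x0) / (len(x0) + len(x1))    # class priors estimated from sample sizes
pi1 = 1.0 - pi0

def classify(x):
    """Assign each point in x to the class with larger prior-weighted density."""
    x = np.atleast_1d(x)
    return (pi1 * kde1(x) > pi0 * kde0(x)).astype(int)

print(classify([-2.0, 0.0, 2.0]))      # predicted labels for three test points
```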

    On boosting kernel regression

    No full text
    In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast and the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
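
    The sketch below shows the general flavour of L2 boosting applied to a Nadaraya-Watson smoother: repeatedly smooth the current residuals and accumulate the fits. The Gaussian kernel, bandwidth, and number of boosting steps are illustrative choices, not those analysed in the paper.

```python
# A minimal sketch of an L2-boosted Nadaraya-Watson regression smoother.
import numpy as np

def nw_smooth(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def l2_boost_nw(x_train, y_train, x_eval, h=0.3, n_steps=5):
    """Multistep smoother: at each step, smooth the residuals and accumulate."""
    fit_eval = np.zeros_like(x_eval, dtype=float)
    resid = y_train.astype(float).copy()
    for _ in range(n_steps):
        fit_eval += nw_smooth(x_train, resid, x_eval, h)      # update the fit on the grid
        resid -= nw_smooth(x_train, resid, x_train, h)        # update residuals on the data
    return fit_eval

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=100)
grid = np.linspace(0, 1, 50)
print(l2_boost_nw(x, y, grid)[:5])
```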

    Demystifying Fixed k-Nearest Neighbor Information Estimators

    Full text link
    Estimating mutual information from i.i.d. samples drawn from an unknown joint density function is a basic statistical problem of broad interest with multitudinous applications. The most popular estimator is one proposed by Kraskov, Stögbauer, and Grassberger (KSG) in 2004, which is nonparametric and based on the distances of each sample to its $k^{\rm th}$ nearest neighboring sample, where $k$ is a fixed small integer. Despite its widespread use (it is part of scientific software packages), the theoretical properties of this estimator have been largely unexplored. In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the bias as a function of the number of samples. We argue that the superior performance of the KSG estimator stems from a curious "correlation boosting" effect, and build on this intuition to modify the KSG estimator in novel ways to construct a superior estimator. As a byproduct of our investigations, we obtain nearly tight rates of convergence of the $\ell_2$ error of the well-known fixed-$k$ nearest neighbor estimator of differential entropy by Kozachenko and Leonenko. Comment: 55 pages, 8 figures
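
    A small sketch of the fixed-k (KSG-style) mutual information estimator is given below for scalar X and Y. It follows the standard formula psi(k) + psi(N) - <psi(nx+1) + psi(ny+1)> with max-norm neighbor distances; the handling of ties and strict inequalities is only approximate, and the paper's modified estimator is not shown.

```python
# A minimal sketch of a KSG-style fixed-k nearest neighbor mutual information estimator.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    x, y = x.reshape(-1, 1), y.reshape(-1, 1)
    n = len(x)
    joint = np.hstack([x, y])
    tree_joint, tree_x, tree_y = cKDTree(joint), cKDTree(x), cKDTree(y)

    # distance to the k-th neighbor in the joint space (max-norm); the nearest
    # returned neighbor is the point itself, hence k + 1 queries.
    eps = tree_joint.query(joint, k=k + 1, p=np.inf)[0][:, -1]

    # count marginal neighbors strictly inside the ball of radius eps_i
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12)) - 1 for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12)) - 1 for i in range(n)])

    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)   # correlated pair, true MI ~ 0.8 nats
print(ksg_mi(x, y))
```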

    Probability density estimation with tunable kernels using orthogonal forward regression

    Get PDF
    A generalized or tunable-kernel model is proposed for probability density function estimation based on an orthogonal forward regression procedure. Each stage of the density estimation process determines a tunable kernel, namely its center vector and diagonal covariance matrix, by minimizing a leave-one-out test criterion. The kernel mixing weights of the constructed sparse density estimate are finally updated using the multiplicative nonnegative quadratic programming algorithm to enforce the nonnegativity and unit-sum constraints, and this weight-updating process additionally has the desired ability to further reduce the model size. The proposed tunable-kernel model has advantages, in terms of model generalization capability and model sparsity, over the standard fixed-kernel model that restricts kernel centers to the training data points and employs a single common kernel variance for every kernel. On the other hand, it does not optimize all the model parameters together and thus avoids the problems of high-dimensional ill-conditioned nonlinear optimization associated with the conventional finite mixture model. Several examples are included to demonstrate the ability of the proposed tunable-kernel model to construct very compact and accurate density estimates.
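
    For concreteness, the sketch below evaluates the final model form described in the abstract: a sparse mixture of Gaussian kernels, each with its own center vector and diagonal covariance, combined with nonnegative weights summing to one. The centers, covariances, and weights are illustrative; the orthogonal forward regression stage and the MNQP weight update that construct them are not shown.

```python
# A minimal sketch of the tunable-kernel density model: a sparse Gaussian mixture
# with per-kernel centers and diagonal covariances and simplex-constrained weights.
import numpy as np

def mixture_density(x, centers, diag_covs, weights):
    """Evaluate sum_m w_m N(x; c_m, diag(s_m)) at each row of x."""
    x = np.atleast_2d(x)
    dens = np.zeros(len(x))
    for c, s, w in zip(centers, diag_covs, weights):
        norm = np.prod(2 * np.pi * s) ** -0.5
        dens += w * norm * np.exp(-0.5 * np.sum((x - c) ** 2 / s, axis=1))
    return dens

# two tunable kernels in 2-D; weights are nonnegative and sum to one
centers   = np.array([[0.0, 0.0], [2.0, 1.0]])
diag_covs = np.array([[1.0, 0.5], [0.3, 0.3]])
weights   = np.array([0.6, 0.4])
print(mixture_density(np.array([[0.0, 0.0], [2.0, 1.0]]), centers, diag_covs, weights))
```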

    Kullback-Leibler aggregation and misspecified generalized linear models

    Full text link
    In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization, and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally, all the bounds are proved to be optimal in a minimax sense. Comment: Published at http://dx.doi.org/10.1214/11-AOS961 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
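
    To give a feel for constrained likelihood aggregation in an exponential family, the sketch below combines candidate predictors on the systematic component and maximizes a Bernoulli (logistic) likelihood over the weight simplex. The data-generating setup, optimizer, and names are illustrative assumptions; the paper's estimators and oracle inequalities are not reproduced.

```python
# A minimal sketch of convex aggregation by simplex-constrained likelihood maximization.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(3)
n, M = 200, 3
F = rng.normal(size=(n, M))              # candidate predictors at the design points
y = rng.binomial(1, expit(F[:, 0]))      # responses generated from the first candidate

def neg_loglik(w):
    theta = F @ w                        # aggregated systematic component
    return -np.sum(y * theta - np.log1p(np.exp(theta)))

w0 = np.full(M, 1.0 / M)
res = minimize(neg_loglik, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * M,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print(np.round(res.x, 3))                # aggregation weights on the simplex
```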

    Selective machine learning of doubly robust functionals

    Full text link
    While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce two new selection criteria for bias reduction in estimating the functional of interest, each based on a novel definition of pseudo-risk for the functional that embodies the double robustness property and thus is used to select the pair of learners that is nearest to fulfilling this property. We establish an oracle property for a multi-fold cross-validation version of the new selection criteria which states that our empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. We also describe a smooth approximation to the selection criteria which allows for valid post-selection inference. Finally, we apply the approach to model selection of a semiparametric estimator of average treatment effect given an ensemble of candidate machine learners to account for confounding in an observational study.
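
    The sketch below shows the doubly robust (AIPW) estimator of the average treatment effect given fitted nuisance estimates from one candidate pair of learners, which is the estimating function this framework selects learners for. The simulated nuisances and function name are illustrative; the paper's pseudo-risk criteria and cross-fitting scheme are not shown.

```python
# A minimal sketch of the doubly robust (AIPW) average treatment effect estimator.
import numpy as np

def aipw_ate(y, a, pscore, mu1, mu0):
    """y: outcome, a: binary treatment, pscore: fitted P(A=1|X),
    mu1/mu0: fitted E[Y|A=1,X] and E[Y|A=0,X]."""
    return np.mean(mu1 - mu0
                   + a * (y - mu1) / pscore
                   - (1 - a) * (y - mu0) / (1 - pscore))

# toy usage with known nuisances; in practice pscore, mu1, mu0 would come from
# the candidate machine learning algorithms under comparison
rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
pscore = 1 / (1 + np.exp(-x))
a = rng.binomial(1, pscore)
y = 1.0 * a + x + rng.normal(size=n)               # true ATE = 1
print(aipw_ate(y, a, pscore, x + 1.0, x))          # mu1 = x + 1, mu0 = x
```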

    Boosted Beta regression.

    Get PDF
    Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
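
    To illustrate the general idea, the sketch below runs component-wise gradient boosting on the mean model of a beta regression with a logit link and fixed precision: at each step the score is fitted by simple least squares in each covariate and only the best-fitting component is updated. This is a rough sketch of the concept under these assumptions, not the paper's algorithm (which also models the precision and chooses the stopping iteration).

```python
# A minimal sketch of component-wise gradient boosting for the mean of a beta regression.
import numpy as np
from scipy.special import digamma, expit

def boost_beta_mean(X, y, phi=10.0, nu=0.1, n_steps=200):
    n, p = X.shape
    beta = np.zeros(p)
    intercept = np.log(y.mean() / (1 - y.mean()))        # fixed offset at logit of the mean
    for _ in range(n_steps):
        mu = expit(intercept + X @ beta)
        # gradient of the beta log-likelihood with respect to the linear predictor
        u = phi * (np.log(y) - np.log1p(-y)
                   - digamma(mu * phi) + digamma((1 - mu) * phi)) * mu * (1 - mu)
        # component-wise base learners: simple least squares on each covariate
        coefs = X.T @ u / np.sum(X ** 2, axis=0)
        rss = np.sum((u[:, None] - X * coefs) ** 2, axis=0)
        j = np.argmin(rss)
        beta[j] += nu * coefs[j]                          # update the best component only
    return intercept, beta

rng = np.random.default_rng(5)
n, p = 300, 5
X = rng.normal(size=(n, p))
mu_true = expit(0.5 + X[:, 0] - 0.8 * X[:, 1])
y = rng.beta(mu_true * 10, (1 - mu_true) * 10)
print(np.round(boost_beta_mean(X, y)[1], 2))              # weight concentrates on the first two covariates
```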

    Fast learning rates for plug-in classifiers

    Full text link
    It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are incorrect. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved. Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
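
    As a reminder of what a plug-in rule is, the sketch below estimates the regression function eta(x) = P(Y = 1 | X = x) nonparametrically (here with a Nadaraya-Watson smoother, an illustrative substitute for the local polynomial estimators used in the paper) and predicts label 1 whenever the estimate exceeds 1/2.

```python
# A minimal sketch of a plug-in classifier: threshold an estimate of P(Y=1|X=x) at 1/2.
import numpy as np

def plug_in_classifier(x_train, y_train, x_eval, h=0.2):
    """Kernel regression estimate of eta(x) = P(Y=1|X=x), thresholded at 1/2."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    eta_hat = (w @ y_train) / w.sum(axis=1)
    return (eta_hat >= 0.5).astype(int)

rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, 400)
y = rng.binomial(1, 1 / (1 + np.exp(-3 * x)))        # eta(x) = sigmoid(3x)
print(plug_in_classifier(x, y, np.array([-1.0, 0.1, 1.0])))   # predicted labels
```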
