Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Elisabeth Mischler (1825642); Florian Quentin (1825639); Julia Becker (1726858); Lena Butt (1825636); Martin Hiersemann (1763869); Valeska von Kiedrowski (1825645)

Publication date: 15 August 2015
Publisher: HAL CCSD

Abstract

The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the k-nearest neighbors (kNN) rule in the context of binary classification. Here we focus on the leave-p-out cross-validation (LpO) used to assess the performance of the kNN classifier. Remarkably, this LpO estimator can be efficiently computed in this context using closed-form formulas derived by \cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and exponential concentration inequalities for the LpO estimator applied to the kNN classifier. Such results are obtained first by exploiting the connection between the LpO estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the LpO estimator and the classification error/risk of the kNN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
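
For readers unfamiliar with the estimator, the following is a minimal brute-force sketch of leave-p-out cross-validation for a kNN classifier with binary labels. It is an illustration only and is not the closed-form computation referred to in the abstract: it enumerates every size-p held-out subset, which is only feasible for very small samples. The function names `knn_predict` and `lpo_error`, the Euclidean distance, and the tie-breaking rules (nearest neighbors by index order, vote ties resolved to label 0) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import math


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance).

    Assumption: labels are 0/1; distance ties are broken by index order and
    vote ties are resolved to label 0.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return 1 if votes[1] > votes[0] else 0


def lpo_error(X, y, k, p):
    """Brute-force leave-p-out estimate of the kNN misclassification risk.

    Averages, over all subsets of p held-out points, the fraction of held-out
    points misclassified by the kNN rule trained on the remaining n - p points.
    """
    n = len(X)
    if not (1 <= p < n and 1 <= k <= n - p):
        raise ValueError("need 1 <= p < n and 1 <= k <= n - p")

    total = 0.0
    n_splits = 0
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = sum(
            knn_predict(train_X, train_y, X[i], k) != y[i] for i in test_idx
        )
        total += errors / p
        n_splits += 1
    return total / n_splits
```

Calling `lpo_error(X, y, k=1, p=1)` gives the usual leave-one-out (L1O) estimate. The number of splits grows as the binomial coefficient "n choose p", which is why a closed-form computation matters in practice.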

Available Versions

FigShare

oai:figshare.com:article/23462...

Last time updated on 12/02/2018

INRIA a CCSD electronic archive server

oai:HAL:hal-01185092v2

Last time updated on 21/11/2017