1,083 research outputs found

    An adaptive nearest neighbor rule for classification

    We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the 'advantage', which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.

    Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions

    Given an $n$-sample of random vectors $(X_i,Y_i)_{1 \leq i \leq n}$ whose joint law is unknown, the long-standing problem of supervised classification aims to optimally predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular flexible and intuitive method in non-parametric situations. Even if this algorithm is commonly used in the machine learning and statistics communities, less is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the regression function $\eta(X) = \mathbb{E}[Y | X]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion. Comment: 53 pages, 3 figures.

    Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

    The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. Here we focus on the leave-$p$-out cross-validation (L$p$O) used to assess the performance of the $k$NN classifier. Remarkably, this L$p$O estimator can be efficiently computed in this context using closed-form formulas derived by \cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. Another important contribution is made by deriving new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.

    Nonparametric Estimation of the Bayes Error

    This thesis is concerned with the performance of nonparametric classifiers and their application to the estimation of the Bayes error. Although the behavior of these classifiers as the number of preclassified design samples becomes infinite is well understood, very little is known regarding their finite sample error performance. Here, we examine the performance of Parzen and k-nearest neighbor (k-NN) classifiers, relating the expected error rates to the size of the design set and the various design parameters (kernel size and shape, value of k, distance metric for nearest neighbor calculation, etc.). These results lead to several significant improvements in the design procedures for nonparametric classifiers, as well as improved estimates of the Bayes error rate. Our results show that increasing the sample size is in many cases not an effective practical means of improving the classifier performance. Rather, careful attention must be paid to the decision threshold, selection of the kernel size and shape (for Parzen classifiers), and selection of k and the distance metric (for k-NN classifiers). Guidelines are developed toward proper selection of each of these parameters. The use of nonparametric error rates for Bayes error estimation is also considered, and techniques are given which reduce or compensate for the biases of the nonparametric error rates. A bootstrap technique is also developed which allows the designer to estimate the standard deviation of a nonparametric estimate of the Bayes error.

    Least Ambiguous Set-Valued Classifiers with Bounded Error Levels

    In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label. Set-valued classifiers output sets of plausible labels rather than a single label, thereby giving a more appropriate and informative treatment to the labeling of ambiguous instances. We introduce a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence (the probability that the true label is contained in the set) while minimizing the ambiguity (the expected size of the output). We first derive oracle classifiers assuming the true distribution to be known. We show that the oracle classifiers are obtained from level sets of the functions that define the conditional probability of each class. Then we develop estimators with good asymptotic and finite sample properties. The proposed estimators build on existing single-label classifiers. The optimal classifier can sometimes output the empty set, but we provide two solutions to fix this issue that are suitable for various practical needs. Comment: Final version to be published in the Journal of the American Statistical Association at https://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1395341?journalCode=uasa2
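
    The level-set idea in this abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes that some single-label classifier has already produced estimated class probabilities, and it uses one global threshold chosen so that a target fraction of a held-out calibration sample is covered. The function names `calibrate_threshold` and `set_valued_predict` and the toy probabilities in the usage note are hypothetical.

```python
import math


def calibrate_threshold(cal_probs, cal_labels, coverage):
    """Pick the largest threshold t such that at least a `coverage` fraction of
    calibration points have estimated true-label probability >= t.

    cal_probs: list of dicts mapping label -> estimated probability (assumed input).
    cal_labels: list of true labels, aligned with cal_probs.
    coverage: target empirical coverage, with 0 < coverage <= 1.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must lie in (0, 1]")
    # Estimated probability assigned to the true label of each calibration point.
    scores = sorted(p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    needed = math.ceil(coverage * n)  # number of points that must be covered
    return scores[n - needed]


def set_valued_predict(probs, threshold):
    """Level-set prediction: keep every label whose estimated conditional
    probability is at least `threshold`. The result may be empty."""
    return {label for label, p in probs.items() if p >= threshold}
```

    For instance, with four hypothetical calibration points whose true labels received estimated probabilities 0.9, 0.4, 0.7 and 0.55, a target coverage of 0.75 gives the threshold 0.55. A query with probabilities 0.5 and 0.5 then receives the empty set, which is the situation the abstract says needs a separate fix.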

    Classification with the nearest neighbor rule in general finite dimensional spaces

    Given an $n$-sample of random vectors $(X_i,Y_i)_{1 \leq i \leq n}$ whose joint law is unknown, the long-standing problem of supervised classification aims to optimally predict the label $Y$ of a given new observation $X$. In this context, the k-nearest neighbor rule is a popular flexible and intuitive method in nonparametric situations. Even if this algorithm is commonly used in the machine learning and statistics communities, less is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the k-nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the regression function $\eta(X) = \mathbb{E}[Y | X]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and derive sharp estimates in the case of the k-nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion.
