Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data
The k-Nearest Neighbor (kNN) classification approach is conceptually simple,
yet widely applied since it often performs well in practical applications.
However, using a global constant k does not always provide an optimal solution,
e.g., for datasets with an irregular density distribution of data points. This
paper proposes an adaptive kNN classifier where k is chosen dynamically for
each instance (point) to be classified, such that the expected accuracy of
classification is maximized. We define the expected accuracy as the accuracy of
a set of structurally similar observations. An arbitrary similarity function
can be used to find these observations. We introduce and evaluate different
similarity functions. For the evaluation, we use five different classification
tasks based on geo-spatial data. Each classification task consists of (tens of)
thousands of items. We demonstrate that the presented expected accuracy
measures are good estimators of kNN performance, and that the proposed adaptive
kNN classifier outperforms the common kNN and previously introduced adaptive kNN
algorithms. We also show that the range of considered k can be significantly
reduced to speed up the algorithm without a negative influence on classification
accuracy.
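The abstract does not give an implementation; a minimal sketch of the per-instance k selection it describes might look as follows. The similarity proxy (the query's m nearest training points), the leave-one-out accuracy estimate, and all names and parameters here are illustrative assumptions, not the authors' actual method.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Plain kNN: majority vote among the k nearest training points
    # under Euclidean distance.
    dist = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dist)[:k]
    return Counter(y_train[idx]).most_common(1)[0][0]

def adaptive_knn_predict(X_train, y_train, x, k_candidates=(1, 3, 5, 7), m=20):
    # For each candidate k, estimate the "expected accuracy" as the
    # leave-one-out kNN accuracy over the training points most similar
    # to x (here simply its m nearest neighbours), then classify x with
    # the best-scoring k.
    dist = np.linalg.norm(X_train - x, axis=1)
    similar = np.argsort(dist)[:m]
    best_k, best_acc = k_candidates[0], -1.0
    for k in k_candidates:
        hits = 0
        for i in similar:
            mask = np.arange(len(X_train)) != i   # leave point i out
            hits += knn_predict(X_train[mask], y_train[mask],
                                X_train[i], k) == y_train[i]
        acc = hits / len(similar)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return knn_predict(X_train, y_train, x, best_k)
```

Restricting `k_candidates` to a short list mirrors the paper's observation that the range of considered k can be reduced without hurting accuracy.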
Stabilized Nearest Neighbor Classifier and Its Statistical Properties
The stability of statistical analysis is an important indicator for
reproducibility, which is one main principle of scientific method. It entails
that similar statistical conclusions can be reached based on independent
samples from the same underlying population. In this paper, we introduce a
general measure of classification instability (CIS) to quantify the sampling
variability of the prediction made by a classification method. Interestingly,
the asymptotic CIS of any weighted nearest neighbor classifier turns out to be
proportional to the Euclidean norm of its weight vector. Based on this concise
form, we propose a stabilized nearest neighbor (SNN) classifier, which
distinguishes itself from other nearest neighbor classifiers, by taking the
stability into consideration. In theory, we prove that SNN attains the minimax
optimal convergence rate in risk, and a sharp convergence rate in CIS. The
latter rate result is established for general plug-in classifiers under a
low-noise condition. Extensive simulated and real examples demonstrate that SNN
achieves a considerable improvement in CIS over existing nearest neighbor
classifiers, with comparable classification accuracy. We implement the
algorithm in a publicly available R package snn.
Comment: 48 pages, 11 figures. To appear in JASA.
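The key quantitative claim above, that the asymptotic CIS of a weighted nearest neighbor classifier is proportional to the Euclidean norm of its weight vector, can be illustrated numerically. The comparison below is a hypothetical sketch (the weight schemes and function names are our own), not the SNN construction from the paper or its R package.

```python
import numpy as np

def weight_norm(w):
    # Euclidean norm of a normalised neighbour-weight vector; by the
    # paper's result, the asymptotic CIS is proportional to this value.
    w = np.asarray(w, dtype=float)
    return np.linalg.norm(w / w.sum())

# Uniform weights over k neighbours (standard kNN) versus a decaying
# inverse-rank scheme: for fixed k, spreading weight evenly minimises
# the Euclidean norm, i.e. gives the more stable classifier.
uniform = weight_norm(np.ones(10))             # 1/sqrt(10) ≈ 0.316
decaying = weight_norm(1.0 / np.arange(1, 11))
```

Here `uniform < decaying`, so under this measure the evenly weighted rule is the more stable of the two.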
Weighted k-Nearest-Neighbor Techniques and Ordinal Classification
In the field of statistical discrimination, k-nearest-neighbor classification is a well-known, simple, and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition, we show possibilities for using nearest neighbors for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques.
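A distance-weighted vote of the kind this abstract describes can be sketched as follows. The inverse-distance kernel is only one of many possible weighting choices, and the names and parameters here are illustrative assumptions, not the paper's specific scheme.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5):
    # Distance-weighted kNN: each of the k nearest neighbours votes with
    # a weight that decreases with its distance to the query point.
    dist = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + 1e-12)      # inverse-distance weights;
                                       # epsilon avoids division by zero
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)   # label with the largest total weight
```

With uniform weights this reduces to plain kNN; the weighting matters most near class boundaries, where the k neighbours are mixed and the closer ones should dominate.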