14,431 research outputs found
Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data
The k-Nearest Neighbor (kNN) classification approach is conceptually simple -
yet widely applied since it often performs well in practical applications.
However, using a global constant k does not always provide an optimal solution,
e.g., for datasets with an irregular density distribution of data points. This
paper proposes an adaptive kNN classifier where k is chosen dynamically for
each instance (point) to be classified, such that the expected accuracy of
classification is maximized. We define the expected accuracy as the accuracy of
a set of structurally similar observations. An arbitrary similarity function
can be used to find these observations. We introduce and evaluate different
similarity functions. For the evaluation, we use five different classification
tasks based on geo-spatial data. Each classification task consists of (tens of)
thousands of items. We demonstrate, that the presented expected accuracy
measures can be a good estimator for kNN performance, and the proposed adaptive
kNN classifier outperforms common kNN and previously introduced adaptive kNN
algorithms. Also, we show that the range of considered k can be significantly
reduced to speed up the algorithm without negative influence on classification
accuracy
Maximum Margin Multiclass Nearest Neighbors
We develop a general framework for margin-based multicategory classification
in metric spaces. The basic work-horse is a margin-regularized version of the
nearest-neighbor classifier. We prove generalization bounds that match the
state of the art in sample size and significantly improve the dependence on
the number of classes . Our point of departure is a nearly Bayes-optimal
finite-sample risk bound independent of . Although -free, this bound is
unregularized and non-adaptive, which motivates our main result: Rademacher and
scale-sensitive margin bounds with a logarithmic dependence on . As the best
previous risk estimates in this setting were of order , our bound is
exponentially sharper. From the algorithmic standpoint, in doubling metric
spaces our classifier may be trained on examples in time and
evaluated on new points in time
Adaptive imputation of missing values for incomplete pattern classification
In classification of incomplete pattern, the missing values can either play a
crucial role in the class determination, or have only little influence (or
eventually none) on the classification results according to the context. We
propose a credal classification method for incomplete pattern with adaptive
imputation of missing values based on belief function theory. At first, we try
to classify the object (incomplete pattern) based only on the available
attribute values. As underlying principle, we assume that the missing
information is not crucial for the classification if a specific class for the
object can be found using only the available information. In this case, the
object is committed to this particular class. However, if the object cannot be
classified without ambiguity, it means that the missing values play a main role
for achieving an accurate classification. In this case, the missing values will
be imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM)
techniques, and the edited pattern with the imputation is then classified. The
(original or edited) pattern is respectively classified according to each
training class, and the classification results represented by basic belief
assignments are fused with proper combination rules for making the credal
classification. The object is allowed to belong with different masses of belief
to the specific classes and meta-classes (which are particular disjunctions of
several single classes). The credal classification captures well the
uncertainty and imprecision of classification, and reduces effectively the rate
of misclassifications thanks to the introduction of meta-classes. The
effectiveness of the proposed method with respect to other classical methods is
demonstrated based on several experiments using artificial and real data sets
- âŠ