Improved AURA k-Nearest Neighbour approach
The k-Nearest Neighbour (kNN) approach is a widely-used technique for pattern classification. Ranked distance measurements to a known sample set determine the classification of unknown samples. Though effective, kNN, like most classification methods, does not scale well with increased sample size. This is due to there being a relationship between the unknown query and every other sample in the data space. To make this operation scalable, we apply AURA to the kNN problem. AURA is a highly scalable associative-memory-based binary neural network intended for high-speed approximate search-and-match operations on large unstructured datasets. Previous work has applied AURA methods to this problem as a scalable but approximate kNN classifier. This paper continues this work by using AURA in conjunction with kernel-based input vectors to create a fast, scalable kNN classifier while improving recall accuracy to levels similar to standard kNN implementations.
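The exhaustive ranked-distance procedure the abstract describes, whose per-query scan over every sample is exactly the scaling cost AURA is meant to avoid, can be sketched in a few lines of Python. Names and data are illustrative; the AURA binary neural network itself is not shown:

```python
from collections import Counter
import math

def knn_classify(query, samples, labels, k=3):
    """Classify `query` by majority vote among its k nearest samples.

    Ranked Euclidean distances to every known sample determine the
    result -- an O(N) scan per query, which is why plain kNN does not
    scale well with sample size.
    """
    ranked = sorted(
        (math.dist(query, s), lab) for s, lab in zip(samples, labels)
    )
    votes = Counter(lab for _, lab in ranked[:k])
    return votes.most_common(1)[0][0]
```

A query near one cluster of labelled points is assigned that cluster's label by the vote among its k nearest samples.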
Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data
The k-Nearest Neighbor (kNN) classification approach is conceptually simple -
yet widely applied since it often performs well in practical applications.
However, using a global constant k does not always provide an optimal solution,
e.g., for datasets with an irregular density distribution of data points. This
paper proposes an adaptive kNN classifier where k is chosen dynamically for
each instance (point) to be classified, such that the expected accuracy of
classification is maximized. We define the expected accuracy as the accuracy of
a set of structurally similar observations. An arbitrary similarity function
can be used to find these observations. We introduce and evaluate different
similarity functions. For the evaluation, we use five different classification
tasks based on geo-spatial data. Each classification task consists of (tens of)
thousands of items. We demonstrate that the presented expected accuracy
measures can be a good estimator for kNN performance, and that the proposed
adaptive kNN classifier outperforms common kNN and previously introduced
adaptive kNN algorithms. We also show that the range of considered k values
can be significantly reduced to speed up the algorithm without a negative
influence on classification accuracy.
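A minimal sketch of the adaptive idea, assuming nearest-neighbours-in-feature-space as the similarity function (the paper evaluates several) and a leave-one-out estimate as the hypothetical stand-in for expected accuracy; all names and the candidate-k set are illustrative:

```python
from collections import Counter
import math

def _vote(ranked, k):
    # majority label among the k closest (distance, label) pairs
    return Counter(lab for _, lab in ranked[:k]).most_common(1)[0][0]

def adaptive_knn(query, samples, labels, ks=(1, 3, 5), n_similar=4):
    """Choose k per query by maximizing an expected-accuracy estimate.

    The estimate for each candidate k is the leave-one-out kNN accuracy
    over the training points most similar to the query (here simply its
    nearest neighbours -- one of many possible similarity functions).
    """
    order = sorted(range(len(samples)),
                   key=lambda i: math.dist(query, samples[i]))
    similar = order[:n_similar]

    best_k, best_acc = ks[0], -1.0
    for k in ks:
        hits = 0
        for i in similar:
            # leave-one-out: classify sample i using all other samples
            ranked = sorted((math.dist(samples[i], samples[j]), labels[j])
                            for j in range(len(samples)) if j != i)
            hits += _vote(ranked, k) == labels[i]
        acc = hits / len(similar)
        if acc > best_acc:
            best_k, best_acc = k, acc

    ranked = sorted((math.dist(query, s), lab)
                    for s, lab in zip(samples, labels))
    return _vote(ranked, best_k), best_k
```

The per-query choice of k lets dense and sparse regions of the data space use different neighbourhood sizes.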
Estimating Photometric Redshifts of Quasars via K-nearest Neighbor Approach Based on Large Survey Databases
We apply a lazy learning method, the k-nearest neighbor (kNN) algorithm,
to estimate the photometric redshifts of quasars, based on various
datasets from the Sloan Digital Sky Survey (SDSS), UKIRT Infrared Deep Sky
Survey (UKIDSS) and Wide-field Infrared Survey Explorer (WISE) (the SDSS
sample, the SDSS-UKIDSS sample, the SDSS-WISE sample and the SDSS-UKIDSS-WISE
sample). The influence of the k value and of different input patterns on the
performance of kNN is discussed. kNN achieves its best performance with a
different k and input pattern for each dataset. The best result
belongs to the SDSS-UKIDSS-WISE sample. The experimental results show that,
generally, the more bands that contribute information, the better the
performance of photometric redshift estimation with kNN. The results also
demonstrate that kNN using multiband data can effectively avoid the
catastrophic failures of photometric redshift estimation that afflict many
machine learning methods.
By comparing the performance of various methods for photometric redshift
estimation of quasars, kNN based on KD-Tree shows its superiority with the best
accuracy for our case.
Comment: 28 pages, 4 figures, 3 tables, accepted for publication in A
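A brute-force sketch of kNN photometric-redshift estimation: the redshift of a query quasar is taken as the mean spectroscopic redshift of its k nearest training quasars in colour space. The function and data are illustrative; the paper's KD-tree variant finds the same neighbours with far fewer distance computations:

```python
import math

def knn_photo_z(colors, train_colors, train_z, k=3):
    """Estimate a photometric redshift as the mean spectroscopic
    redshift of the k nearest training objects in colour space.

    Brute force for clarity -- O(N) distances per query; a KD-tree
    gives the same answer in roughly O(log N) per query.
    """
    nearest = sorted(
        (math.dist(colors, c), z) for c, z in zip(train_colors, train_z)
    )[:k]
    return sum(z for _, z in nearest) / k
```

Adding more bands simply extends the colour vectors, which is one way to read the abstract's observation that more bands generally improve the estimate.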
Efficient Processing of k Nearest Neighbor Joins using MapReduce
k nearest neighbor join (kNN join), designed to find k nearest neighbors from
a dataset S for every object in another dataset R, is a primitive operation
widely adopted by many data mining applications. As a combination of the k
nearest neighbor query and the join operation, kNN join is an expensive
operation. Given the increasing volume of data, it is difficult to perform a
kNN join on a centralized machine efficiently. In this paper, we investigate
how to perform kNN join using MapReduce which is a well-accepted framework for
data-intensive applications over clusters of computers. In brief, the mappers
cluster objects into groups; the reducers perform the kNN join on each group of
objects separately. We design an effective mapping mechanism that exploits
pruning rules for distance filtering, and hence reduces both the shuffling and
computational costs. To reduce the shuffling cost, we propose two approximate
algorithms to minimize the number of replicas. Extensive experiments on our
in-house cluster demonstrate that our proposed methods are efficient, robust
and scalable.
Comment: VLDB201
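A single-machine sketch of the map/reduce decomposition, with nearest-pivot grouping standing in for the paper's mapping mechanism (the distance-based pruning rules and replica handling are omitted, so this is the approximate flavour: an R object's true neighbours may fall in another partition). Pivots and data are illustrative:

```python
import math

def knn_join(R, S, pivots, k=2):
    """Approximate kNN join of R against S in MapReduce style.

    "Map": assign every object of R and S to its nearest pivot,
    partitioning the data into groups.
    "Reduce": brute-force kNN join within each group separately.
    """
    def nearest_pivot(p):
        return min(range(len(pivots)),
                   key=lambda i: math.dist(p, pivots[i]))

    # map step: partition both datasets by nearest pivot
    groups = {i: ([], []) for i in range(len(pivots))}
    for r in R:
        groups[nearest_pivot(r)][0].append(r)
    for s in S:
        groups[nearest_pivot(s)][1].append(s)

    # reduce step: local kNN join inside each partition
    result = {}
    for r_part, s_part in groups.values():
        for r in r_part:
            ranked = sorted(s_part, key=lambda s: math.dist(r, s))
            result[r] = ranked[:k]
    return result
```

Because each reducer only sees its own partition, shuffling cost tracks partition sizes, which is what the paper's pruning rules and replica-minimizing algorithms aim to reduce.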