    CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

    The reverse k nearest neighbor (RkNN) query finds all points that have the query point as one of their k nearest neighbors (kNN), where the kNN query finds the k closest points to its query point. Based on the characteristics of conic sections, we propose a discriminance, named CSD (Conic Section Discriminance), to determine whether points belong to the RkNN set without issuing any queries of non-constant computational complexity. Using CSD, we also implement an efficient RkNN algorithm, CSD-RkNN, with a computational complexity of $O(k^{1.5}\cdot\log k)$. Comparative experiments are conducted between CSD-RkNN and two other state-of-the-art RkNN algorithms, SLICE and VR-RkNN. The experimental results indicate that the efficiency of CSD-RkNN is significantly higher than that of its competitors.
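
    For context, a minimal brute-force sketch of the RkNN definition itself (not of CSD): for each candidate point it runs a full kNN check over the data set plus the query point, which is exactly the per-point non-constant cost that a discriminance such as CSD is designed to avoid. The random test data and function name are illustrative assumptions.

    import numpy as np

    def rknn_brute_force(points, q, k):
        """Brute-force reverse kNN: return the index of every point p whose
        k nearest neighbors (among the other points and q) include q."""
        augmented = np.vstack([points, q])   # append q as an extra point
        q_idx = len(points)
        result = []
        for i, p in enumerate(points):
            others = np.delete(np.arange(len(augmented)), i)   # exclude p itself
            d = np.linalg.norm(augmented[others] - p, axis=1)  # distances from p
            if q_idx in others[np.argsort(d)[:k]]:             # q among p's kNN?
                result.append(i)
        return result

    pts = np.random.default_rng(0).random((200, 2))
    print(rknn_brute_force(pts, np.array([0.5, 0.5]), k=3))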

    Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

    The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is well suited to cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information, so it is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed that the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model covering numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system; the vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation with a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions of prior single-data-owner systems.
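
    As a plaintext illustration of the substitute classifier the abstract mentions, the sketch below scores each class with a Gaussian kernel density estimate and returns the best-scoring class; the paper's privacy-preserving machinery is not shown, and the bandwidth and function name are assumptions. Because each score aggregates smooth contributions from all points of a class rather than exposing the identities of the k nearest records, its output reveals less about any individual training point, which is the intuition behind swapping it in for exact k-NN.

    import numpy as np

    def kde_classify(X, y, query, bandwidth=1.0):
        """Assign `query` to the class with the largest Gaussian-KDE score."""
        scores = {}
        for label in np.unique(y):
            sq = np.sum((X[y == label] - query) ** 2, axis=1)
            scores[label] = np.mean(np.exp(-sq / (2 * bandwidth ** 2)))
        return max(scores, key=scores.get)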

    Non-Asymptotic Uniform Rates of Consistency for k-NN Regression

    We derive high-probability finite-sample uniform rates of consistency for k-NN regression that are optimal up to logarithmic factors under mild assumptions. We moreover show that k-NN regression adapts automatically to an unknown lower intrinsic dimension. We then apply the k-NN regression rates to establish new results about estimating the level sets and global maxima of a function from noisy observations.
    Comment: In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019)
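
    For concreteness, the estimator whose uniform rates are studied is just a local average of responses; a minimal sketch, with illustrative shapes and names:

    import numpy as np

    def knn_regress(X, y, x, k):
        """kNN regression estimate at x: the mean response of the k
        training points nearest to x in Euclidean distance."""
        d = np.linalg.norm(X - x, axis=1)
        return y[np.argsort(d)[:k]].mean()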

    A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

    k Nearest Neighbors (kNN) is one of the most widely used supervised learning algorithms for classifying Gaussian distributed data, but it does not achieve good results when applied to nonlinear manifold distributed data, especially when only a very limited amount of labeled samples is available. In this paper, we propose a new graph-based kNN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an R-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix, and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on local neighborhood reconstruction. Comparison experiments are conducted on both synthetic data sets and real-world data sets to demonstrate the validity of the proposed new kNN algorithm and its improvements over other versions of kNN algorithms. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kNN algorithm, the proposed manifold version of kNN shows promising potential for classifying manifold-distributed data.
    Comment: 32 pages, 12 figures, 7 tables
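
    A hedged sketch of the (unconstrained) tired-random-walk similarity this family of methods builds on: a walker moves over a Gaussian-weighted graph but is damped by a factor alpha per step, so the accumulated transition probabilities sum in closed form to (I - alpha*P)^{-1}. The paper's R-level strengthened-tree constraint is not reproduced here; alpha, sigma, and the Gaussian edge weights are assumptions.

    import numpy as np

    def trw_matrix(X, alpha=0.9, sigma=1.0):
        """Tired-random-walk similarity: sum over t of (alpha * P)^t,
        computed in closed form as (I - alpha * P)^{-1}."""
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq / (2 * sigma ** 2))      # Gaussian edge weights
        np.fill_diagonal(W, 0.0)                # no self-loops
        P = W / W.sum(axis=1, keepdims=True)    # row-stochastic transitions
        return np.linalg.inv(np.eye(len(X)) - alpha * P)

    A query point would then be labeled by summing, per class, the TRW weights of its nearest neighbors, as the abstract describes.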

    A binary neural k-nearest neighbour technique

    K-Nearest Neighbour (k-NN) is a widely used technique for classifying and clustering data. k-NN is effective but is often criticised for its polynomial run-time growth, as it calculates the distance from each record in turn to every other record in the data set. This paper evaluates a novel k-NN classifier, built from binary neural networks, with linear growth and faster run-time. The binary neural approach uses robust encoding to map standard ordinal, categorical and real-valued data sets onto a binary neural network, which uses high-speed pattern matching to recall the k best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior speed and memory performance of the binary approach compared to the standard approach, and we pinpoint the optimal configurations.
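
    A toy sketch of the general pattern (not the paper's exact binary neural architecture): map each record to a fixed-width binary code with a robust thermometer-style encoding, store the codes, and recall the k best matches by bitwise overlap, so a query costs one cheap pass over the stored codes rather than full floating-point distance computations. The encoding choice, bit width, and function names are assumptions.

    import numpy as np

    def thermometer(x, lo, hi, bits=16):
        """Encode a scalar as `bits` binary units set from the bottom up,
        so nearby values share many set bits (a robust encoding)."""
        level = int(np.clip((x - lo) / (hi - lo) * bits, 0, bits))
        return np.array([1] * level + [0] * (bits - level), dtype=np.uint8)

    def encode(row, los, his, bits=16):
        """Concatenate per-attribute thermometer codes into one binary vector."""
        return np.concatenate([thermometer(v, lo, hi, bits)
                               for v, lo, hi in zip(row, los, his)])

    def k_best_matches(codes, q_code, k):
        """Recall the k stored codes with the largest bitwise overlap with
        the query: a 0/1 dot product equals the popcount of the AND."""
        overlap = codes.astype(np.int32) @ q_code.astype(np.int32)
        return np.argsort(-overlap)[:k]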