Partially Specified Nearest Neighbor Search

Author: C.M. Eastman
D.T. Lee
H. Brönnimann
J. Matoušek
L. Arge
P. Zimmermann
P.K. Agarwal
T. Bernecker
T. Hruz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Providing Diversity in K-Nearest Neighbor Query Results

Author: Haritsa Jayant R.
Jain Anoop
Sarda Parag
Publication date: 15/10/2003

Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K answers in the database that are closest to Q according to a given distance metric. In this scenario, it is possible that a majority of the answers may be very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance.

Comment: 20 pages, 11 figures
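The idea of a diverse KNN result set can be illustrated with a minimal greedy sketch. This is not the MOTLEY algorithm from the paper: the function name `diverse_knn`, the parameter `min_div`, and the use of a plain Euclidean distance threshold as the diversity criterion are all assumptions made for illustration only.

```python
import math

def diverse_knn(points, query, k, min_div):
    """Greedy sketch (hypothetical, not MOTLEY): visit points in order of
    increasing distance to `query` and keep a point only if it lies at
    least `min_div` away from every point already kept."""
    ordered = sorted(points, key=lambda p: math.dist(p, query))
    result = []
    for p in ordered:
        if all(math.dist(p, r) >= min_div for r in result):
            result.append(p)
            if len(result) == k:
                break
    return result

# Three points clustered near the origin plus two distant points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (2.0, 0.0)]

# With a diversity threshold, near-duplicates of the first answer are skipped.
print(diverse_knn(points, query=(0.0, 0.0), k=3, min_div=0.5))

# With min_div = 0 the sketch reduces to an ordinary KNN query.
print(diverse_knn(points, query=(0.0, 0.0), k=3, min_div=0.0))
```

In this sketch the threshold plays the role of the user-tunable diversity setting: raising it trades closeness for variety, and setting it to zero gives back the traditional KNN answer.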

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests

Author: Avrithis Yannis
Emiris Ioannis Z.
Samaras Georgios
Publication date: 01/03/2016

We propose a new data structure, the generalized randomized kd forest, or kgeraf, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus accelerating queries. We release public domain software geraf and we compare it to existing implementations of state-of-the-art methods including BBD-trees, Locality Sensitive Hashing, randomized kd forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, and pointsets of cardinality up to a few hundred thousand or even one million; this range of inputs is encountered in many critical applications today. For instance, we handle a real dataset of 10^6 images represented in 960 dimensions with a query time of less than 1 sec on average and 90% of responses being true nearest neighbors.
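The general idea of searching several independently randomized k-d trees without backtracking can be sketched as follows. This is a simplified illustration and not the kgeraf data structure or the geraf software: the function names, the `top` and `leaf_size` parameters, and the choice to split on a random coordinate among the highest-variance ones are assumptions made here.

```python
import math
import random

def build_tree(points, idxs, rng, top=3, leaf_size=4):
    """Build one randomized k-d tree over the points indexed by `idxs`."""
    if len(idxs) <= leaf_size:
        return ("leaf", idxs)

    def variance(d):
        vals = [points[i][d] for i in idxs]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    # Assumed randomization: pick the split coordinate at random
    # among the `top` coordinates of highest variance.
    dims = sorted(range(len(points[0])), key=variance, reverse=True)[:top]
    d = rng.choice(dims)
    ordered = sorted(idxs, key=lambda i: points[i][d])
    mid = len(ordered) // 2
    split = points[ordered[mid]][d]  # median value along coordinate d
    return ("node", d, split,
            build_tree(points, ordered[:mid], rng, top, leaf_size),
            build_tree(points, ordered[mid:], rng, top, leaf_size))

def descend(tree, q):
    """Follow a single root-to-leaf path for query q (no backtracking)."""
    while tree[0] == "node":
        _, d, split, left, right = tree
        tree = left if q[d] < split else right
    return tree[1]

def forest_query(points, forest, q):
    """Approximate nearest neighbor: best candidate over one leaf per tree."""
    candidates = set()
    for tree in forest:
        candidates.update(descend(tree, q))
    return min(candidates, key=lambda i: math.dist(points[i], q))

rng = random.Random(0)
points = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(10)]
query = tuple(rng.random() for _ in range(8))
print(forest_query(points, forest, query))  # index of an approximate nearest neighbor
```

Each tree contributes only the points in one leaf, so the answer is approximate. Using more trees enlarges the candidate set and tends to raise accuracy at the cost of more distance computations.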

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Mapping crime: Understanding Hotspots

Author: Cameron J
Chainey S
Eck J
Wilson R
Publication venue: National Institute of Justice
Publication date: 01/08/2005

UCL Discovery