6,886 research outputs found

    Solving All-k-Nearest Neighbor Problem without an Index

    Get PDF
    Among the similarity queries in metric spaces, there is one that obtains the k-nearest neighbors of every element in the database (All-k-NN). One way to solve it is the naïve one: compare each object in the database with all the others and return the k elements nearest to it (its k-NN). Another is to preprocess the database into an index, and then search that index for the k-NN of each element of the dataset. Answering the All-k-NN problem makes it possible to build the k-Nearest Neighbor Graph (kNNG). Given an object collection in a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric; if we link each object to its k nearest neighbors instead, we obtain the kNNG. The kNNG can itself be regarded as an index for the database, one that is quite efficient and admits further improvements. In this work, we propose a new technique to solve the All-k-NN problem that does not use any index to obtain the k-NN of each element (a sketch of the naïve baseline it improves on appears below). The approach avoids as many comparisons as possible, comparing only some of the database elements and taking advantage of the properties of the distance function. Its total cost is significantly lower than that of the naïve solution. XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informática.
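    For reference, here is a minimal sketch of the naïve All-k-NN baseline the abstract describes, assuming Python and a Euclidean distance for concreteness; the name `naive_all_knn` and the toy data are illustrative, not from the paper.

    ```python
    import math
    from typing import Callable, Sequence

    def naive_all_knn(db: Sequence, k: int,
                      dist: Callable[[object, object], float]) -> list[list[int]]:
        """Naive All-k-NN: compare every object with all others (O(n^2) distances)."""
        result = []
        for i, x in enumerate(db):
            # Distance from x to every other element; the k smallest are its k-NN.
            cand = sorted((dist(x, y), j) for j, y in enumerate(db) if j != i)
            result.append([j for _, j in cand[:k]])
        return result

    # Example: build the kNNG adjacency lists for 2-D points under Euclidean distance.
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    knng = naive_all_knn(points, k=2, dist=math.dist)
    # knng[i] lists the indices of the 2 nearest neighbors of points[i].
    ```

    The indexed alternative and the paper's index-free technique both aim to beat this quadratic distance-computation count.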

    Approximate reverse k-nearest neighbor queries in general metric spaces

    Full text link

    Approximate Nearest Neighbor Searching with Non-Euclidean and Weighted Distances

    Full text link
    We present a new approach to approximate nearest-neighbor queries in fixed dimension under a variety of non-Euclidean distances. We are given a set $S$ of $n$ points in $\mathbb{R}^d$, an approximation parameter $\varepsilon > 0$, and a distance function that satisfies certain smoothness and growth-rate assumptions. The objective is to preprocess $S$ into a data structure so that for any query point $q$ in $\mathbb{R}^d$, it is possible to efficiently report any point of $S$ whose distance from $q$ is within a factor of $1+\varepsilon$ of the distance to the actual closest point. Prior to this work, the most efficient data structures for approximate nearest-neighbor searching in spaces of constant dimensionality applied only to the Euclidean metric. This paper overcomes this limitation through a method called convexification. For admissible distance functions, the proposed data structures answer queries in logarithmic time using $O(n \log(1/\varepsilon) / \varepsilon^{d/2})$ space, nearly matching the best known bounds for the Euclidean metric. These results apply to both convex scaling distance functions (including the Mahalanobis distance and weighted Minkowski metrics) and Bregman divergences (including the Kullback-Leibler divergence and the Itakura-Saito distance).
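    To make the query guarantee concrete, here is a brute-force checker for the $(1+\varepsilon)$-condition under a Mahalanobis distance, one of the convex scaling functions named above. This is a minimal sketch of the problem definition only, not of the paper's convexification data structure; the function names are hypothetical.

    ```python
    import numpy as np

    def mahalanobis(a: np.ndarray, b: np.ndarray, M: np.ndarray) -> float:
        """Mahalanobis distance for a positive-definite matrix M."""
        d = a - b
        return float(np.sqrt(d @ M @ d))

    def is_valid_ann(S: np.ndarray, q: np.ndarray, p: np.ndarray,
                     eps: float, M: np.ndarray) -> bool:
        """Check the (1+eps)-ANN guarantee: dist(q, p) <= (1+eps) * dist(q, nearest).
        A data structure as in the abstract must return some p satisfying this."""
        exact = min(mahalanobis(q, s, M) for s in S)
        return mahalanobis(q, p, M) <= (1.0 + eps) * exact
    ```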

    Indexing Metric Spaces for Exact Similarity Search

    Full text link
    With the continued digitalization of societal processes, we are seeing an explosion in available data, commonly referred to as big data. In a research setting, three aspects of such data are usually viewed as the main sources of challenges when attempting to create value from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric spaces are well suited to addressing variety because they can accommodate any type of data as long as the associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We survey all existing metric indexes that support exact similarity search by i) summarizing the partitioning, pruning, and validation techniques used by metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparison is used to evaluate search performance because complexity analysis reveals little about the differences in similarity query processing, and query performance depends on pruning and validation abilities that in turn depend on the data distribution. This article aims to reveal the strengths and weaknesses of the different indexing techniques, in order to offer guidance on selecting an appropriate technique for a given setting and to direct future research on metric indexes.
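    As a flavor of the pruning and validation steps such indexes share, here is a sketch of pivot-based pruning for a range query: by the triangle inequality, $|d(q,p) - d(p,o)| \le d(q,o)$, so an object whose precomputed pivot distance violates the bound can be discarded without computing $d(q,o)$. This is a generic illustration, not any particular index from the survey.

    ```python
    def range_query(db, pivot_dists, q, radius, dist, pivot):
        """Pivot-based range search: pivot_dists[i] = dist(pivot, db[i]),
        precomputed when the index is built."""
        d_qp = dist(q, pivot)                     # one distance to the pivot
        hits = []
        for o, d_po in zip(db, pivot_dists):
            if abs(d_qp - d_po) > radius:         # lower bound on d(q, o) exceeds r
                continue                          # pruned: no distance computation
            if dist(q, o) <= radius:              # validation: compute the real distance
                hits.append(o)
        return hits
    ```

    The empirical differences the survey measures come largely from how tight such lower bounds are on a given data distribution.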

    Approximate Nearest Neighbor Search for Low Dimensional Queries

    Full text link
    We study the Approximate Nearest Neighbor problem for metric spaces in which the query points are constrained to lie on a subspace of low doubling dimension, while the data itself is high-dimensional. We show that this problem can be solved efficiently despite the high dimensionality of the data.

    Improving metric access methods with bucket files

    Get PDF
    Modern applications deal with complex data, and retrieval by similarity plays an important role in most of them. Complex data whose primary comparison mechanisms are similarity predicates are usually immersed in metric spaces. Metric Access Methods (MAMs) exploit the metric space properties to divide the space into regions and thus achieve efficient processing of similarity queries, such as range and k-nearest neighbor queries. Existing MAMs use homogeneous data structures to improve query execution, following the same techniques employed by traditional methods developed to retrieve scalar and multidimensional data. In this paper, we combine hashing and hierarchical ball partitioning to obtain a hybrid index tuned to improve similarity queries over complex data sets, with search algorithms that reduce total execution time by aggressively reducing the number of distance calculations. We applied our technique to the Slim-tree and performed experiments over real data sets showing that the proposed technique reduces the execution time of both range and k-nearest neighbor queries to at most half that of the Slim-tree. Moreover, the technique is general enough to be applied to many existing MAMs. CAPES. CNPq. FAPESP. International Conference on Similarity Search and Applications - SISAP (8th, 2015, Glasgow).
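    The abstract does not spell out the Slim-tree extension itself, but the hash-plus-ball-partition idea can be illustrated with a toy index that buckets objects by their quantized distance to a pivot (concentric rings of a ball partition), so a range query opens only the buckets whose ring can intersect the query ball. A sketch under those assumptions; `DistanceBuckets` is hypothetical, not the paper's structure.

    ```python
    from collections import defaultdict

    class DistanceBuckets:
        """Toy hybrid: hash each object into a bucket keyed by its quantized
        distance to a pivot, i.e., which concentric ring it falls in."""

        def __init__(self, db, pivot, dist, width: float):
            self.pivot, self.dist, self.width = pivot, dist, width
            self.buckets = defaultdict(list)
            for o in db:
                self.buckets[int(dist(pivot, o) / width)].append(o)

        def range_query(self, q, radius):
            d_qp = self.dist(q, self.pivot)
            # By the triangle inequality, matches satisfy
            # d_qp - radius <= dist(pivot, o) <= d_qp + radius.
            lo = int(max(d_qp - radius, 0.0) / self.width)
            hi = int((d_qp + radius) / self.width)
            out = []
            for key in range(lo, hi + 1):          # only rings that may intersect
                for o in self.buckets.get(key, ()):
                    if self.dist(q, o) <= radius:  # validate candidates
                        out.append(o)
            return out
    ```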

    Active Nearest-Neighbor Learning in Metric Spaces

    Full text link
    We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. MARMANN is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
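    MARMANN's compression scheme and model selection are more involved than the abstract can convey; for orientation only, here is a generic pool-based active loop around a 1-NN classifier that queries the label of the pool point farthest from all labeled points, a margin-style uncertainty heuristic rather than MARMANN's actual selection rule. All names here are hypothetical.

    ```python
    import numpy as np

    def active_nn_loop(pool: np.ndarray, oracle, dist, budget: int):
        """Pool-based active learning with a 1-NN classifier: repeatedly label
        the point whose nearest labeled neighbor is farthest away."""
        labeled_idx = [0]                      # seed with an arbitrary point
        labels = {0: oracle(pool[0])}
        for _ in range(budget - 1):
            def margin(i):                     # distance to closest labeled point
                return min(dist(pool[i], pool[j]) for j in labeled_idx)
            i = max((i for i in range(len(pool)) if i not in labels), key=margin)
            labels[i] = oracle(pool[i])        # spend one label query
            labeled_idx.append(i)

        def classify(x):                       # resulting nearest-neighbor classifier
            j = min(labeled_idx, key=lambda j: dist(x, pool[j]))
            return labels[j]
        return classify
    ```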